Plugin Idea for a Natural Text-to-Speech w/ Azure


A plugin that reads the active note (or a selected portion of a note) aloud using natural-sounding voices from the Microsoft Azure Text-to-Speech service.


Microsoft Azure has a wonderful text-to-speech service that you can test on this page. Moreover, they offer a free account with which you can set up a speech service and get an API key to use with your application/plugin. (There's also a wonderful sample TTS studio.)

You can use it completely for free for up to 0.5 million characters per month, or 5 hours of audio. Beyond that, you fall into the 'pay as you go' mode.

The idea for the Obsidian plugin is:

  1. The user sets up their Azure account and a speech service in whichever tier best suits their intended use.
  2. They enter the API key in the plugin settings.
  3. They configure basic options in the plugin settings, such as voice, pitch, and speed.
  4. The plugin adds an entry to the context menu, or a command to the command list, that reads the entire active note or only the selected portion.
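
To make the steps above a bit more concrete, here is a minimal TypeScript sketch of how the plugin side might turn the settings and the chosen text into a request for the Azure speech REST endpoint. The `TtsSettings` shape and the function names are hypothetical, the language is hard-coded to `en-US`, and the voice name and region in the usage note are only examples. The code builds the request but does not send it, and a real plugin may need extra headers or a token-based login instead of the raw key.

```typescript
// Hypothetical settings shape (steps 2 and 3); names are illustrative only.
interface TtsSettings {
  apiKey: string; // key from the user's own Azure speech resource
  region: string; // region of that resource, e.g. "westeurope"
  voice: string;  // voice name, e.g. "en-US-JennyNeural"
  pitch: string;  // SSML prosody value, e.g. "medium"
  rate: string;   // SSML prosody value for speed, e.g. "medium"
}

interface TtsRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Escape XML special characters so note text cannot break the SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Step 4: read the selection if there is one, otherwise the whole note.
function pickText(note: string, selection: string): string {
  return selection.trim().length > 0 ? selection : note;
}

// Wrap the text in SSML and describe the HTTP POST that would carry it.
function buildTtsRequest(settings: TtsSettings, text: string): TtsRequest {
  const ssml =
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">` +
    `<voice name="${escapeXml(settings.voice)}">` +
    `<prosody pitch="${escapeXml(settings.pitch)}" rate="${escapeXml(settings.rate)}">` +
    escapeXml(text) +
    `</prosody></voice></speak>`;
  return {
    url: `https://${settings.region}.tts.speech.microsoft.com/cognitiveservices/v1`,
    headers: {
      "Ocp-Apim-Subscription-Key": settings.apiKey,
      "Content-Type": "application/ssml+xml",
      "X-Microsoft-OutputFormat": "audio-24khz-48kbitrate-mono-mp3",
    },
    body: ssml,
  };
}
```

A plugin could then pass `url`, `headers`, and `body` to an HTTP POST and play back the audio bytes it receives. Keeping the request building separate from the network call makes it easy to check the SSML without an Azure account.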


The obvious advantage is that you could listen to your notes (with wonderful natural-sounding voices) as you navigate them, and perhaps also as you edit them. The cost of use is entirely up to the user, since they set up their own Azure account and speech service.


The most obvious difficulty I can see for now is that an average user may struggle to set up their own Azure account. I only managed to set up my service after a lot of googling.


I think something of the sort is possible because there are things like this out there:

I'm not a programmer, but if I understood it correctly, what is passed to the Speech Service is text in SSML (Speech Synthesis Markup Language), so I think a conversion from another markup language like Markdown shouldn't be too troublesome. Just a thought, though.
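
As a rough illustration of that conversion, the TypeScript sketch below strips the most common Markdown markers with regular expressions and escapes what is left, so the result can be placed inside an SSML element. The function name `markdownToSsmlText` is hypothetical, and this is a crude approximation rather than a real Markdown parser: tables, footnotes, HTML, and underscore emphasis are left untouched, and fenced code blocks are simply dropped instead of being read aloud.

```typescript
// Crude Markdown-to-text pass, assuming simple notes; not a full parser.
function markdownToSsmlText(markdown: string): string {
  const plain = markdown
    .replace(/```[\s\S]*?```/g, "")                  // drop fenced code blocks
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")        // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")         // links -> link text
    .replace(/\[\[([^\]|]*\|)?([^\]]*)\]\]/g, "$2")  // wiki links -> alias or target
    .replace(/^#{1,6}[ \t]+/gm, "")                  // heading markers
    .replace(/^[ \t]*[-*+][ \t]+/gm, "")             // bullet markers
    .replace(/(\*\*|\*|`)/g, "");                    // asterisk emphasis, inline code ticks

  // Escape the characters that would otherwise be read as XML markup.
  return plain
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

For example, `markdownToSsmlText("# Title\n\nSome **bold** and [a link](https://example.com) & more.")` returns `"Title\n\nSome bold and a link &amp; more."`. A fuller version could map headings to SSML pauses or emphasis instead of discarding them.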

What do you guys think? Would that be useful to you? Would anyone be willing to execute this idea?

This is just an idea, but I’m sure that the community approach to this could be rewarding.


Yes, this is so needed.

Imagine going for a walk having your notes read aloud to you.

I would pay for this.
