Voxtral Transcribe — dictate and type at the same time into your notes with voice commands (beta testers wanted!)

Hi!

I built Voxtral Transcribe, a speech-to-text plugin for Obsidian that lets you dictate directly into your markdown notes and use your voice to create structure: headings, lists, to-dos, and more. Transcription happens in real time, and you can talk and type at the same time to keep creating content in one flow.

I’ve been using it daily for a while now and it’s been working really well for me. :slightly_smiling_face: The feature set quickly grew beyond the initial simple idea, so I also spent time refactoring the code. It’s more stable and maintainable now, so it felt like the right time to share it with others. The plugin is not yet available in the community plugins list (submission is pending review), but you can already install it manually or via BRAT. I release updates regularly.

What it does

Instead of only typing, you can talk, type, or do both at once, and everything ends up in your note. On desktop, text streams in real time as you speak. On mobile, you tap a send button to transcribe chunks while the recording keeps going.

But what makes it more than “just” transcription is that you can control your document structure by voice:

  • Say “heading two” → inserts `## `

  • Say “bullet point” → inserts `- `

  • Say “new todo” → inserts `- [ ]`

  • Say “new paragraph” → double line break

  • Say “numbered item” → auto-incrementing numbered list

  • Say “delete last paragraph” or “undo” to fix mistakes

  • Say “stop recording” to end the session

  • And many more standard commands as well as the ability to add custom commands
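
The command list above can be pictured as a lookup from spoken phrase to inserted markdown. This is only a minimal sketch, not the plugin's actual code: the names `CommandContext`, `COMMANDS`, and `resolveCommand` are made up for illustration, the trailing space after `- [ ]` is an assumption, and the numbering rule (continue from the last non-empty line) is just one plausible way to auto-increment.

```typescript
// Hypothetical sketch: maps a recognized phrase to the markdown it inserts.
interface CommandContext {
  /** Text of the note so far; used to find the previous list number. */
  noteText: string;
}

type CommandHandler = (ctx: CommandContext) => string;

// Returns the number of the last non-empty line if it is a numbered
// list item (e.g. "2. foo"), otherwise 0.
function lastListNumber(noteText: string): number {
  const lines = noteText.split("\n").filter((line) => line.trim() !== "");
  const last = lines[lines.length - 1] ?? "";
  const match = /^(\d+)\.\s/.exec(last);
  return match ? Number(match[1]) : 0;
}

// Phrases are taken from the post; the handlers are assumptions.
const COMMANDS = new Map<string, CommandHandler>([
  ["heading two", () => "## "],
  ["bullet point", () => "- "],
  ["new todo", () => "- [ ] "], // trailing space assumed
  ["new paragraph", () => "\n\n"],
  ["numbered item", (ctx) => `${lastListNumber(ctx.noteText) + 1}. `],
]);

// Lowercases and strips punctuation the recognizer may have added.
function normalize(phrase: string): string {
  return phrase.toLowerCase().replace(/[.,!?]/g, "").trim();
}

// Returns the markdown to insert for a known command, or null when
// the phrase should be treated as ordinary dictated text.
function resolveCommand(phrase: string, ctx: CommandContext): string | null {
  const handler = COMMANDS.get(normalize(phrase));
  return handler ? handler(ctx) : null;
}
```

For example, `resolveCommand("Numbered item", { noteText: "1. first\n2. second\n" })` yields `"3. "`, while an ordinary sentence yields `null` and would be inserted as plain text.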

After you stop, the plugin can automatically correct your text (spelling, capitalization, and punctuation) without changing your writing style or markdown formatting. The correction step uses Mistral’s text models.

Key features

  • Real-time streaming on desktop — text appears as you speak
  • Batch mode with tap-to-send on desktop + mobile — send audio chunks mid-dictation without stopping
  • Voice commands for headings (H1-H3), bullet points, to-dos, numbered lists, paragraphs, line breaks, delete, and undo
  • 13 languages — Dutch, English, French, German, Spanish, Portuguese, Italian, Russian, Chinese, Hindi, Arabic, Japanese, Korean. Voice commands automatically adapt to the selected language; English always works as fallback
  • Auto-correction — spelling, capitalization, and punctuation are fixed automatically after recording
  • Inline correction instructions — say “for the correction: change X to Y” and the corrector will follow your spoken instructions
  • Self-correction — say “no, not X but Y” and it handles it automatically
  • Microphone selection — choose which mic to use
  • Auto-pause on focus loss — configurable behavior when switching apps on mobile (pause immediately, pause after delay, or keep recording)
  • Voice command help panel — side panel showing all available commands for your active language, including the custom commands you added
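
The "English always works as fallback" behavior from the language bullet above could look roughly like this. It is a hedged sketch only: `PHRASES`, `CommandId`, and `lookupCommand` are hypothetical names, and the Dutch phrases are illustrative guesses, not the plugin's real vocabulary.

```typescript
// Hypothetical command identifiers for this sketch.
type CommandId = "heading2" | "bulletPoint" | "newTodo";

// Per-language phrase tables. The "nl" entries are made-up examples.
const PHRASES = new Map<string, Map<string, CommandId>>([
  [
    "en",
    new Map<string, CommandId>([
      ["heading two", "heading2"],
      ["bullet point", "bulletPoint"],
      ["new todo", "newTodo"],
    ]),
  ],
  [
    "nl",
    new Map<string, CommandId>([
      ["kop twee", "heading2"],
      ["nieuw punt", "bulletPoint"],
      ["nieuwe taak", "newTodo"],
    ]),
  ],
]);

// Looks the phrase up in the active language first, then falls back
// to the English table. Returns null if neither table knows it.
function lookupCommand(phrase: string, language: string): CommandId | null {
  const key = phrase.toLowerCase().trim();
  const active = PHRASES.get(language)?.get(key);
  if (active !== undefined) {
    return active;
  }
  return PHRASES.get("en")?.get(key) ?? null;
}
```

With this shape, a language that has no table yet still responds to the English phrases, and custom commands could be merged into the active language's table.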

How it works

The plugin uses Mistral’s Voxtral models for speech recognition. You’ll need a Mistral API key from console.mistral.ai.

  • Desktop: real-time mode uses a WebSocket connection for live streaming; batch mode is also available
  • Mobile: batch mode with tap-to-send (real-time streaming requires Node.js which isn’t available on mobile)
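
The desktop/mobile split above boils down to a small decision. The sketch below is an assumption about how that choice might be expressed, not the plugin's implementation; `chooseMode` and `ModeInput` are hypothetical names. In a real Obsidian plugin the `isMobile` flag would likely come from `Platform.isMobile` in the Obsidian API.

```typescript
type TranscriptionMode = "realtime" | "batch";

interface ModeInput {
  /** True when running in the mobile app. */
  isMobile: boolean;
  /** User setting: prefer live streaming when it is available. */
  preferRealtime: boolean;
}

// Mobile always uses batch mode because real-time streaming is not
// available there; desktop follows the user's preference.
function chooseMode({ isMobile, preferRealtime }: ModeInput): TranscriptionMode {
  if (isMobile) {
    return "batch";
  }
  return preferRealtime ? "realtime" : "batch";
}
```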

Installation

Via BRAT (recommended for beta testing)

  1. Install BRAT from Community Plugins if you haven’t already
  2. Open BRAT settings → Add Beta Plugin
  3. Enter: `maxonamission/obsidian-voxtral`
  4. Enable the plugin in Settings → Community Plugins
  5. Go to Settings → Voxtral Transcribe and enter your Mistral API key

Manual installation

  1. Download `main.js`, `manifest.json`, and `styles.css` from the latest release in the GitHub repo (maxonamission/obsidian-voxtral)
  2. Create a folder `.obsidian/plugins/voxtral-transcribe/` in your vault
  3. Copy the three files into that folder
  4. Restart Obsidian, enable the plugin, and enter your API key

Feedback welcome!

I’d love to hear how it works for you — especially:

  • How well does it work in your language?
  • Are the voice commands intuitive?
  • How’s the experience on mobile?
  • Any bugs or rough edges you run into?
  • There is a dual-delay option that looks cool: text appears really quickly on desktop, and a second stream corrects mistakes quite well. However, I have doubts about its actual usefulness, and it’s quite complex to get the voice commands to work well with it. I might abandon this specific feature.

Feel free to open issues on GitHub or reply here!


GitHub: maxonamission/obsidian-voxtral

License: GPL-3.0, free and open source

The link to the GitHub repo for this plugin: maxonamission/obsidian-voxtral

Installed via BRAT. Works perfectly! Prior to this I was using Handy, an open-source speech-to-text tool that uses Whisper. However, yours is vastly superior with its ability to insert headings and line and paragraph breaks. I experimented with a 200-word text in both English and French, with two paragraph breaks. Voice commands work seamlessly. Perfect layout, no spelling errors except for two very minor ‘errors’ which occurred when the CPU spiked. Really well done!


Cool! Thanks for taking the time to test, and I’m secretly proud you had such a good experience :slight_smile: Let me know about any quirks or wishes. It’s relatively stable now and I will continue to fine-tune it. I’m using it a lot myself at the moment and exploring a way to test and improve performance in each of the languages, using a set of audio files with voice commands spoken by different voices and with different background noises. Voxtral speaks a lot more languages than I do :slight_smile:

What is harder for me is to know what commands a native speaker would expect to use.