VaultSearch // local-first hybrid search (BM25 + semantic + fuzzy)

Disclaimer

Is this project open source? Yes
Is this project completely free? Yes
Is this project vibe-coded beyond the author’s ability to comprehend how it works? No


Hello everyone!

I’ve been working on a new plugin for a while and I’m finally ready to show it off: VaultSearch. It’s an on-device search engine that combines keyword, semantic, and fuzzy title search for your Obsidian vault.

Why another search plugin?

While Obsidian’s built-in search is fast and great for keyword matching, it often fails when you remember the concept but not the exact words. For example, if you search for “how to stay focused,” it might not find a note titled “Deep Work Routine” — even though that’s exactly what
you were looking for. VaultSearch tries to fix that by looking at your vault from three angles at once.

How it works

Three retrievers run in parallel, merged with Reciprocal Rank Fusion:

  • BM25 keyword search through SQLite FTS5 — fast and exact.
  • Semantic vector search using a quantized multilingual MiniLM model (384 dims, 50+ languages). This is what catches “stay focused” → “Deep Work Routine.”
  • Fuzzy title matching with Jaro–Winkler, so a typo or half-remembered title still lands you on the right note.

You get one ranked list, with each result showing why it matched.
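Reciprocal Rank Fusion itself is compact enough to sketch. This is a generic illustration, not the plugin's actual code; the constant k = 60 and the per-retriever weights are conventional defaults, not necessarily what VaultSearch uses:

```typescript
// Reciprocal Rank Fusion: each retriever contributes weight / (k + rank)
// for every document it returns; contributions are summed across retrievers,
// so documents that several retrievers agree on float to the top.
type Ranked = string[]; // document ids, best first

function rrfFuse(
  lists: Record<string, Ranked>,
  weights: Record<string, number>,
  k = 60,
): [string, number][] {
  const scores = new Map<string, number>();
  for (const [retriever, docs] of Object.entries(lists)) {
    const w = weights[retriever] ?? 1;
    docs.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + w / (k + rank + 1));
    });
  }
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical output of the three retrievers for "how to stay focused":
const fused = rrfFuse(
  {
    bm25: ["focus-tips.md", "deep-work.md"],
    semantic: ["deep-work.md", "attention.md"],
    fuzzy: ["focus-tips.md"],
  },
  { bm25: 1, semantic: 1, fuzzy: 0.5 },
);
// deep-work.md ends up first: two retrievers agree on it, and agreement
// across retrievers is exactly what RRF rewards.
```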

Local-first

  • No cloud, no accounts.
  • The only outbound network call is the one-time embedding model download from Hugging Face (~47 MB, cached forever).
  • Pure JS + WASM — no native modules. Nothing to compile, nothing that breaks when Obsidian updates Electron.

What you get right now

  • A search modal on Cmd/Ctrl+Shift+F with live hybrid results and snippet highlights.
  • A “related notes” sidebar that updates as you move between notes.
  • Incremental indexing for creates, edits, renames, and deletes.
  • A settings tab where you can tune the weight of each retriever.
  • Zero configuration — install, enable, search.

Status

Early development, but stable enough for daily use on my own vault. There are still a few features I’d like to add, such as optional alternative embedding models and integration with Ollama.

What’s powering it

  • SQLite FTS5 — comes with sql.js for free, BM25 built in, battle-tested.
  • @huggingface/transformers v4 + onnxruntime-web — pure WASM inference. onnxruntime-node is native, so it’s out. Inference runs in a dedicated Web Worker, so indexing
    never blocks the editor.
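The third leg, fuzzy title matching, needs no library at all. A textbook Jaro–Winkler implementation fits in a few lines; this is an illustrative version, not the plugin's actual code:

```typescript
// Jaro-Winkler similarity: Jaro similarity plus a bonus for a shared
// prefix (up to 4 chars), so a typo in a title still scores close to 1.
function jaroWinkler(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  // Characters count as matching if they appear within this window.
  const maxDist = Math.max(Math.floor(Math.max(a.length, b.length) / 2) - 1, 0);
  const aMatched = new Array<boolean>(a.length).fill(false);
  const bMatched = new Array<boolean>(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const hi = Math.min(b.length - 1, i + maxDist);
    for (let j = Math.max(0, i - maxDist); j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Transpositions: matched characters that appear out of order, halved.
  let t = 0;
  let j = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[j]) j++;
    if (a[i] !== b[j]) t++;
    j++;
  }
  t /= 2;
  const jaro =
    (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
  // Winkler prefix bonus with the standard 0.1 scaling factor.
  let prefix = 0;
  while (prefix < Math.min(4, a.length, b.length) && a[prefix] === b[prefix]) prefix++;
  return jaro + prefix * 0.1 * (1 - jaro);
}

// A half-remembered or mistyped title still scores high:
jaroWinkler("deep wrok routine", "deep work routine"); // close to 1
```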

A note on the model

Currently using sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (384 dims, ~47 MB quantized, 50+ languages). Based on my research it looks like
the most sensible fit right now — small enough for brute-force search to stay fast, multilingual out of the box, and trained specifically for semantic
similarity.
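“Brute-force” here just means scoring the query embedding against every note’s vector and sorting; at 384 dims that stays fast for vaults of a few thousand notes. A sketch with toy 3-dim vectors standing in for real MiniLM embeddings (all names and data here are illustrative):

```typescript
// Cosine similarity between two embedding vectors. If the model's outputs
// are already L2-normalized, this reduces to a plain dot product.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k: score every stored vector, sort descending, keep k.
function topK(
  query: Float32Array,
  index: Map<string, Float32Array>,
  k: number,
): [string, number][] {
  return Array.from(index, ([id, vec]): [string, number] => [id, cosine(query, vec)])
    .sort((a, b) => b[1] - a[1])
    .slice(0, k);
}

// Toy 3-dim "embeddings" standing in for 384-dim MiniLM vectors:
const vectorIndex = new Map<string, Float32Array>([
  ["deep-work.md", Float32Array.from([0.9, 0.1, 0.0])],
  ["groceries.md", Float32Array.from([0.0, 0.2, 0.9])],
]);
topK(Float32Array.from([1, 0, 0]), vectorIndex, 1); // → [["deep-work.md", …]]
```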

If you know a better model for this use case (local inference, multilingual, small footprint, sentence-level embeddings),
I’d genuinely love to hear it. Suggestions welcome.

What I’m looking for

I’ve submitted it to the community plugin directory; in the meantime, you can install it from the repository into your own vault.

  • Try it on a real vault, especially big ones (5k+ notes).
  • Tell me when the ranking is weird. If the right note is on page two for an obvious query, that’s a bug.
  • Feature ideas that fit the “local-first, zero-config” spirit.
  • Code review and PRs welcome — AGENTS.md walks through the architecture end-to-end.

Break it, complain about it, or — best case — forget it’s there because it just works. Any of those counts as a win.

Thanks!


Sorry for forgetting to add this.
Here’s the repo: GitHub - erayaydn0/obsidian-vault-search (hybrid semantic search plugin for Obsidian: BM25 + on-device vector embeddings + fuzzy titles, fused with Reciprocal Rank Fusion; multilingual, zero-config, 100% local)

At least remove the obvious signs of vibe coding (e.g., CLAUDE.md, AGENTS.md), or most people will not use or trust it because it looks vibe-coded.

I developed the project using a spec-driven approach. I documented every part and every detail first, then implemented those specs with AI. Wouldn’t it be absurd to call that vibe coding?

Do you think the audience will value those details? Many people have an almost instinctive negative reaction to AI right now. I’m not questioning your knowledge or how you guide the AI; I’m just offering a different perspective on how others might see it.

It’s possible others might dismiss the work before noticing the effort you put into the details.

Yes, I agree with your thoughts on this. It’s very unclear to what extent we will accept AI; naturally, people’s perspectives aren’t settled, because all the rules are being turned upside down.
So here I wanted to show myself how things could become better and more systematic with the right use of AI. I want to eliminate some of the uncertainty and see how it can be used to my advantage. It’s exciting to hear people’s opinions while experiencing this.

Thanks for your feedback.