Disclaimer
Is this project open source? Yes
Is this project completely free? Yes
Is this project vibe-coded beyond the author’s ability to comprehend how it works? No
Community Directory: obsidian.md/plugins?id=vault-curate
Source: github.com/notoriouslab/vault-curate
Background
I’ve been writing in Obsidian, primarily in Traditional Chinese, for about three years. Built-in search works fine when the thing I’m looking for is a unique English keyword. It falls short on Chinese proper nouns, on private vocabulary that only appears in my own notes, and on domain-specific terms.
There’s a second problem that crept up over time, which has nothing to do with language: I have hundreds of notes I no longer remember writing. Regular search doesn’t help with this, because I don’t know what to search for.
Vault Curate is my attempt at both.
Search
Semantic search uses bge-small-zh-v1.5 running on WebGPU, falling back to WASM when WebGPU isn’t available. Its results are fused with BM25 keyword ranking and Jaro-Winkler fuzzy matching via Reciprocal Rank Fusion (RRF). The first full index of my ~340-note vault (~5,000 chunks) takes about 1m23s on an M2 Air; after that, indexing is incremental on save.
The model is small (~98MB) but it’s actually trained on Chinese, not a distilled multilingual encoder. That’s the part that matters for CJK content — multilingual models work, but they’re noticeably weaker on personal vocabulary and proper nouns because their pre-training corpus is mostly English.
If you write Japanese, Korean, or mixed languages, bge-ja, bge-ko, and the multilingual bge-m3 are all swappable in settings.
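RRF suits this kind of hybrid search because it only looks at each retriever's rank order, so the very different score scales of cosine similarity, BM25, and Jaro-Winkler never need to be normalized against each other. A minimal TypeScript sketch, assuming each retriever returns note IDs best-first. k = 60 is a common default for RRF, not necessarily the value Vault Curate uses, and the example rankings are made up:

```typescript
// Each retriever returns note (or chunk) IDs, best match first.
type RankedList = string[];

// Reciprocal Rank Fusion: every list adds 1 / (k + rank) to each ID it contains.
// Only ranks are used, so raw scores from different retrievers never mix.
function reciprocalRankFusion(lists: RankedList[], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical rankings for one query from the three retrievers
const semantic = ["a.md", "b.md", "c.md"];
const bm25 = ["b.md", "d.md", "a.md"];
const fuzzy = ["b.md", "a.md"];

const fused = reciprocalRankFusion([semantic, bm25, fuzzy]);
console.log(fused.map(([id]) => id)); // [ 'b.md', 'a.md', 'd.md', 'c.md' ]
```

A note that ranks well in several lists (b.md here) outscores one that tops only a single list (a.md).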
Discover
At index time, each note is tagged Hot or Cold. Hot means recently created, or having links going in and out. Cold means old and isolated, with no links anywhere. The age cutoff is a single setting (hotDays, default 90).
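One reading of the Hot/Cold rule, sketched in TypeScript. The NoteMeta fields are hypothetical, and how the plugin treats an old note with links in only one direction is an assumption: here any link keeps a note Hot, so Cold means older than hotDays with no links at all.

```typescript
// Hypothetical per-note metadata; field names are illustrative only.
interface NoteMeta {
  path: string;
  createdMs: number; // creation time, epoch milliseconds
  inLinks: number;   // backlinks pointing at this note
  outLinks: number;  // links from this note to other notes
}

type Temperature = "hot" | "cold";

const DAY_MS = 24 * 60 * 60 * 1000;

function classify(note: NoteMeta, nowMs: number, hotDays = 90): Temperature {
  const ageDays = (nowMs - note.createdMs) / DAY_MS;
  const recent = ageDays <= hotDays;
  // Assumption: a link in either direction is enough to keep an old note Hot.
  const linked = note.inLinks > 0 || note.outLinks > 0;
  return recent || linked ? "hot" : "cold";
}

const now = Date.UTC(2025, 0, 1);
const notes: NoteMeta[] = [
  { path: "new.md", createdMs: now - 10 * DAY_MS, inLinks: 0, outLinks: 0 },
  { path: "hub.md", createdMs: now - 800 * DAY_MS, inLinks: 5, outLinks: 3 },
  { path: "orphan.md", createdMs: now - 800 * DAY_MS, inLinks: 0, outLinks: 0 },
];
for (const n of notes) console.log(n.path, classify(n, now));
// new.md hot / hub.md hot / orphan.md cold
```

Raising hotDays shrinks the Cold pool: with hotDays = 1000, orphan.md would count as Hot again.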
The Discover sidebar has two modes. In the first, when you have a note open, it shows semantically similar notes from your whole vault — but Cold notes get a ranking boost. The idea is that you don’t need to be reminded of what you wrote yesterday; you need to be reminded of the thing from two years ago that you’ve already forgotten.
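This first mode amounts to ordinary nearest-neighbour ranking with a thumb on the scale for Cold notes. A sketch, assuming one embedding vector per note and an additive bonus; coldBonus is a hypothetical parameter, since the size and form of the plugin's actual boost aren't specified:

```typescript
// Hypothetical index entry: one embedding per note plus its Hot/Cold tag.
interface IndexedNote {
  path: string;
  embedding: number[];
  cold: boolean;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Rank every other note by similarity to the open note; Cold notes get a bonus.
function relatedNotes(
  current: IndexedNote,
  vault: IndexedNote[],
  coldBonus = 0.05, // hypothetical value
  topK = 5,
): { path: string; score: number }[] {
  return vault
    .filter((n) => n.path !== current.path)
    .map((n) => {
      const sim = cosineSimilarity(current.embedding, n.embedding);
      return { path: n.path, score: n.cold ? sim + coldBonus : sim };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const current: IndexedNote = { path: "today.md", embedding: [1, 0], cold: false };
const vaultNotes: IndexedNote[] = [
  current,
  { path: "recent.md", embedding: [1, 0.2], cold: false },
  { path: "old.md", embedding: [1, 0.3], cold: true },
  { path: "unrelated.md", embedding: [0, 1], cold: true },
];
const related = relatedNotes(current, vaultNotes);
console.log(related.map((r) => r.path)); // [ 'old.md', 'recent.md', 'unrelated.md' ]
```

With the bonus, old.md overtakes recent.md even though recent.md is slightly closer in embedding space. An additive bonus is used here because a multiplicative one would behave oddly on negative similarities.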
The second mode is a global view that ranks Cold notes by how similar they are to your current Hot pool. Roughly: “stuff you used to care about that’s still relevant to what you’re working on now.”
This is just cosine similarity plus a re-ranking pass. No LLM involved.
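The global view can be sketched the same way. Here the Hot pool is summarized by the centroid (mean) of its embeddings and every Cold note is scored against it. Both the centroid choice and the VaultNote shape are assumptions; scoring each Cold note by its maximum similarity to any single Hot note would be a reasonable alternative that keeps niche interests from being averaged away.

```typescript
// Hypothetical index entry, as in a per-note embedding index.
interface VaultNote {
  path: string;
  embedding: number[];
  cold: boolean;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Element-wise mean of a non-empty list of equal-length vectors.
function centroid(vectors: number[][]): number[] {
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) for (let i = 0; i < dim; i++) sum[i] += v[i];
  return sum.map((x) => x / vectors.length);
}

// Rank Cold notes by similarity to the centre of the current Hot pool.
function rediscover(vault: VaultNote[], topK = 10): { path: string; score: number }[] {
  const hot = vault.filter((n) => !n.cold).map((n) => n.embedding);
  if (hot.length === 0) return []; // nothing to compare against
  const hotCentroid = centroid(hot);
  return vault
    .filter((n) => n.cold)
    .map((n) => ({ path: n.path, score: cosine(n.embedding, hotCentroid) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const pool: VaultNote[] = [
  { path: "current-project.md", embedding: [1, 0], cold: false },
  { path: "this-week.md", embedding: [0.8, 0.2], cold: false },
  { path: "old-relevant.md", embedding: [1, 0.1], cold: true },
  { path: "old-offtopic.md", embedding: [0, 1], cold: true },
];
const forgotten = rediscover(pool);
console.log(forgotten.map((r) => r.path)); // [ 'old-relevant.md', 'old-offtopic.md' ]
```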
LLM features (separate, off by default)
A toggle, off on first install, exposes two LLM-driven commands: auto-generating a one-line description for each note, and generating topic-grouped Map-of-Content files. Bring your own provider — local Ollama (I’ve been using qwen3 and gemma3) or any OpenAI-compatible endpoint works. Nothing makes a network call unless you turn this on and point it somewhere.
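For the bring-your-own-provider setup, "OpenAI-compatible" means a POST to /chat/completions with a model name and a list of messages; Ollama serves this API under /v1 on port 11434 by default. A minimal sketch of a one-line-description call, where the prompt text, temperature, and function names are illustrative assumptions rather than what the plugin actually sends:

```typescript
// Request shape accepted by OpenAI-compatible /chat/completions endpoints.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
}

// Hypothetical prompt for a one-line note description.
function buildDescriptionRequest(model: string, noteText: string): ChatRequest {
  return {
    model,
    temperature: 0.2,
    messages: [
      { role: "system", content: "Summarize this note in one line, in the note's own language." },
      { role: "user", content: noteText },
    ],
  };
}

// baseUrl example for local Ollama: "http://localhost:11434/v1"
async function describeNote(baseUrl: string, model: string, noteText: string): Promise<string> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildDescriptionRequest(model, noteText)),
  });
  if (!res.ok) throw new Error(`LLM request failed: ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content.trim();
}
```

Usage, with Ollama running locally: `await describeNote("http://localhost:11434/v1", "qwen3", noteText)`. Pointing baseUrl at any other OpenAI-compatible server should work the same way, apart from adding whatever auth header that provider requires.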
Search and Discover never call out. They run entirely on your machine.
Install
Community plugin store → search “Vault Curate”.
I’d appreciate feedback particularly from other CJK writers (does it surface things built-in search misses for you?), anyone running a 10k+ note vault (I haven’t stress-tested at that scale), and anyone willing to play with the hotDays threshold and tell me whether 90 feels right for how you actually use your vault.
Thanks for reading.
