Vault Curate - search + discover plugin built around CJK notes

Disclaimer

Is this project open source? Yes
Is this project completely free? Yes
Is this project vibe-coded beyond the author’s ability to comprehend how it works? No
Community Directory: obsidian.md/plugins?id=vault-curate
Source: github.com/notoriouslab/vault-curate

Background

I’ve been writing in Obsidian, primarily in Traditional Chinese, for about three years. Built-in search works fine when the thing I’m looking for is a unique English keyword. It struggles with Chinese proper nouns, private vocabulary that only appears in my own notes, and domain-specific terms.

There’s a second problem that crept up over time, which has nothing to do with language: I have hundreds of notes I no longer remember writing. Regular search doesn’t help with this, because I don’t know what to search for.

Vault Curate is my attempt at both.

Search

Semantic search uses bge-small-zh-v1.5 on WebGPU, falling back to WASM when WebGPU isn’t available. Its results are fused with BM25 keyword ranking and Jaro-Winkler fuzzy matching via Reciprocal Rank Fusion (RRF). The first full index of my ~340-note vault (~5,000 chunks) takes about 1m23s on an M2 Air; after that, indexing is incremental on save.
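
Reciprocal Rank Fusion only needs each retriever’s ranked list, not comparable scores, which is why it suits mixing embeddings, BM25 and fuzzy matching. A minimal sketch, assuming each retriever has already produced an ordered list of chunk IDs; the function name `reciprocalRankFusion` and the constant k = 60 (a value commonly used with RRF) are assumptions, not the plugin’s actual code or settings.

```typescript
// Hypothetical sketch: fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.
// A chunk's fused score is the sum over lists of 1 / (k + rank), with rank starting at 1.
// Chunks missing from a list simply get no contribution from it.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // 1-based rank within this retriever's list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties broken by ID so the output is deterministic.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || a.id.localeCompare(b.id),
  );
}

// Usage: three retrievers disagree, but "b" ranks high in all of them.
const semantic = ["a", "b", "c"];
const bm25 = ["b", "d", "a"];
const fuzzy = ["b", "c", "e"];
console.log(reciprocalRankFusion([semantic, bm25, fuzzy]).map((r) => r.id)); // fused order: b, a, c, d, e
```

A chunk that shows up near the top of several lists beats one that tops a single list, which is the behaviour you want when no single signal is reliable on its own.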

The model is small (~98MB), but it’s trained on Chinese rather than being a distilled multilingual encoder. That’s what matters for CJK content: multilingual models work, but they’re noticeably weaker on personal vocabulary and proper nouns because their pre-training corpora are mostly English.

If you write in Japanese, Korean, or a mix of languages, you can swap in bge-ja, bge-ko, or the multilingual bge-m3 in settings.

Discover

At index time, each note is tagged Hot or Cold. Hot = recently created or has links going in and out. Cold = old, isolated, no links anywhere. The recency threshold is a single setting (hotDays, default 90 days).
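
The Hot/Cold rule can be sketched as a pure function. This assumes “has links” means at least one inbound or outbound link, and that age is measured from creation time; the `NoteMeta` shape and `classifyNote` name are hypothetical, not the plugin’s schema.

```typescript
// Hypothetical note metadata; the field names are illustrative, not the plugin's schema.
interface NoteMeta {
  path: string;
  createdMs: number; // creation time in ms since the Unix epoch
  inboundLinks: number; // backlinks from other notes
  outboundLinks: number; // links from this note to others
}

type Temperature = "hot" | "cold";

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed rule: Hot if created within the last `hotDays` days OR linked in either
// direction; Cold only when the note is both older than that and has no links at all.
function classifyNote(note: NoteMeta, nowMs: number, hotDays = 90): Temperature {
  const ageDays = (nowMs - note.createdMs) / DAY_MS;
  const isRecent = ageDays <= hotDays;
  const isLinked = note.inboundLinks > 0 || note.outboundLinks > 0;
  return isRecent || isLinked ? "hot" : "cold";
}
```

Raising hotDays shrinks the Cold pool, so fewer old notes qualify for the Discover boost described next.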

The Discover sidebar has two modes. In the first, when you have a note open, it shows semantically similar notes from your whole vault — but Cold notes get a ranking boost. The idea is that you don’t need to be reminded of what you wrote yesterday; you need to be reminded of the thing from two years ago that you’ve already forgotten.
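
The first mode can be sketched as cosine similarity followed by a re-rank that favours Cold notes. This is a minimal sketch under assumptions: the boost is modelled as a small additive bonus (`coldBonus = 0.05`), and the `Candidate` type and `rankSimilar` name are hypothetical; the plugin’s actual boost and its magnitude may differ.

```typescript
// Hypothetical candidate shape: one embedding per note (e.g. averaged over its chunks).
interface Candidate {
  path: string;
  embedding: number[];
  isCold: boolean;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero on empty vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank notes similar to the open one, adding an assumed bonus to Cold notes.
function rankSimilar(
  current: { path: string; embedding: number[] },
  candidates: Candidate[],
  coldBonus = 0.05,
  limit = 10,
): Array<{ path: string; score: number }> {
  return candidates
    .filter((c) => c.path !== current.path) // never suggest the open note itself
    .map((c) => {
      const sim = cosineSimilarity(current.embedding, c.embedding);
      return { path: c.path, score: c.isCold ? sim + coldBonus : sim };
    })
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

An additive bonus only reorders notes that are already close in similarity, so a forgotten note has to be genuinely relevant before it overtakes a recent one.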

The second mode is a global view that ranks Cold notes by how similar they are to your current Hot pool. Roughly: “stuff you used to care about that’s still relevant to what you’re working on now.”

This is just cosine similarity plus a re-ranking pass. No LLM involved.
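
The global mode can be sketched the same way: summarise the Hot pool as one vector, then score every Cold note against it. Using the centroid (mean embedding) of the Hot notes is an assumption; taking each Cold note’s maximum similarity to any single Hot note is an equally plausible design. `IndexedNote` and `rankColdAgainstHot` are hypothetical names.

```typescript
// Hypothetical indexed-note shape: one embedding per note plus its Hot/Cold tag.
interface IndexedNote {
  path: string;
  embedding: number[];
  isCold: boolean;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Element-wise mean of a non-empty list of equal-length vectors.
function centroid(vectors: number[][]): number[] {
  const sum = new Array<number>(vectors[0].length).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < sum.length; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

// Rank Cold notes by similarity to the centre of the current Hot pool.
function rankColdAgainstHot(notes: IndexedNote[], limit = 20): Array<{ path: string; score: number }> {
  const hot = notes.filter((n) => !n.isCold).map((n) => n.embedding);
  if (hot.length === 0) return []; // nothing "current" to compare against
  const hotCenter = centroid(hot);
  return notes
    .filter((n) => n.isCold)
    .map((n) => ({ path: n.path, score: cosine(n.embedding, hotCenter) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

A centroid is cheap to recompute on every save, but it blurs a Hot pool that spans several unrelated topics; a max-over-Hot variant preserves those topics at the cost of one comparison per Hot note.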

LLM features (separate, off by default)

A toggle, off on first install, exposes two LLM-driven commands: auto-generating a one-line description for each note, and generating topic-grouped Map of Content (MOC) files. Bring your own provider: local Ollama (I’ve been using qwen3 and gemma3) or any OpenAI-compatible endpoint. Nothing makes a network call unless you turn this on and point it somewhere.
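
For anyone wiring up a provider, this is roughly what an OpenAI-compatible chat-completions request for a one-line description looks like. A minimal sketch: the endpoint path and body shape follow the OpenAI-compatible format (Ollama serves it under `http://localhost:11434/v1`), while the prompt wording and the `buildDescriptionRequest` / `extractReply` helpers are hypothetical, not the plugin’s code. It only builds and parses the payloads, so you can hand the result to `fetch` or Obsidian’s `requestUrl`.

```typescript
// Hypothetical sketch of an OpenAI-compatible chat-completions request and response parser.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; stream: boolean; messages: ChatMessage[] };
}

function buildDescriptionRequest(
  baseUrl: string, // e.g. "http://localhost:11434/v1" for a local Ollama
  model: string,
  noteText: string,
  apiKey?: string, // hosted endpoints usually need one; local Ollama does not
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`, // tolerate a trailing slash
    headers,
    body: {
      model,
      stream: false, // ask for one complete JSON response
      messages: [
        { role: "system", content: "Describe this note in a single line." }, // illustrative prompt
        { role: "user", content: noteText },
      ],
    },
  };
}

// Pull the reply text out of a chat-completions response, tolerating missing fields.
function extractReply(response: unknown): string | undefined {
  const r = response as { choices?: Array<{ message?: { content?: string } }> };
  return r.choices?.[0]?.message?.content?.trim();
}
```

Keeping request building separate from the network call makes it easy to verify that nothing is sent while the toggle is off.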

Search and Discover never call out. They run entirely on your machine.

Install

Community plugin store → search “Vault Curate”.

I’d appreciate feedback particularly from other CJK writers (does it surface things built-in search misses for you?), anyone running a 10k+ note vault (I haven’t stress-tested at that scale), and anyone willing to play with the hotDays threshold and tell me whether 90 feels right for how you actually use your vault.

Thanks for reading.


v1.0.3 Quick update

  • Hover preview: Cmd/Ctrl + hover on Search/Discover results now opens
    Obsidian’s native Page Preview popup, same as wikilinks.
  • Pin Discover: 📌 button on the Discover sidebar to lock the current
    note. Click through to peek at results without losing your discovery
    context. Auto-unpins on delete or Global mode switch.
  • Fix: the LLM model dropdown was stuck on “Loading…” when using the
    built-in WebGPU embedding together with AI curation. It now populates
    correctly from Ollama.
  • Resolved Dashboard “Direct Filesystem Access” warning by stripping
    the dead-code node:fs path from the bundled sql.js Emscripten
    output. Runtime behaviour unchanged.
  • README now includes an “Audit disclosures” section explaining the
    remaining recommendations (vault enumeration is necessary for the
    index; bundled transformers.js uses new Function for its method
    dispatcher — plugin code itself has zero eval / new Function).
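
The Pin Discover behaviour in the list above amounts to a small state machine. A minimal sketch, assuming the pin is keyed by file path and that Discover asks which note should drive its results; `DiscoverPinState` and its methods are hypothetical names, not the plugin’s code.

```typescript
// Hypothetical sketch of the pin logic: while pinned, Discover keeps showing results
// for the pinned note even as the active file changes; deleting that note or
// switching to Global mode clears the pin.
type DiscoverMode = "note" | "global";

class DiscoverPinState {
  private pinnedPath: string | null = null;
  private mode: DiscoverMode = "note";

  pin(path: string): void {
    if (this.mode === "note") this.pinnedPath = path; // pinning only applies to per-note mode
  }

  unpin(): void {
    this.pinnedPath = null;
  }

  // Which note should drive the results? The pinned note wins over the active file.
  targetPath(activePath: string | null): string | null {
    if (this.mode === "global") return null; // Global mode is not tied to one note
    return this.pinnedPath ?? activePath;
  }

  onFileDeleted(path: string): void {
    if (path === this.pinnedPath) this.unpin();
  }

  setMode(mode: DiscoverMode): void {
    this.mode = mode;
    if (mode === "global") this.unpin();
  }

  get isPinned(): boolean {
    return this.pinnedPath !== null;
  }
}
```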