Local Semantic “Knowledge Indexer” (Opt-In Core Layer) + Optional MCP Service

Use case or problem

As vaults grow, the challenge shifts from storing knowledge to understanding relationships between notes. Obsidian currently excels at representing explicit links, but cannot surface implicit semantic relationships (shared meaning, themes, overlapping concepts) dynamically.

Users and plugin developers are already trying to solve this, but in fragmented ways:

  • Multiple plugins each generate their own embedding indexes → duplicated computation and inconsistent similarity scoring
  • Some users manually paste notes into external AI tools → this leaks private vault content
  • Local LLM / CLI workflows interact directly with the vault’s .md files → but every tool must re-implement its own semantic reasoning

The absence of a shared local semantic foundation has become a practical limitation, not a philosophical advantage.


Proposed solution

Introduce an opt-in, local-only, privacy-preserving semantic indexing service in core:

  • Runs entirely offline

  • Uses a small local embedding model (e.g., ONNX / sentence-transformers)

  • Maintains a vector similarity index (SQLite + sqlite-vss, LanceDB, etc.)

  • Re-embeds only notes that change

  • Exposes a small, stable API (example):

    ai.getRelatedNotes(notePath, { topK: 5 })
    ai.searchContext(query, { topK: 10 })
    ai.getEmbedding(notePath)

This is a foundational layer, not a UI feature and not an AI assistant.
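
To make the proposed surface concrete, here is a sketch of what that API could look like in plugin-facing TypeScript. The `ai` namespace, method names, and option shapes above are only illustrative, so everything here is hypothetical; a toy in-memory cosine-similarity backend stands in for the real SQLite/LanceDB index:

```typescript
// Hypothetical shape of the proposed core API; names mirror the sketch above.
interface RelatedNote {
  path: string;
  score: number; // cosine similarity, higher = more related
}

interface SemanticIndex {
  getRelatedNotes(notePath: string, opts: { topK: number }): RelatedNote[];
  getEmbedding(notePath: string): number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy in-memory backend standing in for the persistent vector store.
class InMemoryIndex implements SemanticIndex {
  constructor(private vectors: Map<string, number[]>) {}

  getEmbedding(notePath: string): number[] {
    const v = this.vectors.get(notePath);
    if (!v) throw new Error(`not indexed: ${notePath}`);
    return v;
  }

  getRelatedNotes(notePath: string, opts: { topK: number }): RelatedNote[] {
    const query = this.getEmbedding(notePath);
    return [...this.vectors.entries()]
      .filter(([path]) => path !== notePath)
      .map(([path, vec]) => ({ path, score: cosine(query, vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, opts.topK);
  }
}

// Usage: with toy 2-d embeddings, "gardening.md" ranks closest to "plants.md".
const index = new InMemoryIndex(new Map([
  ["plants.md", [1, 0]],
  ["gardening.md", [0.9, 0.1]],
  ["taxes.md", [0, 1]],
]));
const related = index.getRelatedNotes("plants.md", { topK: 2 });
```

The point of the interface split is that plugins program against `SemanticIndex` while core owns the model and storage, which is exactly the "core determines relatedness, plugins decide what to do with it" principle below.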

Design principle:

  • Core determines which notes are related.
  • Plugins determine what to do with those relationships.

This enables the community plugin ecosystem to build meaning-based functionality rather than repeatedly reinventing embedding/storage logic.

Examples of plugin workflows this unlocks:

  • Suggested backlinks and missing link discovery
  • Semantic graph overlays (graph shows why notes relate)
  • Concept / topic / theme clustering
  • Knowledge synthesis and research insight tools
  • Journaling reflection & idea evolution summaries
  • Study and spaced-repetition tools guided by conceptual proximity

All while keeping:

  • The vault as plain Markdown files
  • All computation local
  • Privacy fully intact

Optional extension

Expose the semantic index through a local MCP service (obsidian://mcp), so local LLMs and agent frameworks (e.g., Claude Desktop MCP, LM Studio, Ollama, Speckit) can reason over the vault without exporting notes.
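
As a sketch of what such a service could advertise, an MCP `tools/list` response might look like the following. The tool names and parameters are purely illustrative, not part of any existing spec:

```json
{
  "tools": [
    {
      "name": "search_context",
      "description": "Semantic search over the local vault index; returns the top-K matching note excerpts.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "topK": { "type": "integer", "default": 10 }
        },
        "required": ["query"]
      }
    },
    {
      "name": "get_related_notes",
      "description": "Notes semantically closest to a given note path.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "notePath": { "type": "string" },
          "topK": { "type": "integer", "default": 5 }
        },
        "required": ["notePath"]
      }
    }
  ]
}
```

Because MCP clients discover tools at runtime, any local agent that speaks the protocol could query the vault's semantic index without the notes ever leaving the machine.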


Current workaround (optional)

  • Existing AI and smart-linking plugins each maintain their own embedding cache
  • Users manually copy/paste note text into cloud AI tools
  • Developer scripts and CLIs already treat the vault as a dataset, but without any shared semantic index

These workarounds are:

  • Redundant
  • Inconsistent
  • Often less private than having the semantic layer local and shared.

Strong +1 for this proposal.

I’ve built a plugin that does exactly this: obsidian-blue-notes. The development experience made it painfully clear why this needs to be in core.

The reality of building semantic features today:

Before I could work on any of the actual features users wanted, I had to spend weeks building the entire embedding infrastructure from scratch. The same infrastructure dozens of other developers have already built. I had to:

  • Choose and integrate one or more embedding models
  • Build a vector storage system
  • Implement incremental re-indexing logic
  • Handle edge cases around note updates, deletions, and renames
  • Optimize for performance and memory usage
  • Debug inconsistent results across different vault sizes
  • Deal with cross-platform compatibility issues
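
To show how much of that list is generic plumbing, here is a minimal sketch of the incremental re-indexing decision every such plugin re-implements. The `embed` callback and `IndexEntry` shape are hypothetical stand-ins, assuming content hashes decide what gets re-embedded:

```typescript
import { createHash } from "crypto";

type IndexEntry = { hash: string; vector: number[] };

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Re-embed only notes whose content hash changed; drop entries for deleted notes.
function reconcile(
  notes: Map<string, string>,     // path -> current markdown content
  index: Map<string, IndexEntry>, // path -> previously stored entry (mutated)
  embed: (text: string) => number[],
): { reEmbedded: string[]; removed: string[] } {
  const reEmbedded: string[] = [];
  for (const [path, content] of notes) {
    const hash = sha256(content);
    const prev = index.get(path);
    if (!prev || prev.hash !== hash) {
      index.set(path, { hash, vector: embed(content) });
      reEmbedded.push(path);
    }
  }
  const removed = [...index.keys()].filter((p) => !notes.has(p));
  for (const p of removed) index.delete(p);
  return { reEmbedded, removed };
}

// Usage: first pass embeds both notes; second pass re-embeds only the edited one.
const store = new Map<string, IndexEntry>();
const fakeEmbed = (text: string) => [text.length]; // stand-in for a real model
const first = reconcile(
  new Map([["a.md", "alpha"], ["b.md", "beta"]]), store, fakeEmbed);
const second = reconcile(
  new Map([["a.md", "alpha v2"], ["b.md", "beta"]]), store, fakeEmbed);
```

Thirty lines of logic like this, plus the rename/deletion edge cases around it, is exactly what a core layer would let every plugin author delete.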

Every hour spent on this was an hour not spent building the semantic discovery features that users actually care about. Worse, I know other developers are solving these exact same problems right now, independently, with slightly different approaches that won’t work together.

The plugin ecosystem is being held back:

When I want to add a new feature now, I first have to ask: “Does this justify maintaining my own embedding system?” For many good ideas, the answer is no. The infrastructure overhead isn’t worth it. But if there were a shared semantic layer in core, those features would suddenly become viable.

Other developers are making the same calculation, which means we’re collectively losing out on innovation because the foundation is too expensive to build.


I completely understand the need you’re describing.
I’ve built my own semantic-related plugin as well (only now realizing it overlaps with Blue Notes / Similar Notes).
But that’s exactly why I see the problem differently.

Right now we already have dozens of plugins doing almost the same thing — semantic search, related notes, embeddings, smart backlinks, clusters, etc.

If duplication alone justified moving things into core, then by that logic we should also integrate:

  • all graph enhancements

  • all search plugins

  • all smart-linking tools

  • all spaced-repetition tools

  • all journaling intelligence

  • all AI assistants

Because they “duplicate work.”

But that would obviously be wrong.
Duplication in the plugin ecosystem is not a bug — it’s how innovation happens.


**The real issue isn’t that Obsidian lacks a semantic layer.
It’s that the AI ecosystem itself has no shared standard.**

This is why every plugin author currently reinvents:

  • embeddings model choice

  • vector storage

  • re-index logic

  • metadata architecture

  • cache format

  • provider abstraction

But that’s not something Obsidian can or should solve in its core.

This would be like saying:

“Google Cloud and AWS both do very similar things—why don’t they agree on a single unified infrastructure?”

It doesn’t work that way, because the layer is too complex, fast-moving, and opinionated.


Centralizing AI inside Obsidian’s core creates far bigger problems:

1. Where do you store the semantic index?

  • in Obsidian’s cache? (duplicated, unsynced, fragile)

  • inside the vault? (Git pollution, conflicts, mobile issues)

  • external folder? (permissions, portability problems)

There is no solution that fits all users.

2. Which embedding model do you pick?

Whatever the core team chooses will disappoint MANY users:

  • too big

  • too small

  • not multilingual

  • not accurate enough

  • outdated in 6 months

  • incompatible with mobile

There is no single model that satisfies everyone.

3. Cross-platform reality

Running semantic embeddings consistently across:

  • macOS

  • Windows

  • Linux

  • iOS

  • Android

…is nearly impossible without constant breakage.
It would turn Obsidian core into an AI runtime — which is the opposite of “minimal.”

4. Maintenance nightmare

Keeping an AI layer updated, efficient, stable, and universal would become a full-time job.
And AI evolves every 3–6 months.
Shipping that level of heavy infrastructure in the core would be a burden forever.


The right solution is not centralization — it’s standardization outside the core.

Obsidian shouldn’t define an AI layer.

The AI providers and runtimes should.

Just like:

  • browsers didn’t invent TLS

  • editors didn’t invent PDF

  • Notepad doesn’t define filesystem formats

  • VSCode doesn’t define programming languages

Obsidian should not define a universal semantic index or embedding layer.
That should be handled by AI standards such as MCP, ONNX, local LLM engines, or a shared community library.


In short:

✅ The need is real.
✅ The pain is real.
❌ But moving this into the Obsidian core is the wrong solution.
✅ A shared external standard (plugin or library) could solve the problem without compromising Obsidian’s minimalism, portability, or philosophy.

AI standardization should come from the AI ecosystem — **not from Obsidian.**
Obsidian can’t solve a problem that belongs to the entire AI ecosystem. To me, what you’re asking for is like asking Microsoft Word to invent HTTP.
Standards don’t come from apps — apps adopt them once they exist.
AI evolves too fast, varies too much, and depends on too many models and providers for Obsidian to centralize it in its core.
The only realistic path is a shared community plugin or library, not turning Obsidian into an AI platform.