Local Semantic “Knowledge Indexer” (Opt-In Core Layer) + Optional MCP Service

Use case or problem

As vaults grow, the challenge shifts from storing knowledge to understanding relationships between notes. Obsidian currently excels at representing explicit links, but cannot surface implicit semantic relationships (shared meaning, themes, overlapping concepts) dynamically.

Users and plugin developers are already trying to solve this, but in fragmented ways:

  • Multiple plugins each generate their own embedding indexes → duplicated computation and inconsistent similarity scoring
  • Some users manually paste notes into external AI tools → this leaks private vault content
  • Local LLM / CLI workflows interact directly with the vault’s .md files → but every tool must re-implement its own semantic reasoning

The absence of a shared local semantic foundation has become a practical limitation, not a philosophical advantage.


Proposed solution

Introduce an opt-in, local-only, privacy-preserving semantic indexing service in core:

  • Runs entirely offline

  • Uses a small local embedding model (e.g., ONNX / sentence-transformers)

  • Maintains a vector similarity index (SQLite + sqlite-vss, LanceDB, etc.)

  • Re-embeds only notes that change

  • Exposes a small, stable API (example):

    ai.getRelatedNotes(notePath, { topK: 5 })
    ai.searchContext(query, { topK: 10 })
    ai.getEmbedding(notePath)

This is a foundational layer, not a UI feature and not an AI assistant.
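
To make the proposed surface concrete, here is a sketch of what that API could look like in plugin-facing TypeScript. The `ai` namespace, method names, and option shapes above are only illustrative, so everything here is hypothetical; a toy in-memory cosine-similarity backend stands in for the real SQLite/LanceDB index:

```typescript
// Hypothetical shape of the proposed core API; names mirror the sketch above.
interface RelatedNote {
  path: string;
  score: number; // cosine similarity, higher = more related
}

interface SemanticIndex {
  getRelatedNotes(notePath: string, opts: { topK: number }): RelatedNote[];
  getEmbedding(notePath: string): number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy in-memory backend standing in for the persistent vector store.
class InMemoryIndex implements SemanticIndex {
  constructor(private vectors: Map<string, number[]>) {}

  getEmbedding(notePath: string): number[] {
    const v = this.vectors.get(notePath);
    if (!v) throw new Error(`not indexed: ${notePath}`);
    return v;
  }

  getRelatedNotes(notePath: string, opts: { topK: number }): RelatedNote[] {
    const query = this.getEmbedding(notePath);
    return [...this.vectors.entries()]
      .filter(([path]) => path !== notePath)
      .map(([path, vec]) => ({ path, score: cosine(query, vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, opts.topK);
  }
}

// Usage: with toy 2-d embeddings, "gardening.md" ranks closest to "plants.md".
const index = new InMemoryIndex(new Map([
  ["plants.md", [1, 0]],
  ["gardening.md", [0.9, 0.1]],
  ["taxes.md", [0, 1]],
]));
const related = index.getRelatedNotes("plants.md", { topK: 2 });
```

The point of the interface split is that plugins program against `SemanticIndex` while core owns the model and storage, which is exactly the "core determines relatedness, plugins decide what to do with it" principle below.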

Design principle:

  • Core determines which notes are related.
  • Plugins determine what to do with those relationships.

This enables the community plugin ecosystem to build meaning-based functionality rather than repeatedly reinventing embedding/storage logic.

Examples of plugin workflows this unlocks:

  • Suggested backlinks and missing link discovery
  • Semantic graph overlays (graph shows why notes relate)
  • Concept / topic / theme clustering
  • Knowledge synthesis and research insight tools
  • Journaling reflection & idea evolution summaries
  • Study and spaced-repetition tools guided by conceptual proximity

All while keeping:

  • The vault as plain Markdown files
  • All computation local
  • Privacy fully intact

Optional extension

Expose the semantic index through a local MCP service (obsidian://mcp), so local LLMs and agent frameworks (e.g., Claude Desktop MCP, LM Studio, Ollama, Speckit) can reason over the vault without exporting notes.
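
As a sketch of what such a service could advertise, an MCP `tools/list` response might look like the following. The tool names and parameters are purely illustrative, not part of any existing spec:

```json
{
  "tools": [
    {
      "name": "search_context",
      "description": "Semantic search over the local vault index; returns the top-K matching note excerpts.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "topK": { "type": "integer", "default": 10 }
        },
        "required": ["query"]
      }
    },
    {
      "name": "get_related_notes",
      "description": "Notes semantically closest to a given note path.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "notePath": { "type": "string" },
          "topK": { "type": "integer", "default": 5 }
        },
        "required": ["notePath"]
      }
    }
  ]
}
```

Because MCP clients discover tools at runtime, any local agent that speaks the protocol could query the vault's semantic index without the notes ever leaving the machine.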


Current workaround (optional)

  • Existing AI and smart-linking plugins each maintain their own embedding cache
  • Users manually copy/paste note text into cloud AI tools
  • Developer scripts and CLIs already treat the vault as a dataset, but without any shared semantic index

These workarounds are:

  • Redundant
  • Inconsistent
  • Often less private than having the semantic layer local and shared.

Strong +1 for this proposal.

I’ve built a plugin that does exactly this: obsidian-blue-notes. The development experience made it painfully clear why this needs to be in core.

The reality of building semantic features today:

Before I could work on any of the actual features users wanted, I had to spend weeks building the entire embedding infrastructure from scratch. The same infrastructure dozens of other developers have already built. I had to:

  • Choose and integrate one or more embedding models
  • Build a vector storage system
  • Implement incremental re-indexing logic
  • Handle edge cases around note updates, deletions, and renames
  • Optimize for performance and memory usage
  • Debug inconsistent results across different vault sizes
  • Deal with cross-platform compatibility issues
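
To show how much of that list is generic plumbing, here is a minimal sketch of the incremental re-indexing decision every such plugin re-implements. The `embed` callback and `IndexEntry` shape are hypothetical stand-ins, assuming content hashes decide what gets re-embedded:

```typescript
import { createHash } from "crypto";

type IndexEntry = { hash: string; vector: number[] };

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Re-embed only notes whose content hash changed; drop entries for deleted notes.
function reconcile(
  notes: Map<string, string>,     // path -> current markdown content
  index: Map<string, IndexEntry>, // path -> previously stored entry (mutated)
  embed: (text: string) => number[],
): { reEmbedded: string[]; removed: string[] } {
  const reEmbedded: string[] = [];
  for (const [path, content] of notes) {
    const hash = sha256(content);
    const prev = index.get(path);
    if (!prev || prev.hash !== hash) {
      index.set(path, { hash, vector: embed(content) });
      reEmbedded.push(path);
    }
  }
  const removed = [...index.keys()].filter((p) => !notes.has(p));
  for (const p of removed) index.delete(p);
  return { reEmbedded, removed };
}

// Usage: first pass embeds both notes; second pass re-embeds only the edited one.
const store = new Map<string, IndexEntry>();
const fakeEmbed = (text: string) => [text.length]; // stand-in for a real model
const first = reconcile(
  new Map([["a.md", "alpha"], ["b.md", "beta"]]), store, fakeEmbed);
const second = reconcile(
  new Map([["a.md", "alpha v2"], ["b.md", "beta"]]), store, fakeEmbed);
```

Thirty lines of logic like this, plus the rename/deletion edge cases around it, is exactly what a core layer would let every plugin author delete.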

Every hour spent on this was an hour not spent building the semantic discovery features that users actually care about. Worse, I know other developers are solving these exact same problems right now, independently, with slightly different approaches that won’t work together.

The plugin ecosystem is being held back:

When I want to add a new feature now, I first have to ask: “Does this justify maintaining my own embedding system?” For many good ideas, the answer is no. The infrastructure overhead isn’t worth it. But if there were a shared semantic layer in core, those features would suddenly become viable.

Other developers are making the same calculation, which means we’re collectively losing out on innovation because the foundation is too expensive to build.


I completely understand the need you’re describing.
I’ve built my own semantic-related plugin as well (only now realizing it overlaps with Blue Notes / Similar Notes).
But that’s exactly why I see the problem differently.

Right now we already have dozens of plugins doing almost the same thing — semantic search, related notes, embeddings, smart backlinks, clusters, etc.

If duplication alone justified moving things into core, then by that logic we should also integrate:

  • all graph enhancements

  • all search plugins

  • all smart-linking tools

  • all spaced-repetition tools

  • all journaling intelligence

  • all AI assistants

Because they “duplicate work.”

But that would obviously be wrong.
Duplication in the plugin ecosystem is not a bug — it’s how innovation happens.


**The real issue isn’t that Obsidian lacks a semantic layer.
It’s that the AI ecosystem itself has no shared standard.**

This is why every plugin author currently reinvents:

  • embeddings model choice

  • vector storage

  • re-index logic

  • metadata architecture

  • cache format

  • provider abstraction

But that’s not something Obsidian can or should solve in its core.

This would be like saying:

“Google Cloud and AWS both do very similar things—why don’t they agree on a single unified infrastructure?”

It doesn’t work that way, because the layer is too complex, fast-moving, and opinionated.


Centralizing AI inside Obsidian’s core creates far bigger problems:

1. Where do you store the semantic index?

  • in Obsidian’s cache? (duplicated, unsynced, fragile)

  • inside the vault? (Git pollution, conflicts, mobile issues)

  • external folder? (permissions, portability problems)

There is no solution that fits all users.

2. Which embedding model do you pick?

Whatever the core team chooses will disappoint MANY users:

  • too big

  • too small

  • not multilingual

  • not accurate enough

  • outdated in 6 months

  • incompatible with mobile

There is no single model that satisfies everyone.

3. Cross-platform reality

Running semantic embeddings consistently across:

  • macOS

  • Windows

  • Linux

  • iOS

  • Android

…is nearly impossible without constant breakage.
It would turn Obsidian core into an AI runtime — which is the opposite of “minimal.”

4. Maintenance nightmare

Keeping an AI layer updated, efficient, stable, and universal would become a full-time job.
And AI evolves every 3–6 months.
Shipping that level of heavy infrastructure in the core would be a burden forever.


The right solution is not centralization — it’s standardization outside the core.

Obsidian shouldn’t define an AI layer.

The AI providers and runtimes should.

Just like:

  • browsers didn’t invent TLS

  • editors didn’t invent PDF

  • Notepad doesn’t define filesystem formats

  • VSCode doesn’t define programming languages

Obsidian should not define a universal semantic index or embedding layer.
That should be handled by AI standards such as MCP, ONNX, local LLM engines, or a shared community library.


In short:

✅ The need is real.
✅ The pain is real.
❌ But moving this into the Obsidian core is the wrong solution.
✅ A shared external standard (plugin or library) could solve the problem without compromising Obsidian’s minimalism, portability, or philosophy.

AI standardization should come from the AI ecosystem — **not from Obsidian.**
Obsidian can’t solve a problem that belongs to the entire AI ecosystem. To me, what you’re asking for is like asking Microsoft Word to invent HTTP.
Standards don’t come from apps — apps adopt them once they exist.
AI evolves too fast, varies too much, and depends on too many models and providers for Obsidian to centralize it in its core.
The only realistic path is a shared community plugin or library, not turning Obsidian into an AI platform.