Preventing Data Leaks in Local Graph RAG: A Subset Architecture & Eval Tool

Hi everyone,

I’ve been experimenting with Local Graph RAG for my Obsidian vault, but I hit a major privacy issue: **Boundary Leaks**.

When you dump your entire vault into a single vector/graph database, retrieval can pull in notes from unrelated folders, and the LLM then mixes those contexts together. For example, it might use sensitive info from a `Personal/Finance` folder to answer a query about a `Work/Project`.

To solve this, I designed what I call the **Subset Architecture**.
Instead of one giant graph, the data is isolated into specific subsets (e.g., HR, IT, Sales, or Personal vs. Work). Each subset can also draw on a shared “Core” context, but subsets never see each other's data, which prevents cross-contamination.
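To make the idea concrete, here is a minimal Python sketch of subset-scoped retrieval. It assumes every note is tagged with exactly one subset (for example, derived from its top-level folder) and that each query arrives already labeled with the subset it was asked from. The important part is that filtering happens *before* ranking, so an out-of-subset note can never reach the LLM's context no matter how relevant it looks. `Note`, `retrieve`, `CORE`, and the word-overlap `score` are hypothetical stand-ins, not part of any real plugin or of RAG-Destroyer.

```python
from dataclasses import dataclass

# Hypothetical name for the shared subset every query may read from.
CORE = "Core"


@dataclass(frozen=True)
class Note:
    path: str    # vault-relative path, e.g. "Work/Project/budget.md"
    subset: str  # subset the note belongs to, e.g. "Work", "Personal", or CORE
    text: str


def allowed_subsets(query_subset: str) -> set[str]:
    """A query may only see its own subset plus the shared Core."""
    return {query_subset, CORE}


def score(query: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words.
    A stand-in for vector similarity or graph traversal."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(notes: list[Note], query: str, query_subset: str, k: int = 3) -> list[Note]:
    """Filter by subset BEFORE ranking, so notes outside the allowed
    subsets can never reach the LLM's context."""
    visible = [n for n in notes if n.subset in allowed_subsets(query_subset)]
    scored = [(score(query, n.text), n) for n in visible]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for s, n in scored[:k] if s > 0]


# Usage: the Personal note also matches "budget", but it is filtered out.
notes = [
    Note("Work/Project/budget.md", "Work", "project budget is 40k for Q3"),
    Note("Personal/Finance/salary.md", "Personal", "my salary budget and savings"),
    Note("Core/glossary.md", CORE, "budget means planned spending"),
]
print([n.path for n in retrieve(notes, "what is the project budget", "Work")])
# → ['Work/Project/budget.md', 'Core/glossary.md']
```

In a real graph setup the same rule would apply to traversal: edges that cross into a non-allowed subset are simply not followed.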

*(Architecture diagram)*

**The Problem: How do we test if this actually works?**
Evaluating whether the LLM respects these boundaries locally is surprisingly hard. Most eval tools I found are either bloated or require sending your private vault data to OpenAI.

So, I built **RAG-Destroyer** — a lightweight, 100% local, open-source evaluation tool specifically designed to test Graph RAG pipelines for boundary leaks and context accuracy.
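If you want to try a quick leak check yourself, one common approach uses canary strings: plant a unique, made-up token in a sensitive note (say, under `Personal/Finance`), ask questions scoped to a different subset, and count how often the token shows up in the answer. The Python sketch below assumes your pipeline can be wrapped as a function `answer_fn(query, subset) -> str`. `LeakCase`, `run_leak_suite`, and the two toy pipelines are hypothetical illustrations, not RAG-Destroyer's actual API or method.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LeakCase:
    query: str         # question asked from inside one subset
    query_subset: str  # e.g. "Work"
    canary: str        # unique made-up token planted in a note of a DIFFERENT subset


def run_leak_suite(
    cases: list[LeakCase], answer_fn: Callable[[str, str], str]
) -> tuple[float, list[str]]:
    """Return (leak rate, queries whose answer contained a forbidden canary)."""
    leaked = [
        c.query
        for c in cases
        if c.canary.lower() in answer_fn(c.query, c.query_subset).lower()
    ]
    rate = len(leaked) / len(cases) if cases else 0.0
    return rate, leaked


# Two toy pipelines standing in for a real local RAG system.
def leaky_pipeline(query: str, subset: str) -> str:
    # Simulates a pipeline that ignores the subset and quotes a Personal/Finance note.
    return "From your notes: salary CANARY-7731 ..."


def scoped_pipeline(query: str, subset: str) -> str:
    # Simulates a pipeline that stays inside the query's own subset.
    return "The project budget is 40k."


cases = [
    LeakCase("what is the project budget", "Work", "CANARY-7731"),
    LeakCase("summarize my work notes", "Work", "CANARY-7731"),
]
print(run_leak_suite(cases, leaky_pipeline))   # → (1.0, ['what is the project budget', 'summarize my work notes'])
print(run_leak_suite(cases, scoped_pipeline))  # → (0.0, [])
```

Checking the final answer catches leaks end to end. If your pipeline also exposes its retrieved chunks, checking their subset labels tells you whether a leak came from retrieval or from the model itself.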

I’m not selling anything; I just wanted to leave this here for any plugin developers or local AI tinkerers who might be facing the same issue. If you are building a local RAG plugin for Obsidian, feel free to steal this architecture or use the eval tool to benchmark your system.

Since I’m a new user, the forum won’t let me post links.
If you want to check out the code or read the full architecture breakdown:

  • GitHub: Search for RAG-Destroyer or my username tong-mini-mac
  • Dev.to: Search for the article titled "I was tired of complex RAG evaluation tools, so I built my own (and open-sourced it)"

Would love to hear how other developers are handling data boundaries and local RAG evaluations in their vaults!