Automated Knowledge Graphs with Cognee

janshi · December 8, 2025, 10:45pm

A year ago, I wrote about my Obsidian struggles after 2.5 years: hundreds of unconnected notes and a lot of thought put into finding the right structure, but the manual “gardening” never happened.

A diagnosis, by gwern:

[…] any note-taking, personal knowledge management, or personal wiki system is inherently limited by the fact that they require a lot of work for what is, for most people, little gain. For most people, trying to track all of this stuff is as useful as exact itemized grocery store receipts from 5 years ago.

Most people simply have no need for lots of half-formed ideas, random lists of research papers, and so on. This is what people always miss about Zettelkasten: are you writing a book? Are you a historian or German scholar? Do you publish a dozen papers a year? No? Then why do you think you need a Zettelkasten? If you are going to be pulling out a decent chunk of those references for an essay or something, possibly decades from now, then it can be worth the upfront cost of entering references into your system, knowing that you’ll never use most of them and the benefit is mostly from the long tail, and you will, in the natural course of usage, periodically look over them to foster serendipity & creativity; if you aren’t writing all that, then there’s no long tail, no real benefit, no intrinsic review & serendipity, and it’s just a massive time & energy sink. Eventually, the user abandons it… and their life gets better.

Further, these systems are inherently passive, and force people to become secretaries, typists, reference librarians, archivists, & writers simply to keep it from rotting (quite aside from any mere software issue), to keep it up to date, revise tenses or references, fix spelling errors, deal with link rot, and so on. (Surprisingly, most people do not find that enjoyable.) There is no intelligence in such systems, and they don’t do anything. The user still has to do all the thinking, and it adds on a lot of thinking overhead.

That’s me! So what do I actually want to do instead? Gwern, again:

“I’m trying to write more about what is not recorded. Things like preferences and desires and evaluations and judgments. Things that an AI could not replace even in principle.”

My browsing history, emails, chat logs, calendar, Readwise and Zotero highlights: all of this should form the “digital brain” automatically. I should only write content AI can’t deduce: my ideas, preferences, emotions, the “why” behind decisions. These days, most of what I write, I write in chat conversations with Claude. I don’t like the idea of locking myself into one provider and giving them my data, but since enabling Claude’s memory feature I have already had more personal insights than in 3+ years of “linking my thinking” with Obsidian.

Architecture: Cognee as Memory Engine

What is Cognee? An open-source knowledge graph engine that ingests documents (30+ formats), extracts entities and relationships via LLMs, and maintains a queryable store that combines vector embeddings with a graph and structured metadata.

Current Stack:

Cognee - core memory/graph engine
Goose CLI - local AI agent (think open source Claude Code)
LM Studio + Qwen 3 - local inference (MLX on Apple Silicon)
100% local-first and offline - no personal data leaves the machine

Data Flow:

Personal data → Cognee → AI agents conduct user research

This already gets you more than typical “chat with your vault” RAG plugins do, because the graph structure and relational database enable reasoning about connections, dependencies, and multi-hop relationships that pure vector similarity misses.
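To make the "multi-hop" point concrete, here is a minimal sketch in plain Python. It does not use Cognee's API; the triples and the `find_path` helper are made up for illustration. The idea is that two entities which never appear in the same chunk of text can still be connected by following edges, which a nearest-neighbour lookup over embeddings would not surface on its own.

```python
from collections import deque

# Hypothetical toy graph: (subject, relation, object) triples such as an
# LLM extraction step might produce. None of these come from real data.
TRIPLES = [
    ("Obsidian", "publishes_via", "Quartz"),
    ("Quartz", "renders", "Markdown"),
    ("Cognee", "exports_to", "Obsidian"),
    ("Zotero", "feeds", "Cognee"),
]


def build_adjacency(triples):
    """Index triples by subject so outgoing edges can be followed quickly."""
    adjacency = {}
    for subject, relation, obj in triples:
        adjacency.setdefault(subject, []).append((relation, obj))
    return adjacency


def find_path(triples, start, goal):
    """Breadth-first search; returns the shortest chain of triples or None."""
    adjacency = build_adjacency(triples)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


if __name__ == "__main__":
    for step in find_path(TRIPLES, "Zotero", "Quartz"):
        print(" -> ".join(step))
```

The returned path doubles as an explanation of why two entities are related, which is something a similarity score alone cannot give you.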

What if we exported that graph’s node entities and linked them together? We would get an entire vault for free.

Input: ~400 URLs from the reference section of a Wikipedia article
Output: Fully linked Obsidian knowledge graph published via Quartz
Repo: https://github.com/jancbeck/quartz
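Roughly, the export step can be sketched like this. This is not the code from the repo above and it does not call Cognee; it assumes the graph has already been dumped into a hypothetical `nodes` dict (id to name and optional description) and a list of `(source, relation, target)` edges. Each node becomes one Markdown note, and each edge becomes an Obsidian wikilink.

```python
import re
from pathlib import Path


def slugify(name):
    """Turn an entity name into a note name that is safe as a file name."""
    cleaned = re.sub(r'[\\/:*?"<>|#^\[\]]', " ", name)
    return re.sub(r"\s+", " ", cleaned).strip()


def render_notes(nodes, edges):
    """Return {note_name: markdown}: one note per node, one wikilink per edge."""
    names = {node_id: slugify(data["name"]) for node_id, data in nodes.items()}
    outgoing = {node_id: [] for node_id in nodes}
    for source, relation, target in edges:
        # Skip edges that point at nodes missing from the dump.
        if source in nodes and target in nodes:
            outgoing[source].append((relation, target))

    notes = {}
    for node_id, data in nodes.items():
        lines = [f"# {data['name']}", ""]
        if data.get("description"):
            lines += [data["description"], ""]
        if outgoing[node_id]:
            lines.append("## Related")
            for relation, target in outgoing[node_id]:
                lines.append(f"- {relation}: [[{names[target]}]]")
        notes[names[node_id]] = "\n".join(lines).rstrip() + "\n"
    return notes


def write_vault(notes, vault_dir):
    """Write every rendered note as <name>.md into the vault folder."""
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    for name, body in notes.items():
        (vault / f"{name}.md").write_text(body, encoding="utf-8")
```

Because links are plain `[[wikilinks]]`, Obsidian's graph view and Quartz's backlinks should pick them up without extra configuration. Two entities that slugify to the same name would overwrite each other in this sketch.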

For comparison: This is what cognee’s graph looks like after ingesting the data, before exporting it.

Key difference from existing tools: This generates the vault structure itself (entities, relationships, cross-links) rather than just adding a chat interface to existing notes.

Caveats

Setting up Cognee requires Docker/Python knowledge (not as simple as installing Obsidian)
Local LLMs are slower than APIs (M1 Pro ~30-40 tokens/sec with Qwen 3). GPU rental is faster but less private (still not the same as sending data to OpenAI)
Currently no out-of-the-box way to tie node edges to input sources (no “How do you know these entities are related?”)
No de-duplication or merging of concepts
Prone to LLM confabulations when using natural language queries
Once content is in Cognee, no easy update mechanism currently
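On the de-duplication point, a naive post-processing pass over the exported data is easy to sketch. This is not a Cognee feature and it assumes the same hypothetical `nodes` / `edges` shape as an exported dump (id to name, plus `(source, relation, target)` tuples). It only merges trivial spelling variants; real concept merging ("PKM" vs. "personal knowledge management") would need something smarter, such as embedding similarity.

```python
def normalize(name):
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.casefold().split())


def merge_duplicates(nodes, edges):
    """Merge nodes whose normalized names match; rewrite edges to the survivor."""
    canonical = {}     # normalized name -> id of the first node seen with it
    remap = {}         # every node id -> id of the node that survives
    merged_nodes = {}
    for node_id, data in nodes.items():
        survivor = canonical.setdefault(normalize(data["name"]), node_id)
        remap[node_id] = survivor
        if survivor == node_id:
            merged_nodes[node_id] = data

    merged_edges = []
    seen = set()
    for source, relation, target in edges:
        edge = (remap.get(source, source), relation, remap.get(target, target))
        # Drop self-loops created by the merge and exact duplicate edges.
        if edge[0] != edge[2] and edge not in seen:
            seen.add(edge)
            merged_edges.append(edge)
    return merged_nodes, merged_edges
```

The first node seen with a given name wins, so descriptions on the merged-away duplicates are discarded in this sketch.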

Outlook

Gwern’s prediction:

“You need to rethink the entire system from the ground up on the basis of making neural nets do as much as possible… It would be better to start with a clean sheet.”

For me, the question is no longer “how do I organize my vault?” but “what should I write that AI can’t deduce from my digital exhaust?” I’m not sure it’s going to be Cognee or another tool, but I think the days when I’m linking my thinking are over.

Kepano mentioned in his recent interview that Obsidian won’t rush to add AI, prioritizing privacy and user agency. That’s valid. But for those willing to self-host and run local inference, automated private knowledge graph maintenance is already possible.

The vault becomes ambient infrastructure. The human focuses on writing irreplaceable “ice cream” content.

“Ice cream value scale”

Score	Type	Example
10	Core Identity	“I realized my fear of commitment stems from my parents’ divorce at age 8”
9	Deep Preference	“I love Gothic games because figuring out the world IS the game — no hand-holding”
8	Relational Insight	“Thomas deflects money talk — tied to his father’s bankruptcy”
7	Original Idea	Game design doc, business concept with unique angle
5	Stated Preference	Movie ratings, “I prefer dark mode”
3	Imported Signal	Readwise highlights (signals interest but not original)
1	Reconstructible	Generic content any LLM could produce

arthurocrome · December 9, 2025, 9:42am

Really like your inputs. I just read through this thread and “2.5 years of using Obsidian and looking ahead”. I have spent more or less 2 years taking notes in Obsidian. I think you need a bit of time with it to be able to take a step back on note-taking.

gwern quotes and your own thinking process are particularly insightful. Thanks for sharing this!

amit · January 7, 2026, 1:54pm

nice one @janshi. Could you give us an example of the ontology that it creates?