Building a 9,000+ PDF database to integrate with my LLM. Obsidian crashes. Any solutions?

What I’m trying to do

I'm unable to get Obsidian to index everything, let alone try Smart Connections. I've disabled Graph view, sync, and file recovery. My PC: RTX 3090, 128 GB DDR4 RAM, Ryzen 5900XT, 1 TB HDD.

Things I have tried

Importing in batches of 1,000, and disabling core plugins one by one. Still failed.

When I did something similar, I always symlinked the top folder of all the PDFs into the vault and never brought the physical PDFs into the vault, since syncing the vault would have been impossible afterwards.
I remember little freezes, but even on a smaller laptop 30k+ files got indexed.

I reckon there is some issue with one particular path (folder) or file.

But I would NOT recommend going down this route at all.

Add all your PDFs to Zotero; it will index your files, and you can run LLM models on the extracted txt files, which you can also turn into md files with Python or whatever. You can use other software like Cursor AI for this (import a workspace with your indexed files: NOT the PDFs, but the raw txt/md).
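A minimal sketch of that txt→md step, assuming the extracted text already sits in a folder somewhere (the folder names, the `txt_to_md` helper, and the frontmatter layout are all placeholders — adapt them to your vault):

```python
from pathlib import Path

def txt_to_md(src_dir: str, dst_dir: str) -> int:
    """Convert every .txt in src_dir into a .md note in dst_dir.
    Returns the number of files written."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = 0
    for txt in src.glob("*.txt"):
        body = txt.read_text(encoding="utf-8", errors="ignore")
        # A minimal frontmatter block so Obsidian (and any LLM tooling)
        # can trace each note back to its source file.
        (dst / f"{txt.stem}.md").write_text(
            f"---\nsource: {txt.name}\n---\n\n{body}", encoding="utf-8"
        )
        written += 1
    return written

# Example: txt_to_md("zotero_fulltext", "vault_notes")
```

The point is that the heavy extraction already happened in Zotero; this loop just renames and wraps plain text, so it handles thousands of files without breaking a sweat.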

You do NOT want to use Obsidian for all things.

Alternatively, use Cursor AI to regex-search your stuff for common topics, and only pick those index files or PDFs to add to Obsidian, in a dedicated project vault.
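The regex-filter idea can be sketched in plain Python too; the topic patterns and the `min_hits` threshold below are made up for illustration — swap in your own research keywords:

```python
import re
from pathlib import Path

# Hypothetical topic patterns -- replace with your own keywords.
TOPICS = re.compile(r"\b(transformer|attention|fine[- ]tun\w*)\b", re.IGNORECASE)

def matching_files(index_dir: str, min_hits: int = 3) -> list[Path]:
    """Return the txt/md index files that mention a topic
    at least min_hits times -- candidates for the project vault."""
    hits = []
    for f in Path(index_dir).rglob("*"):
        if f.suffix in {".txt", ".md"}:
            text = f.read_text(encoding="utf-8", errors="ignore")
            if len(TOPICS.findall(text)) >= min_hits:
                hits.append(f)
    return hits
```

Running this over the index files (not the PDFs) keeps it fast, and you only ever copy the survivors into Obsidian.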

I am currently trying out Neural Composer on a smaller batch of books (md index files, since I have them, though I could try the original PDFs) to see what gives.


Hi @mystvearn!

With that RTX 3090 and 128GB RAM, your machine is an absolute beast for local AI. You should be flying, not crashing!

The crash likely happens because standard Obsidian plugins run the indexing logic (reading files + calculating embeddings) inside the main Obsidian process (Electron/JavaScript) or try to store the vector index as thousands of small files inside the vault. On a 1TB HDD (mechanical drive), the I/O latency of reading/writing thousands of small cache files + the memory overhead of indexing 9000 PDFs will choke Obsidian to death.

The Solution: Decoupled Architecture
You need a system that runs the heavy lifting (indexing/embedding) in a separate process, not inside Obsidian’s UI thread.

As @Sunnaq445 mentioned, Neural Composer (which uses LightRAG) handles this differently:

  1. It spawns a separate Python server.

  2. The heavy indexing work happens in that Python process (utilizing your RTX 3090 via CUDA).

  3. Obsidian stays lightweight because it just queries the database; it doesn’t hold the index in RAM.

Recommendation for your setup:
Even if you don’t use my plugin, try to move the embedding workload outside of Obsidian (using external scripts or servers). But if you want to keep the UX inside Obsidian, Neural Composer was built exactly for this “heavy local hardware” scenario.
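To illustrate the decoupling idea with nothing but the standard library: the sketch below pushes the embedding work into worker processes, with `fake_embed` standing in for a real GPU model. This is not Neural Composer's actual code (that spawns a dedicated Python server), just the pattern — the caller never does the heavy lifting itself:

```python
from concurrent.futures import ProcessPoolExecutor
import hashlib

def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model (e.g. one running on the 3090).
    Derives a deterministic 4-dim vector from a hash of the text."""
    h = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in h[:4]]

def index_corpus(texts: list[str]) -> list[list[float]]:
    # The embedding work runs in worker processes, so the caller
    # (think: the Obsidian UI thread) never blocks on it.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(fake_embed, texts))
```

Same principle whether the workers are local processes or a standalone server: Obsidian should only ever query the finished index, never compute it.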

Tip regarding the HDD: since you are on a mechanical drive, set MAX_PARALLEL_INSERT=1 in the plugin settings (review the .env) to avoid killing your disk seek time during ingestion.

Hope you can put that 3090 to good use!