Too slow in vaults with many "attachments"

Both initial load (“Loading cache…”) and anything that involves searching (sidebar searches, [[ completion, query code fences, …) are unusably slow. This is on Windows with SSD storage, in a Dropbox folder with 2700 md files (nearly all, I think, under 1000 lines, most only a dozen) and 570,000 files in total.

Most “large vault” discussions are about that 2700 number, and I think Obsidian is probably pretty well optimized there. This is great, since that’s probably the much harder problem to solve. However, it still does quite poorly as that second number grows. This should be easy to fix: just be more aggressive about ignoring irrelevant files! You don’t need file watchers on non-markdown files, and even ten thousand md files of the typical dozen-line size should fit in memory for near-instant search (with, I expect, no changes to any of the existing architecture).

Just now, while writing this, I exited and re-started the app, and it took maybe 20 minutes to start. Thereafter, doing the query

tag:#Health

takes 16 minutes to start returning results, while running this script on Ubuntu as `cd ~/Dropbox; time ffg md "#Health"` takes about 1.25 seconds (and there are 48 results).
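The `ffg` script itself isn’t shown in the post; as a rough stand-in, here is a hypothetical reconstruction built from the find/xargs/grep pipeline described later in the thread (the function name and its exact behavior are assumptions, not the author’s actual script):

```shell
# Hypothetical "find-file-grep": list files with a given extension that
# contain a fixed string.
ffg() {
  ext="$1"
  pattern="$2"
  # -print0 / -0 keep filenames with spaces intact; -l lists matching
  # files; -F treats the pattern as a literal string, not a regex.
  find . -type f -name "*.${ext}" -print0 \
    | xargs -0 grep -l -F -- "${pattern}"
}

# Usage from the post: cd ~/Dropbox; time ffg md "#Health"
```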

Please fix this. It’s a bug, and not even one that should require any major algorithmic thinking (despite the good examples elsewhere on state-of-the-art methods for fast text search). Just allow us to be selective in which files get scanned/watched/searched.

Of course, there are desktop search solutions that do handle that second number (like Windows’s start menu search), and look through all file names or even full-text search of all plain-text, PDF, Word, etc. files. I want to be clear that this is not what I’m proposing (and I think would only be something for much later in the roadmap, if ever).

Handling any kind of local file structure we might throw at it should be Obsidian’s greatest selling point. You should certainly try to compete with Roam etc. on fancy features, but you should first focus on getting your core competency down pat.

(I realize this is similar to this previous post of mine, but that one is marked as an improvement request, and I think now that this should be considered a bug.)

(Results above are with v0.13.19, installer v0.12.12.)

The problem is that you propose ignoring the attachments as a solution because you don’t really care about them. A good percentage of our users do care about them, and we need to watch them.

I think it’s better we keep this an improvement request.

Also, I do have “options > files & links > detect all file extensions” turned off.

In the other post, you said most of your non-markdown files are images.

Many are also PDFs. I’m not sure where the search slowdown happens, but when I search in a small test vault, I don’t get hits from PDF files.

We do some things for PDFs. Even with “detect all file extensions” disabled, Obsidian still handles md, images, and PDFs.

I have 21,365 PDF files, 160,190 PNG, 33,252 JPG, and 2,830 GIF.

I don’t actually think an “ignore attachments” option should even need to be checked. Filtering even a million filenames in-memory on [[ shouldn’t be this slow, even with a brute-force method: on every subsequent keypress, check whether the query so far is a substring of each of those million strings. (Although you’d need to do something more intelligent than showing all the results in the pop-up!)
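As a rough sanity check of that claim (a sketch with synthetic filenames, not a measurement of Obsidian itself), a brute-force substring scan over a million strings with plain grep completes in a fraction of a second on ordinary hardware:

```shell
# Generate a million synthetic attachment filenames.
seq 1 1000000 \
  | sed -e 's/^/attachment-/' -e 's/$/.png/' > /tmp/filenames.txt

# Simulate one keystroke's worth of work: a literal substring scan
# over all million names, counting the matches.
time grep -c -F 'attachment-4242' /tmp/filenames.txt
```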

And are we doing full-text search on PDFs? Certainly we’re not running OCR on images and searching that. So only filenames (or maybe paths) should figure in.

I don’t think ignoring attachments is the right solution for this, no. If that’s what makes you label it as graveyard (I assume equivalent to wontfix), then please reconsider! Performance bugs are still bugs, and I and others have complained about this for a while now.

I think there is a disconnect between what you think Obsidian does and what it actually does.
The comparison with simple filesystem operations isn’t fair because Obsidian is much more than that; it’s an integrated workspace manager. The same goes for the comparison with Windows search.

We are aware that there are areas with room for improvement, and we’ll work on those. However, we don’t expect that breaking past 100k+ files is gonna be easy (even with all optimizations).

Certainly, since it’s closed-source, I can only speculate on how things are done internally, based on how I would go about it myself. But I do believe that such scaling behavior comparisons with long-existing tools are fair, and so this should be considered an actual bug.

(Also, FWIW, I restarted now in safe mode, with no improvements to speed.)

Also also, I’m not comparing to “simple filesystem operations”, but to a pipeline of Linux tools (find, xargs, and grep, on an ext4 filesystem on a regular 7200 RPM HDD) that does some of the same search tasks in an ugly way. It’s a fair comparison. (And this should be a starting point for comparison, not a final one! A 2-second delay per keypress when searching would be awkward, but workable.)

I think I am experiencing the same problem. I tried disabling all plugins, but it did not fix it. Opening a note via a link in another note just takes too long, a minute or more. This has rendered Obsidian unusable for me. The search functionality is quick, though.

What about allowing users to ignore certain files, like with a .gitignore?
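To illustrate what a `.gitignore`-style exclusion could buy the indexer, here is a hedged sketch (the folder names are made up, and Obsidian has no such setting in the version discussed): pruning an excluded directory skips it before it is ever descended into, so the cost no longer grows with the number of attachments inside it.

```shell
# Hypothetical demo: a tiny vault with an excluded attachments folder.
vault=$(mktemp -d)
mkdir -p "$vault/notes" "$vault/attachments"
echo '# note' > "$vault/notes/a.md"
echo 'binary' > "$vault/attachments/pic.png"

# -prune stops find from descending into the excluded folder at all;
# only the markdown files the indexer would scan are printed.
find "$vault" -path "$vault/attachments" -prune \
  -o -type f -name '*.md' -print
```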