Maximum Number of Notes in Vault

Dor · November 12, 2020, 12:30am

Having an integrated index for all text brings many questions. The first is why suffer the overhead and restrictions of maintaining separate files, when you are already storing all of their content in the program?
Then comes the question of the overhead of maintaining and accessing the index. How would you apportion that between different functions and times?
And why do it at all purely for the use of a very few users (none yet as far as I know) who have so many notes that they’re running into performance problems? It would make all users pay the overhead.

WahWah · November 12, 2020, 12:33am

If the already-existing indexes on tags and other meta elements exist (again, I don’t know this for a fact, I was told on discord by an admin), wouldn’t there have to be just a single additional index – a ‘note body’ index – added? I feel you’re wishing to shade the difficulty toward the “totally more effort than it’s worth” direction; does that really have to be true?

Dor · November 12, 2020, 12:40am

They all had databases and algorithms. Google’s was faster from the beginning, which is why it grew so fast. They then shifted to a better database design which cemented it.
But actually much of the success wasn’t about the finding, it was about presenting and sequencing what was found. That’s why Google now doesn’t work as well as it used to. They try to prioritise advertisers, they have to combat SEO. There is more stuff to find, and they do find it pretty fast, but finding what you want from their results is harder because it is more likely to be pages deep.

WahWah · November 12, 2020, 12:41am

Not at all; in fact, quite the opposite. I’m here because databases box you in, require you to mold your brain and workflow to a rigid structure, and generally make you feel like a data entry clerk, not a creator, a conjurer of possibilities, and an architect of new relationships. I don’t want a database. I want something quick and scalable, that’s free-text and flexible. I don’t want ‘mandatory fields’, in the classic sense, but I do want a back-end invisible technology that will allow me rapid access to the most un-rememberable and unreachable ends of my note landscape. So I can quickly connect something from 7 1/2 years ago with something I just wrote, just by remembering a bare smidgen of detail, barely, and nothing else to go on. That requires full text search, because I don’t know where that smidgen was.

Dor · November 12, 2020, 12:48am

If you follow the Discord, you will also have noticed how often Licat talks about optimising by partial loading of the data.
As I said, at some point I would expect a plugin with an index (actually index implies database, so a plugin with a database); maybe just for search, maybe for other functions. When that happens, you’ll be able to measure the impact on general performance.

Dor · November 12, 2020, 12:55am

You have full text search now. An index is a database - you talked about just adding an extra column to the table. It’s grep versus an index. There’s no free lunch: whichever approach you take there are some gains and some losses.
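
To make the trade-off concrete, here is a minimal Python sketch of the two approaches. It is only an illustration under stated assumptions: the `notes` dictionary, the file names, and the function names are all hypothetical, and nothing here reflects how Obsidian actually implements search. The scan pays its cost on every query; the index pays it up front (and again whenever a note changes) in exchange for cheap lookups.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "vault": note name -> note body (made-up data).
notes = {
    "2013-05-garden.md": "Planted tomatoes next to the basil.",
    "2020-11-index.md": "Thinking about full text search and indexes.",
    "2020-11-recipe.md": "Basil pesto needs fresh basil and pine nuts.",
}

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def scan_search(notes, word):
    """grep-style: re-read every note body on every query."""
    word = word.lower()
    return sorted(name for name, body in notes.items() if word in tokenize(body))

def build_index(notes):
    """Inverted index: word -> set of note names, built once up front."""
    index = defaultdict(set)
    for name, body in notes.items():
        for token in tokenize(body):
            index[token].add(name)
    return index

def index_search(index, word):
    """Index-style: a single dictionary lookup per query."""
    return sorted(index.get(word.lower(), set()))

index = build_index(notes)
print(scan_search(notes, "basil"))
print(index_search(index, "basil"))
```

Both calls return the same notes; the difference is where the work happens, and that the index has to be stored and kept in sync with the files.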

WahWah · November 12, 2020, 2:22pm

Yes, but it’s not indexed. That’s the whole point.

Metta · November 21, 2020, 4:16pm

Thanks for the heads up on this, @Dor. I am concerned about scalability and would love to know more about the vault organization strategies to which you are referring.

Can you recommend a good starting point for learning more?

Thanks in advance.

ryanjamurphy · November 21, 2020, 4:22pm

Probably see:

Metta · November 22, 2020, 1:53am

@ryanjamurphy ~ Many thanks, Ryan! Very much appreciate this helpful tip for learning more . . . .

Thank you, @Dor, for taking the time to provide this great overview.

Calhistorian · December 8, 2020, 3:29pm

Just to add to the discussion. I have over 9000 densely linked notes with little to no performance issues (late-2018 MBP, 8GB RAM, 256SSD, 2.3 GHz Quad-Core Intel Core i5). Only issues are initial load of vault graph view. Other than that no issues.

DEVONthink could certainly handle WAY more than this - hundreds of thousands, based just on files (though it is databased in ways).

WahWah · December 8, 2020, 8:37pm

Thanks for this update. I would like Obsidian to be able to handle the load that you mention DEVONthink handling. 9000 is great, certainly impressive, but in many other industries, still a very low number. Would be absolutely lovely to see attention and resources temporarily diverted from the nifty-cool-shiny-handy-smart feature updates, to just a little, just a little, time toward adding robust, scalable, industrial-strength indexing (and other) under-the-hood technology to let the thing shine at much higher numbers of notes. If we can see future-proofing at 1mm to 5mm notes, then no one will have to even bat an eyelash at 200k… You want loads more buy-in from corporate customers for enterprise licenses and tons of sync subscriptions? Make these changes, I suggest.

Calhistorian · December 8, 2020, 9:28pm

You’re definitely not wrong there. I would imagine that would be developed in tandem with native applications, because Electron would be too intense on resources. Different project, but look at TheBrain: 1mm+ notes are feasible.

argentum · December 8, 2020, 10:28pm

My two cents on this: The devs consistently spend time optimizing for “large” vaults, which is why the app almost doesn’t feel like an electron app. Given that it’s in beta, I would argue that optimizing now for millions of notes falls in premature optimization.

There are other features that are important for the sustainability of the app and other features that would make the app even better for the vault sizes we’re seeing today.

I’m not saying that this is not important, but it might be good to adjust your expectations on this. I’m sure it will come in time! If you do have some performance requests with large vaults, I’m sure the devs will appreciate more data points!

WahWah · December 10, 2020, 7:40pm

Well, not what I wanted to hear, but it is what it is, and so I thank you for your input.

I would venture to think that you’re not seeing vault sizes that big, because the product is so new, and no one has built up a 5mm note vault. But – not because they are not needed or wanted, that’s not necessarily the case, I believe.

In the (hopefully short) interim between now, and screaming hot performance at vault-wide full text searching at 2mm notes, is a workaround to just split the vault? Ten vaults with 25k notes each? But then these cannot be searched all at once, right?

ksandvik · December 12, 2020, 6:25pm

ripgrep is a good example of how a highly optimised tool (multi-threading) could search and find results from hundreds of thousands of files within seconds. Visual Studio Code is also using ripgrep as their default search tool. The tool is isolated in the VSC code so there’s the option for the Obsidian team to integrate it in.

Most other indexing tools and systems have to deal with binary data; as Obsidian only uses text files, the search solution does not require indexing.
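
As a rough illustration of index-free searching over plain text files, here is a small Python sketch that scans every `.md` file under one or more folders using a thread pool. To be clear about the assumptions: this is not ripgrep and will not come close to its speed, the function names `file_contains` and `search_vaults` are made up for this example, and it runs outside Obsidian against the files on disk.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def file_contains(path, needle):
    """Return path if the file's text contains needle (case-insensitive), else None."""
    try:
        text = path.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return None  # unreadable file: treat as no match
    return path if needle.lower() in text.lower() else None

def search_vaults(vault_dirs, needle, workers=8):
    """Scan every .md file under the given folders; no index is built or stored."""
    files = [p for d in vault_dirs for p in Path(d).rglob("*.md")]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(lambda p: file_contains(p, needle), files)
        return sorted(h for h in hits if h is not None)
```

Because it takes a list of folders, the same call could cover several vaults at once, for example `search_vaults(["vault_a", "vault_b"], "smidgen")` (hypothetical folder names). Every query still re-reads every file, which is the cost an index would avoid.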

argentum · December 13, 2020, 2:08pm

Well I’ve heard about vaults of varying sizes, anywhere between 200 to 40k notes. I believe there is one bible vault with around 80k notes too. You might want to ask some of those users directly in the discord server about search or how they manage their large vaults.

Jasen · September 5, 2021, 4:01pm

Question of atomicity

kallas · September 7, 2021, 6:37pm

My first thought is that you may want to try out DevonThink, which is a knowledge management software that may serve this purpose well. Basically I use DevonThink to collect everything, and use Obsidian to organize my own thoughts/notes.

The best part is, I can add the Obsidian vault to DevonThink, so it can interact with “raw inputs” without any headache.

santi · September 20, 2021, 2:47pm

That definitely sounds like an awesome workflow, but it seems like DevonThink is Mac only. Do you perhaps know of any alternatives for us Windows / Linux users?