File truncated on Obsidian crash (out-of-memory from reindexing triggered by typing)

Yes, I seem to remember that, now that you mention it. :slight_smile:
I was just thinking some kink (interference) happens in spite of your best efforts.
But I do understand you are testing with vanilla Obsidian as well.

Having said that, all my problems seemed to resolve themselves after disabling the offending plugins (Remember Cursor Position and Style Settings) and moving my hefty files (35–55 KB, so that’s as hefty as it gets) consisting of markdown tables (Obsidian doesn’t like files with large tables in them) out of my main vault into another, leaner vault. Obsidian 1.7.4 seems to further smooth the experience.

@Yurcee - yes, I stopped using Style Settings a while ago. I’ll check tables too, but I don’t use them heavily. Indexing, though, seems to be a big problem, as performance appears to degrade with file size at roughly a 1000:1 ratio. I wonder whether things wouldn’t get snappy overall otherwise. Unfortunately, disabling Search among the core plugins doesn’t disable the indexing.

@WhiteNoise

It can be reproduced with a 5–6 MB file generated from the sandbox vault’s “Start here.md” document by multiplying its content.

Let me know if you want me to generate one. What’s more interesting, IMO, is that the memory consumption seems to be superlinear:

At 1.7 MB it uses 639 MB.
At 3.4 MB it uses 2,400 MB - i.e. 2x file growth uses ~4x more RAM, suggesting roughly quadratic behavior.

The only times I’ve seen the “Indexing is taking long” notice were on files that had long tables in them, or, back in the day when I didn’t have tables, when Obsidian must have been struggling to sort itself out alongside some other process.

The 35–55 KB files with tables are the trimmed result of 2.5–4.5 MB files with tables where I had enormous numbers of links, so I had a script written to remove all the links; but even with the truncated, smaller files, Obsidian still showed me the notice once in a while.

It goes without saying that the trimming was necessitated by performance issues.

One more thing I noticed: Dataview-rendered tables are not as hard on Obsidian as markdown tables. So people are better off querying their table data with Dataview (even with dv.io.load) than actually having the tables rendered (although I don’t want all my tables rendered out again, only the relevant portions).
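To sketch what I mean by querying instead of rendering: load the raw note text and pull out only the rows you need. The parsing helper below is hypothetical, and the dv.io.load / dv.table usage in the comment assumes the Dataview JS API:

```javascript
// Sketch: query markdown-table data as text instead of rendering
// the whole table. parseTableRows is a hypothetical helper that
// extracts cells from pipe-delimited table lines.
function parseTableRows(markdown) {
  return markdown
    .split('\n')
    .filter(line => line.trim().startsWith('|'))
    // drop the |---|---| separator row
    .filter(line => !/^\|[\s:|-]+\|$/.test(line.trim()))
    .map(line => line.split('|').slice(1, -1).map(cell => cell.trim()));
}

// Usage inside a dataviewjs block (assumed API):
//   const text = await dv.io.load("Big tables.md");
//   const rows = parseTableRows(text).filter(r => r[0] === "2024");
//   dv.table(["Year", "Value"], rows);
```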

We did some investigation. There is no problem with regards to processing speed (at least with respect to Start Here.md).
However, we noticed a large memory consumption within the parser used for indexing files (remark).
The problem is not remark itself, but an optimization done by v8 to speed up the processing of substrings. This optimization has the side effect of increasing memory consumption (link).

If it does not introduce other problems, in v1.8, we will attempt to work around this v8 optimization to reduce the memory utilization.
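For readers unfamiliar with this v8 behavior: substring()/slice() can return a "sliced string" that is just a view into the parent, which keeps the entire parent string alive. The commonly cited workaround is to force a flat copy of the slice. The sketch below is illustrative of that general technique, not Obsidian’s actual fix:

```javascript
// Sketch of the v8 "sliced string" issue and a common workaround.
// A slice of a huge string can pin the whole parent in memory;
// forcing a new flat string breaks that parent reference.
// Illustrative only - not Obsidian's implementation.
function flatCopy(slice) {
  // Concatenation forces v8 to materialize a fresh string,
  // so the multi-megabyte parent can be garbage collected.
  return (' ' + slice).slice(1);
}

// e.g. a parser keeping a tiny token out of a huge document:
const hugeDoc = 'x'.repeat(5 * 1024 * 1024) + '[link]';
const token = hugeDoc.slice(-6);      // may retain all of hugeDoc
const safeToken = flatCopy(token);    // retains only its own chars
```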


@WhiteNoise thank you - I’ve modified Start Here.md to match these sizes.

Also, per @yurcee’s note, I tried with and without a large table, and there seems to be a noticeable penalty from tables.

Start Here 1.md.zip (18.1 KB)
:point_up_2: this doesn’t include the table, but pasting a 130×8 table seems to affect it.

As a side note, it would be interesting to test a build without pointer compression. It would likely blow up RAM use and only help with crashes rather than latency, but it could be interesting.

I’m not familiar with the markdown parsers and have no idea how swappable they are, but I noticed that marked seems to lead on performance, including on large files (marked vs markdown-it vs remark vs turndown vs remarkable vs showdown: Which is Better Markdown Parsing and Conversion Libraries?).

A relevant performance benchmark would be one that correlates parse time with the frequency and size of certain elements like links, code blocks, tables, and headings.
A synthetic test that measures each independently would be the best way to check.
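A synthetic test along those lines could generate documents with a controlled count of a single element type and time the parser on each. A sketch, where the parser passed in is a placeholder (the remark call in the comment is an assumption):

```javascript
// Sketch: generate synthetic markdown dominated by one element
// type, so each element's parsing cost can be measured in
// isolation. The actual parser under test is a placeholder.
const generators = {
  link:    i => `[link ${i}](https://example.com/${i})`,
  heading: i => `## Heading ${i}`,
  code:    i => '```\nconst x = ' + i + ';\n```',
  table:   i => `| a${i} | b${i} |\n| --- | --- |\n| 1 | 2 |`,
};

function synthesize(kind, count) {
  const make = generators[kind];
  return Array.from({ length: count }, (_, i) => make(i)).join('\n\n');
}

function timeParse(parse, kind, count) {
  const doc = synthesize(kind, count);
  const t0 = Date.now();
  parse(doc);
  return { kind, bytes: doc.length, ms: Date.now() - t0 };
}

// e.g. timeParse(md => remark().parse(md), 'table', 1000)
```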

Editor latency / FPS is critical for user experience (see Zed’s philosophy: Optimizing the Metal pipeline to maintain 120 FPS in GPUI) - I think tracking latency/FPS would be a good indicator of UX quality. It arguably matters more than startup time, assuming people type more often than they start/restart.
As indexing seems to be a big factor in this, I think the main issue that would be highly impactful, independently of raw markdown indexing performance, is incremental indexing - since each character input is trivial relative to the file size.
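To illustrate the incremental-indexing idea: split the note into blocks, cache each block’s parse result by its text, and on edit re-parse only the blocks that changed. Everything here is a hypothetical sketch, not how Obsidian’s indexer works:

```javascript
// Sketch of block-level incremental indexing: a one-character edit
// should only re-parse the block it touched, not the whole file.
// All names and the blank-line block split are assumptions.
class IncrementalIndex {
  constructor(parseBlock) {
    this.parseBlock = parseBlock; // expensive per-block parser
    this.cache = new Map();       // block text -> parsed result
  }

  index(text) {
    const next = new Map();
    const results = [];
    for (const block of text.split(/\n{2,}/)) {
      const hit = this.cache.has(block)
        ? this.cache.get(block)     // unchanged block: reuse
        : this.parseBlock(block);   // changed/new block: re-parse
      next.set(block, hit);
      results.push(hit);
    }
    this.cache = next; // entries for removed blocks are dropped
    return results;
  }
}
```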

I don’t understand this. I already told you that we have diagnosed the problem.

Performance of markdown parser is not the only consideration. Also we cannot throw away one part of the codebase when a new parser comes along.

I think we are going off topic here.
This thread is about obsidian crashing due to going out of memory. We likely found the issue. Let me know if you still experience this problem when you get 1.8.

For everything else, open a different thread.

I don’t understand this. I already told you that we have diagnosed the problem.

I was under the impression that you tested with the original (small) Start Here.md document, so I was clarifying what I did. You said:

There is no problem with regards to processing speed

It takes 5-10 seconds on my machine - I think that’s a problem

we cannot throw away one part of the codebase when a new parser comes along.

Disclaimer: Obsidian is not OSS, so I understand that you may not care about community feedback, but I’m old enough, with enough experience in Perforce and open source, to be fine with “wasting time” explaining the obvious - so here are my two cents.

It’s less that a new one comes along (both seem to have been around since 2012) and more that one is dying - on a quick look, remark is not only slow but also pretty much dead: the last release was over a year ago, and there has been no activity in the last month (as far back as I was able to look).
So marked is not just faster; it’s also 5x more popular and more active. Unfortunately, still with a bus factor of 1…

From a commercial standpoint, it looks like you have a performance problem that is not fully in your control, coming from a critical dependency that may go away. If I put on another unrequested hat, I’d say it’s good to know your options.

As for what concerns me personally: I need a tool that helps my productivity - one that works, and preferably fast. I find myself working around issues with Obsidian, trying to split files, and debugging, and a lot of this time goes into trying to help you make it better and thereby help me. But don’t confuse the seats - I’m here because I don’t have another option, and my time is worth more than the money I’m paying for it, as I think is the case with all your paying customers. So I’m trying to solve a problem I have as a customer, rather than rambling my opinions online.

On that note, having the developer tools open solves the crash problem for the most part, as it pauses the main indexing process before it blows up and lets the indexer finish without causing an OOM error - or at least gives me a chance to commit to git before it crashes. But it’s slow, so I’ll open another ticket for that. Hoping for the best for fixing this in 1.8 - any ETA?

Sorry, we do not provide ETAs for releases. The fix for this will be included in v1.7.6.

I suggest you open a Feature Request asking for progressive indexing of a note, if you haven’t already.

We are running a heavily customized version of remark. It’s possible that at some point we move to a newer version of remark or something else entirely, but there are no plans for it yet.

I agree that the performance of remark (and other parts of Obsidian) is not good on very large/complex files (in the megabytes - the whole Bible is 4 MB, to give a sense of scale).
However, the vast majority of our user base does not work on such large single files. Therefore, I don’t think we have an urgent problem from the commercial standpoint.

Thank you -
I agree that 2MB+ files are likely uncommon.

I’ll try to find time to write the feature request - I think it’s worth it, as the user experience degrades much earlier, and speed matters.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.