Options to control the Quick Switcher algorithm above 10’000 files

Oskiator · March 23, 2023, 6:56pm

Hey

Use case or problem

Currently, when the threshold of 10’000 notes (attached files included) is exceeded, the Quick Switcher’s search algorithm is replaced by a simpler one (see @koala’s answer here).

Example

So for example if I write “cpt”

I would like to obtain the list of all titles containing the letters “c”, “p” and “t” in that order, including all notes about “comportement”.

For example

In fact, I obtain nothing.

That is because I’ve reached the threshold. Even though I have “only” 3’913 notes, I reach 10’353 files because of the attachments.
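The difference described in this example can be sketched in a few lines of Python. This is only an illustration, not Obsidian’s actual code: `is_subsequence` stands for the “complex” behaviour (letters in the same order, not necessarily adjacent), and `substring_match` is an assumed stand-in for a simpler matcher, since the thread does not say how the simpler algorithm works. The sample titles are made up.

```python
def is_subsequence(query: str, title: str) -> bool:
    """True if every character of query appears in title, in the same order."""
    remaining = iter(title.lower())
    # `ch in remaining` consumes the iterator up to the first match, so each
    # query character has to be found after the previous one.
    return all(ch in remaining for ch in query.lower())


def substring_match(query: str, title: str) -> bool:
    """Assumed simpler matcher: the query must appear as one contiguous run."""
    return query.lower() in title.lower()


# Hypothetical note titles, for illustration only.
titles = ["Comportement organisationnel", "Compte rendu", "Planning"]

fuzzy_hits = [t for t in titles if is_subsequence("cpt", t)]
simple_hits = [t for t in titles if substring_match("cpt", t)]

print(fuzzy_hits)   # titles containing c, p, t in that order
print(simple_hits)  # titles containing the exact run "cpt"
```

With these sample titles, the subsequence matcher finds the two titles that contain “c”, “p” and “t” in order, while the contiguous matcher finds none, which mirrors the empty result list described above.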

Proposed solution

Short term improvement

It would be cool to have at least an option to enable/disable the simpler algorithm. This seems feasible while keeping things simple for the devs.

When the 10’000 notes threshold is reached, the user receives a prompt celebrating their active use of Obsidian. The message informs the user about the limitations of the algorithm when working with a lot of files and explains when it’s better to use the simpler algorithm.
The user can immediately choose “Keep it as usual” or “Enable the simpler algorithm” and is told where to find this option if they change their mind.

This has 2 benefits

Informing the user about the potential change in the Quick Switcher behavior. In my case, I had the impression of a bug, or that a good function had been removed, because my way of searching for files relied on the “complex algorithm”. For example, I use it for my courses: for a course named “PP_AA22-23 - Cours 2.05”, I was typing “PPC05”.
This would give more flexibility to the user.
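A minimal sketch of how such an option could behave, assuming a three-way setting. Everything here is hypothetical (`SwitcherSettings`, the `"auto"`/`"fuzzy"`/`"simple"` values, `pick_matcher`); only the 10’000 figure comes from this thread, and the two matchers are illustrative stand-ins rather than Obsidian’s real algorithms.

```python
from dataclasses import dataclass
from typing import Callable

FILE_THRESHOLD = 10_000  # figure reported in this thread

Matcher = Callable[[str, str], bool]


def fuzzy_match(query: str, title: str) -> bool:
    """Stand-in for the 'complex' algorithm: in-order subsequence match."""
    remaining = iter(title.lower())
    return all(ch in remaining for ch in query.lower())


def simple_match(query: str, title: str) -> bool:
    """Stand-in for the simpler algorithm: contiguous substring match."""
    return query.lower() in title.lower()


@dataclass
class SwitcherSettings:
    # Hypothetical setting: "auto" keeps today's behaviour,
    # "fuzzy" and "simple" force one algorithm regardless of vault size.
    algorithm: str = "auto"


def pick_matcher(settings: SwitcherSettings, total_files: int) -> Matcher:
    """Choose the matcher from the user's setting and the vault size."""
    if settings.algorithm == "fuzzy":
        return fuzzy_match
    if settings.algorithm == "simple":
        return simple_match
    # "auto": switch to the simpler matcher once the threshold is exceeded.
    return simple_match if total_files > FILE_THRESHOLD else fuzzy_match


# "Keep it as usual" on a vault with 10'353 files:
matcher = pick_matcher(SwitcherSettings(algorithm="fuzzy"), total_files=10_353)
print(matcher("PPC05", "PP_AA22-23 - Cours 2.05"))
```

The point of the design is that the threshold only decides the default; an explicit user choice always wins.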

Long term improvement

I really don’t know how this algorithm is coded, so this is only a suggestion. It may or may not fit the reality of development.

I’m wondering if it would be possible to have a tunable version of the algorithm.

I imagine it as a continuum. On one side you have the simple algorithm, and on the other side you have the “complex one”, which is more sensitive and finds more results. The user could set the “sensitivity” of the algorithm to find a balance between too many results and not enough, according to their needs.
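One possible way to picture this continuum, purely as a sketch: let the “sensitivity” be the maximum number of characters that may be skipped between two matched letters. The parameter name `max_gap` and the whole approach are assumptions made for illustration; nothing in this thread says Obsidian works this way.

```python
from typing import Optional


def gap_limited_match(query: str, title: str, max_gap: Optional[int]) -> bool:
    """Match query letters in order, skipping at most max_gap characters
    between two consecutive matched letters.

    max_gap=0    -> letters must be adjacent (plain substring match)
    max_gap=None -> no limit (plain in-order subsequence match)
    """
    q, t = query.lower(), title.lower()
    if not q:
        return True
    # Positions in the title where the query matched so far may end.
    ends = {i for i, ch in enumerate(t) if ch == q[0]}
    for ch in q[1:]:
        next_ends = set()
        for end in ends:
            # The next letter may sit at most max_gap characters further on.
            stop = len(t) if max_gap is None else min(len(t), end + 2 + max_gap)
            for j in range(end + 1, stop):
                if t[j] == ch:
                    next_ends.add(j)
        ends = next_ends
        if not ends:
            return False
    return bool(ends)


for gap in (0, 1, 2, None):
    print(gap, gap_limited_match("cpt", "comportement", gap))
```

In “comportement”, two letters sit between “c” and “p” and two more between “p” and “t”, so “cpt” starts matching once `max_gap` reaches 2. A lower value behaves more like the simple end of the scale and returns fewer results.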

I don’t know exactly why you chose to switch to a simpler algorithm after 10’000 notes, whether it is a matter of performance, number of results and/or something else, but it would be great to have those advanced options to tune the search according to our needs.
I particularly like your complex version of the algorithm🙂

Furthermore, and I don’t know if it would be useful or not, you could add other options to enable the simpler algorithm for specific folders, file formats, tags, etc. (I don’t know, for example, whether this would help optimise the results, so I leave it up to you.)

Additional information

It may also be important for users to understand the quick search possibilities from the beginning, as it can affect the way people name their notes. If you name your notes so that you can find them easily with the complex algorithm, and then the algorithm changes after a certain number of files, it can be frustrating.
As the algorithm change is currently quite unexpected and invisible, some users may already be negatively affected and wrongly blame themselves, rather than the behaviour of the program, for their difficulty finding notes. So they may not report it on the forum. Thus, I think that for this feature, the number of reactions shouldn’t be used as a metric to judge the importance of the change.

Many thanks for your great work. I’m enjoying Obsidian in my daily student life, knowing that everything I put and connect in my second brain will help me easily find my knowledge again and create better content and trainings in the future. This will help me have a more positive impact on performance and well-being in the organisations in which I’ll work.
Your work matters. You rock. Thank you.

Michaël

Jopp · March 31, 2023, 12:19pm

Wow, 10k items are indeed a lot of files to handle for an application

If developers set boundaries, it is for technical reasons.
So if you are over the limit, there is nothing you could “fine-tune” anymore, because you are over the limit.

Right now, I split my data into multiple vaults, because there is not much crossover between these knowledge fields. Anyway, since I don’t have that many files in a single vault, the best solution I can think of would be to add some folders, e.g. Attachments, Templates etc., to an ignore list. Did you look into Preferences > Files & Links > Excluded files? Does this help you?

Oskiator · April 7, 2023, 12:26pm

Hi Jopp, thanks for your answer. In my case, I really make connections between everything, so I prefer to use one vault.
Unfortunately, excluding files doesn’t change which algorithm is used. It seems that, as it is coded now, the choice is made according to the total number of files in the vault, whether or not they are excluded from the search.

However, thank you for your suggestion to filter out certain file types: even if it doesn’t resolve the point about the search algorithm, it did make my search results clearer. I had already filtered out some folders, but I hadn’t thought of excluding *.jpg and *.png files until your comment.

Regarding the developers’ decision to set boundaries, I understand that there may be technical limitations involved.
If the choice was made due to the technology’s maximum capacity (e.g. 10,000 files), then I agree with you.
But we don’t know on what basis the choice was made, do we? Maybe it was made for performance reasons. In that case it may be more arbitrary and could depend on the performance of each user’s computer. Thus, as with the graphics settings of video games, we could let users choose their settings according to their hardware.

I’m not sure, as I’m not a dev. I think the Quick Switcher looks things up in an index. I’m not sure that 10k index entries is that many, but it would be better to ask the devs.

Enjoy your Easter holidays!

lab · October 8, 2023, 8:59pm

I very much want to second this Feature Request and I would like to add two things:

Wiki-Links: The issue described by @Oskiator also concerns Wiki-Link-Modals and there, interestingly, an even lower threshold seems to be in place. (Compare the current discussion in Wiki link autocomplete cannot do fuzzy search in large vault)
Information: Instead of showing “No notes found.”, there should be a message explaining (1) the existence and (2) the behavior of the optimization algorithm. Without it, users will trust the empty result and falsely assume that there is indeed no matching note, which is super misleading. Think of it from a UX perspective: if you start out with a small vault, you will get used to relying on QuickSwitcher as a quick and dirty tool to query your vault. And honestly there’s nothing wrong with that, since it is accurate and also much faster than typing file: query in the search field… well… until you hit the threshold…

EDIT: The message should also contain live info about the current vault size (including the number of dead links), see my next post below.

CawlinTeffid · October 8, 2023, 9:20pm

Ideally the app should make clear what’s happening, but at minimum it should be documented. I’ve filed an issue on the documentation: Document simpler autocomplete algorithm for large vaults · Issue #597 · obsidianmd/obsidian-help · GitHub

lab · October 9, 2023, 6:42am

Thanks for filing the issue!

Also I’m wondering now:

Links to existing/non-existing notes: Does the 10’000 notes threshold only count existing notes, or does it also count all the wiki-links to non-existing notes? I think it is important to make this very clear, ideally alongside displaying the numbers for both measures for the current state of the vault (since most users will not be able to calculate these on their own with JS).

Ideally, I think this would be shown in one central place, maybe in the About section of the Settings.
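Until such numbers are shown in the app, they can be estimated from outside Obsidian. The Python sketch below walks a vault folder and counts all files, Markdown notes, and distinct wiki-link targets that match no existing file. It makes simplifying assumptions that may differ from Obsidian’s own link resolution: links are matched by file name only, case-insensitively, and hidden folders such as `.obsidian` are skipped. The function name `vault_stats` and the demo vault are made up for illustration.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]\|#]+)")


def vault_stats(vault: Path) -> dict:
    """Count files, notes and unresolved wiki-link targets in a vault folder."""
    files = [
        p for p in vault.rglob("*")
        if p.is_file()
        # Skip hidden folders/files such as .obsidian (an assumption).
        and not any(part.startswith(".") for part in p.relative_to(vault).parts)
    ]
    notes = [p for p in files if p.suffix.lower() == ".md"]
    # Assumption: a link resolves if it equals a note's name without ".md"
    # or any file's full name.
    known = {p.stem.lower() for p in notes} | {p.name.lower() for p in files}
    unresolved = set()
    for note in notes:
        text = note.read_text(encoding="utf-8", errors="ignore")
        for target in WIKILINK.findall(text):
            name = target.strip().split("/")[-1].lower()
            if name and name not in known:
                unresolved.add(name)
    return {
        "files": len(files),
        "notes": len(notes),
        "unresolved_links": len(unresolved),
    }


# Demo on a throwaway vault; point vault_stats at a real vault path instead.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("[[b]] [[missing]] [[b|alias]] ![[pic.png]]",
                               encoding="utf-8")
    (root / "b.md").write_text("[[Missing#section]]", encoding="utf-8")
    (root / "pic.png").write_bytes(b"")
    stats = vault_stats(root)

print(stats)
```

Comparing `files` and `notes` shows how much of the total comes from attachments. Whether unresolved links count towards the threshold is exactly the open question above, so the third number is only there to make the comparison possible.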

Oskiator · December 8, 2023, 10:16am

It’s great to see all your latest comments. I really hope that this topic will be followed up, as the “deeper algorithm” is GREAT and I really see in my workflow that it’s more difficult to find things since I exceeded the 10’000 limit.

Reading your comments, another idea came to me.

My previous solution favoured computers with more resources. I think there is an even more optimised solution that would also work on computers with fewer resources.

Another way to customize the algorithm would be to apply it only to a certain part of the vault, e.g.:

folders
tags
regex rules
most used notes
notes containing specific words/expressions/links
etc.

It seems to me that this would allow a better search without compromising performance on computers with more limited resources.
It’s then up to the developers to say whether that’s true or not.
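To make the idea concrete, here is a rough Python sketch of scoping by folder, the first item in the list above. The names (`scoped_search`, `fuzzy_folders`) and the two matchers are hypothetical stand-ins, not Obsidian’s implementation: files inside the chosen folders get the more thorough in-order matching, everything else gets a cheap contiguous match.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_subsequence(query: str, title: str) -> bool:
    """Stand-in for the 'deeper' algorithm: letters in order, gaps allowed."""
    remaining = iter(title.lower())
    return all(ch in remaining for ch in query.lower())


def substring_match(query: str, title: str) -> bool:
    """Stand-in for the simpler algorithm: contiguous match only."""
    return query.lower() in title.lower()


def scoped_search(query: str, paths: Iterable[str],
                  fuzzy_folders: Iterable[str]) -> List[str]:
    """Use the deeper matcher only for files under fuzzy_folders."""
    scopes = [PurePosixPath(folder) for folder in fuzzy_folders]
    hits = []
    for path in paths:
        p = PurePosixPath(path)
        in_scope = any(scope in p.parents for scope in scopes)
        matcher = is_subsequence if in_scope else substring_match
        if matcher(query, p.stem):
            hits.append(path)
    return hits


# Hypothetical vault-relative paths.
paths = [
    "Courses/PP_AA22-23 - Cours 2.05.md",
    "Archive/PP_AA21-22 - Cours 1.05.md",
    "Attachments/ppc-diagram-05.png",
]
hits = scoped_search("PPC05", paths, fuzzy_folders=["Courses"])
print(hits)
```

Here only the note under `Courses` is found with “PPC05”. The archived note would also match the deeper algorithm, but it is outside the chosen scope, so it only gets the cheap check.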

Enjoy your vaults!