I simply want to find clusters of notes that are grouped by similar names. I am not interested in the contents of these notes or their links/backlinks/tags etc. I just want to automatically find clusters of notes based on the name of the note. In this way I can minimise note duplication. The results should generate clusters like these:
I have searched and searched and haven't found a method or a plugin to get this done. As far as I understand, the plugins "Smart Connections" and "Graphview Analysis" unfortunately don't do this.
I do not want to type the keyword manually because I sometimes don't even know the names of notes I created many months ago! If I were to type it manually, I could simply use the search function. I want Obsidian to cluster automatically. Any thoughts please? Thanks very much.
With this query, I have to specify the keyword, don't I? The thing is, I don't remember all the note names I created ages ago. The point is to get Obsidian to scan my vault and cluster based on the words most commonly used in note names.
don't know of a plugin, sorry; otherwise you'd have to write a script (i expect this to be not as easy as it looks…)
but for the future, it's better to prepare for stuff like this and pepper the frontmatter with tags based on the file name at file creation
so the templater plugin would be your friend, and i've just seen a similar thread:
personally, i don't like tags with unnecessary or surplus information, but they make life easy when querying files
I guess one way of dealing with this is to loop through all files and split the lowercased file name into words, and then make a dictionary using a single word as key. In the end each key would then hold all filenames having that word as part of its name.
The next step would be to skip all stopwords, or keep an exclusion list of uninteresting words. Or possibly to try to normalise all words to singular (or plural).
To list the clusters, sort the dictionary by how many filenames each word holds.
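The steps above can be sketched outside Obsidian as a small Python script. This is only an illustration under stated assumptions, not something tested in this thread: the names `cluster_by_word`, `note_names` and the `STOPWORDS` list are hypothetical, the vault is assumed to be a plain folder of `.md` files, and no singular/plural normalisation is attempted.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical exclusion list; extend it to suit your own vault.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for"}


def cluster_by_word(names, stopwords=STOPWORDS):
    """Return (word, note names) pairs, largest clusters first."""
    index = defaultdict(set)
    for name in names:
        # Lowercase the name and split on runs of non-word characters.
        for word in re.split(r"[\W_]+", name.lower()):
            if word and word not in stopwords:
                index[word].add(name)
    # Keep only words shared by at least two notes.
    clusters = {w: sorted(ns) for w, ns in index.items() if len(ns) >= 2}
    # Sort by cluster size (descending), then alphabetically by word.
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def note_names(vault_dir):
    """Collect note names (file stems) from every Markdown file in a folder."""
    return [p.stem for p in Path(vault_dir).rglob("*.md")]


if __name__ == "__main__":
    demo = ["Sleep hygiene", "Sleep and memory", "Memory palace", "Project plan"]
    for word, notes in cluster_by_word(demo):
        print(word, notes)
```

To run it against a real vault, pass `note_names("path/to/vault")` (a placeholder path) to `cluster_by_word`. Because the key is a whole word, names such as "sleep" and "sleepy" would still land in different clusters.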
Actually this should be doable through a pure Dataview query, I think… What does the following untested query give:
```dataview
TABLE rows.file.link as Files
FLATTEN split(lower(file.name), "\s*") as word
GROUP BY word
SORT length(rows) desc
LIMIT 20
```
This should theoretically give you the 20 clusters with the most filenames related to that word…
Thanks @holroy. Unfortunately it doesn't work yet.
This is what I get when I run the code:
It generates a list of more than 20 items despite the code saying to generate only 20 items.
The entries are the names of my notes.
Some notes have multiple entries, i.e., the exact same note name may be repeated five or six times, which shouldn't be the case since Obsidian doesn't allow two notes to have identical names.
I didn't see entries such as "sleep", "sleepy" and "asleep" immediately next to each other, i.e., clustered together, which is the whole purpose of the exercise.
Here's a screenshot of my results:
Anything we could do to tinker the code a bit more please? Thank you very much!
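One possible explanation for the repeated note names, not confirmed anywhere in this thread: the delimiter `"\s*"` can match the empty string, so if Dataview's `split` treats it as a regular expression, each file name may be split into single characters rather than words. The query would then group by letter, and a note would appear once for every occurrence of that letter in its name. The Python sketch below (Python 3.7 or later, with a made-up note name) only illustrates the regex behaviour; Dataview runs on JavaScript, whose split details differ slightly.

```python
import re

name = "sleep log"  # hypothetical note name

# A pattern that can match the empty string splits between every character.
# Python also emits empty strings around such matches, so they are dropped here.
per_char = [part for part in re.split(r"\s*", name) if part]

# Requiring at least one whitespace character splits on word gaps instead.
per_word = re.split(r"\s+", name)

print(per_char)
print(per_word)
```

If this is indeed the cause, changing `"\s*"` to `"\s+"` in the query might be worth trying, though that is untested here. Even then, "sleep", "sleepy" and "asleep" are different words and would end up in separate groups.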
I create a large number of notes on the fly… that is, while I am writing, by enclosing the note name in double square brackets. Engaging the Templater plugin when my writing is in full flow is quite disruptive.