Clustering note names

Grasshopper · June 10, 2024, 1:48pm

Hi there,

I am simply wanting to find clusters of notes that are grouped by similar names. I am not interested in the contents of these notes or their links/backlinks/tags etc. I just want to automatically find clusters of notes based on the name of the note. In this way I can minimise note duplication. The results should generate clusters like these:

Cluster: lifestyle
lifestyle issues
exciting lifestyle
lifestyle food
diseases of lifestyle

Cluster: cream
creamy soup
ice cream
creamed

I have searched and searched and haven’t found a method or a plugin to get this done. As far as I can tell, the plugins ‘Smart Connections’ and ‘Graphview Analysis’ unfortunately don’t do this.

I do not want to type the keyword manually, because I often don’t remember the names of notes I created many months ago in the first place! If I knew the keyword, I could just use the search function. I want Obsidian to cluster automatically. Any thoughts please? Thanks very much.

anon23099027 · June 10, 2024, 2:07pm

I use Dataview to identify clusters of files based on parts of their file names:

```dataview
TABLE
WHERE contains(file.name, "lifestyle")
LIMIT 50
```

Would that work for you?

Or perhaps an inline query:

```query
file:lifestyle
```

Grasshopper · June 10, 2024, 2:12pm

Thanks, but not really.

With this query, I have to specify the keyword, don’t I? The thing is I don’t remember all the note names I created ages ago. The point is to get Obsidian to scan my vault and cluster based on the words most commonly used in note names.

anon23099027 · June 10, 2024, 2:14pm

Apologies. I misread your original post. Don’t know a way to automate this. Hope someone can help.

Grasshopper · June 10, 2024, 2:15pm

No worries. No need to apologise. Thanks for having a go at this!

Yurcee · June 10, 2024, 8:03pm

don’t know of a plugin, sorry, or how to write the script (i expect this to be not as easy as it looks…)

but for the future, it’s better to prepare for stuff like this and pepper the frontmatter with tags based on file name on file creation
so templater plugin would be your friend and i’ve just seen a similar thread:

personally, i don’t like tags with unnecessary or surplus information, but it makes life easy when querying files

holroy · June 10, 2024, 11:22pm

I guess one way of dealing with this is to loop through all files and split the lowercased file name into words, and then make a dictionary using a single word as key. In the end each key would then hold all filenames having that word as part of their name.

Next step would be to skip all stopwords, or have an exclusion list of non-interesting words. Or possibly try to guess and change all words into singular (or plural).

To list the clusters, sort the dictionary according to how many filenames it holds for any given word.
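The steps above can be sketched outside Obsidian in a few lines of Python. This is only an illustration of the idea, not a tested plugin or script for a real vault: the function name `cluster_by_word` and the tiny `STOPWORDS` set are made up for the example, and singular/plural guessing is left out.

```python
from collections import defaultdict

# Hypothetical, deliberately small exclusion list; extend it to taste.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def cluster_by_word(names, stopwords=STOPWORDS):
    """Group note names under every non-stopword they contain."""
    clusters = defaultdict(list)
    for name in names:
        # set() so a name that repeats a word is listed only once per key
        for word in set(name.lower().split()):
            if word not in stopwords:
                clusters[word].append(name)
    # Largest clusters first; ties are broken alphabetically by key
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


names = [
    "lifestyle issues",
    "exciting lifestyle",
    "lifestyle food",
    "diseases of lifestyle",
    "ice cream",
]
ranked = cluster_by_word(names)
for word, members in ranked[:20]:
    print(f"Cluster: {word} ({len(members)})")
    for member in members:
        print(f"  {member}")
```

To run it against real notes, the `names` list could be filled from the file stems of the `.md` files in the vault folder, for example with `pathlib.Path(vault_dir).rglob("*.md")`, where `vault_dir` is whatever path your vault lives at.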

Actually this should be doable through a pure Dataview query, I think… What does the following untested query give:

```dataview
TABLE rows.file.link as Files
FLATTEN split(lower(file.name), "\s*") as word
GROUP BY word
SORT length(rows) desc
LIMIT 20
```

This should theoretically give you the 20 clusters with the most filenames related to that word…

Grasshopper · June 11, 2024, 12:33am

Thank you. Newbie to Dataview queries to be honest…where do I copy-paste the above query in please? I have Dataview installed.

holroy · June 11, 2024, 6:09am

It’ll generate a table when you’re in live preview or reading mode instead of the code. So place it in the text where you want to see the result.

Grasshopper · June 11, 2024, 11:58am

Thanks @holroy. Unfortunately it doesn’t work yet.

This is what I get when I run the code:

It generates a list of more than 20 items despite the code saying to generate only 20 items.
The entries are the names of my notes.
Some notes have multiple entries i.e., the exact same note name may be repeated five or six times, which shouldn’t be the case since Obsidian doesn’t allow two notes to have identical names.
I didn’t see entries e.g., ‘sleep’ and ‘sleepy’ and ‘asleep’ immediately next to each other, i.e., clustered together which is the whole purpose of the exercise.

Here’s a screen-shot of my results:

Is there anything we could do to tinker with the code a bit more please? Thank you very much!
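Two hedged observations on the output described above. First, the pattern `\s*` can also match the empty string, so the split may be cutting each name into single characters rather than words, which would fit the long, repetitive lists; `\s+` is the usual pattern for splitting on runs of whitespace, though that change is untested here. Second, whole-word keys cannot put ‘sleep’, ‘sleepy’ and ‘asleep’ together, because they are three different words. A minimal sketch of a substring-based alternative follows, in Python and outside Obsidian; the function name `substring_clusters` and the `min_len` and `min_size` thresholds are assumptions made for the illustration.

```python
def substring_clusters(names, min_len=4, min_size=2):
    """Cluster names under any word that appears inside them as a substring.

    min_len skips very short keys (they match almost everything);
    min_size drops clusters with fewer than that many members.
    """
    lowered = [name.lower() for name in names]
    # Candidate keys: every sufficiently long word found in any name
    keys = {word for low in lowered for word in low.split() if len(word) >= min_len}
    clusters = {}
    for key in sorted(keys):
        # Substring test, so "sleep" also catches "sleepy" and "asleep"
        members = [name for name, low in zip(names, lowered) if key in low]
        if len(members) >= min_size:
            clusters[key] = members
    return clusters


names = ["creamy soup", "ice cream", "creamed", "sleep", "sleepy", "asleep"]
clusters = substring_clusters(names)
for key, members in clusters.items():
    print(f"Cluster: {key}")
    for member in members:
        print(f"  {member}")
```

The trade-off is that substring matching is greedy: a short key such as "art" would also pull in "party" and "start", which is why the sketch ignores words shorter than `min_len`.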

Grasshopper · June 11, 2024, 12:33pm

Thanks @Yurcee.

I create a large number of notes on the fly…that is, while I am writing, by enclosing the note name in double square brackets. Engaging the Templater plugin when my writing is in full flow is quite disruptive.

system · September 9, 2024, 12:34pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.