Grammatical and/or syntactic equivalences for backlink discovery

Use case or problem

I often run into trouble with the automatic backlink detection because I write in a language (German) whose words change form considerably between declensions.
This means that even though I have an entry for “X”, a mention may not be found because it appears in a different grammatical form.
One concrete example that crops up in English too is singular/plural forms: if I have an entry for “Dog”, I also want to find “dogs” (note that both the number and the capitalization change).
At the moment, Obsidian is very literal when it comes to the “unlinked mentions” discovery.
For example, I have a file named “Transformers”, but it won’t find “transformers” in one of my other files because of the capitalization difference.

Proposed solution

There are two options:

  1. You could detect matches fuzzily, e.g. by checking the Levenshtein distance. There could even be a slider for the maximum allowed Levenshtein distance: 0 for the current behavior, 1 to also find “transformers”, etc.
  2. You could implement proper stemming or lemmatisation.

The latter is the better option, though it is also a lot more language-dependent. Most natural language processing toolchains should provide this.
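
To illustrate the first option, here is a rough Python sketch of what I mean. It is not how Obsidian works internally; the names `levenshtein` and `find_unlinked_mentions` and the `max_distance` parameter (standing in for the slider) are made up for this example. The comparison is deliberately case-sensitive, so that a distance of 0 reproduces the current literal behavior and a capitalization change counts as one edit.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def find_unlinked_mentions(note_title: str, text: str, max_distance: int = 1) -> list[str]:
    """Return every word in `text` within `max_distance` edits of `note_title`.

    Hypothetical helper: it only handles single-word titles and splits
    the text into words with a simple regular expression.
    """
    words = re.findall(r"\w+", text)
    return [word for word in words if levenshtein(word, note_title) <= max_distance]


# Distance 0 is the literal match, distance 1 tolerates the capital letter.
print(find_unlinked_mentions("Transformers", "I read about transformers today.", 0))
print(find_unlinked_mentions("Transformers", "I read about transformers today.", 1))
# "Dog" -> "dogs" needs two edits: one for the capital, one for the plural "s".
print(find_unlinked_mentions("Dog", "My dogs bark", 2))
```

One caveat with this approach: for short titles such as “Dog”, a distance of 2 is already a large fraction of the word, so unrelated words like “Dig” or “dot” would match as well. That is part of why stemming or lemmatisation seems like the cleaner solution.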
