Ignore accents/diacritics in search

My native language includes diacritics / accents (see https://en.wikipedia.org/wiki/Diacritic). Would it be possible to make it so that search treats characters with diacritics as if they didn’t have them?

For example, the word for “maximum” is written “máximo”. Sometimes I’m careful and write it correctly; sometimes, in a rush, I’m more careless and write “maximo”. Either way, it would be reasonable to expect that searching for either term returns both results. This is also common in other applications I use (e.g. recoll).
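A common way to get this behaviour is to “fold” both the query and the note text before comparing them. Each character is decomposed with Unicode NFD normalization, the combining marks are dropped, and the comparison is made case-insensitive. The Python sketch below only illustrates the technique. It is not how Obsidian’s search is implemented, and the helper names `fold` and `matches` are hypothetical.

```python
import unicodedata

def fold(text: str) -> str:
    """Return a case- and diacritic-insensitive form of text (hypothetical helper)."""
    # casefold() first, then NFD: NFD splits "á" into "a" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop nonspacing combining marks (Unicode general category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def matches(query: str, note: str) -> bool:
    """True if the folded query occurs in the folded note."""
    return fold(query) in fold(note)

print(matches("maximo", "O valor máximo"))  # True
print(matches("máximo", "o valor maximo"))  # True
```

Because both sides are folded the same way, it does not matter which spelling is in the note and which is in the query.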

Thank you for this great software.


+1

This would be very useful indeed.

Steps to reproduce

  1. Create and save a note with the text café наёмник (note the letters with diacritics). The second word is “mercenary” in Russian, if anyone is interested.
  2. Do a search for cafe.
  3. Do a search for наемник.

Expected result

Search returns the text in the note for steps 2 and 3.

Actual result

Search returns nothing for steps 2 and 3, because it doesn’t treat those letters as interchangeable. In Russian there’s only one such case, where the letter е can be written instead of ё, but I’m not sure what should be done for languages like French, which has lots of diacritics, or for the occasional French-derived words in English like café, naïve, etc.
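The “which letters are interchangeable” question matters in practice. Blanket NFD-based stripping handles café, naïve and ё → е, but it also decomposes Russian й into и + U+0306 (combining breve). That would make “мой” and “мои” match, even though they are different words. The Python sketch below is a minimal illustration under that assumption. It folds everything except an exception set, and both the `KEEP` set and the function names are hypothetical. Which letters belong in such a list is a per-language decision.

```python
import unicodedata

# Hypothetical exception list: letters whose marks are distinctive and should
# NOT be stripped. Under NFD, "й" is "и" + U+0306, but folding it to "и" would
# conflate different Russian words (unlike "ё" -> "е").
KEEP = {"й", "Й"}

def fold_char(ch: str) -> str:
    """Fold one NFC character, unless it is in the exception list."""
    if ch in KEEP:
        return ch.casefold()
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return base.casefold()

def fold(text: str) -> str:
    # NFC first so each accented letter is a single code point that can be
    # checked against KEEP; leftover lone combining marks are stripped.
    return "".join(fold_char(ch) for ch in unicodedata.normalize("NFC", text))

note = "café наёмник naïve мой"
for query in ["cafe", "наемник", "naive", "мои"]:
    print(query, fold(query) in fold(note))
# cafe True / наемник True / naive True / мои False
```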

Environment

  • Operating system: Windows 10
  • Obsidian version: 0.8.1

Additional information

There’s a feature request about this exact behavior, but I don’t think it’s a feature; rather, it’s a bug.

I think it’s a feature request. In some searches you may want to ignore diacritics; other times you may want them.

Up to you, of course, but from what I can tell, in the cases above any English speaker would expect “cafe” to be recognized and found by default. The same applies to a Russian speaker searching for “наемник”; this should be the default behavior.

They might want diacritic-sensitive matching in certain cases, e.g. to find only the undiacriticized (eh) spellings of words. That then becomes an additional feature / option.

I agree that this is important and that we should do it. We may even change the default behaviour to ignore diacritics. I don’t consider it a bug, though.
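Making diacritic-insensitive matching the default while keeping strict matching as an option can be as simple as a boolean switch. The Python sketch below shows this. The `search` function and its `ignore_diacritics` parameter are hypothetical and are not an actual Obsidian setting. With the flag off, a search for “cafe” finds only the undiacriticized spelling, which covers the use case mentioned above.

```python
import unicodedata

def _strip_marks(text: str) -> str:
    """Remove combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def search(query: str, notes: list[str], ignore_diacritics: bool = True) -> list[str]:
    """Return the notes containing query (hypothetical API)."""
    def norm(s: str) -> str:
        # Always normalize case and Unicode form; strip marks only if asked to.
        s = unicodedata.normalize("NFC", s.casefold())
        return _strip_marks(s) if ignore_diacritics else s

    q = norm(query)
    return [note for note in notes if q in norm(note)]

notes = ["Meet at the café", "The cafe is closed"]
print(search("cafe", notes))                           # both notes
print(search("cafe", notes, ignore_diacritics=False))  # ['The cafe is closed']
```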


Some weeks ago I incorrectly filed a bug report about different Unicode code points. Although that wasn’t Obsidian’s fault, given the variety of reasons such inconsistencies can creep in, I’d prefer search to be more permissive and match characters both with and without diacritics.
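The code-point issue is related but separate from diacritic folding. The same visible “é” can be stored either as one precomposed code point (U+00E9) or as “e” followed by a combining accent (U+0301), depending on where the text came from. The minimal Python sketch below shows why a plain comparison fails and how Unicode normalization (NFC here) makes such canonically equivalent strings compare equal. `same_text` is a hypothetical helper name.

```python
import unicodedata

precomposed = "caf\u00e9"  # "é" as a single code point, U+00E9
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)          # False: different code point sequences
print(len(precomposed), len(decomposed))  # 4 5

def same_text(a: str, b: str) -> bool:
    """Compare strings after NFC normalization (canonical equivalence)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text(precomposed, decomposed))  # True
```

Normalization alone only fixes this encoding mismatch. “cafe” and “café” still differ after NFC, so ignoring diacritics needs the additional step of stripping the combining marks.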
