Ignore accents/diacritics in search

Not sure about that. Apparently 20% of the world speaks English. And most of the world doesn’t use the Roman alphabet, which is the script where diacritics act as modifiers.

First of all, I think yours is a very unnecessary reply. It adds nothing to the main discussion, and can be interpreted as, again, contempt against “diacritics users”…

Having said that, it seems to be also wrong, according to the Encyclopedia Britannica:

The Latin alphabet is the most widely used script, with nearly 70 percent of the world’s population employing it.

Of course, how many of those actually use diacritics I don’t know… I would say the whole of Latin America, most of western continental Europe (and many eastern European countries too, such as Romania and Poland), and some Asian languages (at least Filipino, off the top of my head).

Apparently 20% of the world speaks English

Yeah, about that: this is the full paragraph where I think this information you mentioned was quoted from:

Approximately 20% of the world’s population speaks English, with around 400 million people speaking it natively and an additional 1.5 billion who speak it as their second or foreign language.

So 20% (i.e. 1.9 billion people), for this discussion, is a very inflated value, since most of these people do not speak English natively, meaning it’s neither their primary nor their only language.

TL;DR I do not know how many people use diacritics in the world, but I can say it’s a lot. Probably more than native English speakers, and probably, yes, most of the world. Is it enough to consider this an important feature? We’ll see.

I refer you to your comment:

which seems entirely unnecessary

8% world population

Europe total only 9%

“Most Filipinos (and Philippine news journals) write Tagalog without using any diacritic at all. However, pieces of Tagalog writing which use diacritics can occasionally be found in some religious journals, old books, and others. This case is where a word is stressed in the last or end syllable.”

I don’t think I’m wrong.
Over 50% of the world’s population is in Asia. Most of those using the Latin alphabet will do so in English.

idk. The numbers that article relies on are wrong, and it underestimates the number of native English speakers (there are nearly that many, 400m, in North America alone). It doesn’t matter. The point is that claiming a spurious superiority by being “international” or “most of the world” is not the way to support a case. Especially when diacritic users are not most of the world.

I have no contempt for diacritic users: I sometimes use them myself. Just as I don’t always use the Latin alphabet.

@Dor please back down. It doesn’t matter if you’re right or wrong, there is no reason to make any of these arguments. Any further arguing in this thread will simply be flagged and removed. The previous replies may still be.

@rsenna, you are (a bit) newer here, so I’ll guide you to the Code of Conduct, “Encouraged Behaviors”, which includes “step away when heated”. You are free to flag posts you think are off-topic or inappropriate. Community code of conduct - Obsidian Help

The feature request is tagged “valuable” and “i18n”. No one needs to defend whether this is a valuable feature request. It is.

Hi, being able to search regardless of special characters is critical for Hebrew as well (all the other search tools I have used ignore the vowel characters). So +1 for this option.

Well, I didn’t know about this issue until now. I don’t know how many searches missed some of the results because of accents, but +1

I want to add that Cyrillic script also has diacritics and suffers from this issue as well. For example, the Cyrillic letter “ё” is often written as “е” (these are not the same letters as the Latin ones, at least for computers). There may be other examples that I don’t know about, because many languages use different versions of Cyrillic.
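
To make the point about “not the same letters for computers” concrete, here is a minimal Python sketch using the standard `unicodedata` module. It shows that Cyrillic “е” and Latin “e” are different code points, and that “ё” decomposes into “е” plus a combining diaeresis, so dropping combining marks folds one into the other. The helper name `fold_marks` is hypothetical, and this is only an illustration, not how Obsidian’s search works.

```python
import unicodedata

def fold_marks(text: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

cyrillic_ie = "\u0435"  # CYRILLIC SMALL LETTER IE (looks like Latin "e")
cyrillic_io = "\u0451"  # CYRILLIC SMALL LETTER IO ("ё")
latin_e = "e"           # LATIN SMALL LETTER E

# Visually identical, but different code points.
print(cyrillic_ie == latin_e)                      # False

# "ё" folds to Cyrillic "е" once the combining diaeresis is removed.
print(fold_marks(cyrillic_io) == cyrillic_ie)      # True
print(fold_marks("ёлка") == fold_marks("елка"))    # True
```

One caveat with this blanket approach: the same folding also turns “й” into “и”, which are distinct letters in Russian, so a real implementation would probably need per-language rules.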

This is a followup to an old post that was published in Help and left unanswered. More details can be found in the original post: Exempting Diacritics/Tashkeel/Harakat (Arabic) in Search.

Use case or problem

I want Arabic words to be found through search even if they have different “tashkeel” [1]. For example, if I search for “أَسْمَى” I would find instances of “أسمى”. You can also think of it as searching for “éléphant” and finding instances of “elephant”.

[1] The same word can be written with or without tashkeel and means exactly the same thing. Tashkeel serves pronunciation, since the same letter can be pronounced in different ways. Two words with different tashkeel can also have two different meanings; however, they are usually related.

Proposed solution

Keywords and the documents being indexed should be stored in a canonical form. So we basically need an Arabic language canonicalizer that would be plugged into the search engine.
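
As a rough illustration of the idea, here is a minimal Python sketch in which both the query and the documents pass through the same canonicalizer before comparison. The names `canonicalize` and `search` are hypothetical, and the rule used (NFD decomposition, then dropping every nonspacing mark) is an assumption made for the sketch, not an existing Obsidian API. A dedicated Arabic canonicalizer might instead strip only the tashkeel range U+064B to U+0652.

```python
import unicodedata

def canonicalize(text: str) -> str:
    """Hypothetical canonical form: NFD, drop nonspacing marks, casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.casefold()

def search(query: str, documents: list[str]) -> list[str]:
    """Return documents whose canonical form contains the canonical query."""
    key = canonicalize(query)
    return [doc for doc in documents if key in canonicalize(doc)]

# The two spellings from the use case, written as escapes to avoid
# any ambiguity about which marks are present.
vocalized = "\u0623\u064e\u0633\u0652\u0645\u064e\u0649"  # with tashkeel
plain = "\u0623\u0633\u0645\u0649"                        # without tashkeel

notes = [plain, "an elephant", "un éléphant"]

print(search(vocalized, notes) == [plain])  # True
print(search("elephant", notes))            # ['an elephant', 'un éléphant']
```

Note that NFD also splits “أ” into a bare alef plus a combining hamza, which this sketch then removes. That is applied to both sides of the comparison, so matching stays consistent, but it is a broader fold than tashkeel alone and may or may not be what Arabic users want.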

Current workaround (optional)

Nothing really, apart from the fact that I usually try to write keywords without tashkeel so that I can find documents later. This is quite hard to manage, so I wouldn’t consider it an actual workaround.

Note that I’m willing to help implement this feature as a plugin, if that is actually feasible (since it touches document search). So any guidance would be helpful as well.
