Exempting Diacritics/Tashkeel/Harakat (Arabic) in Search

Hello…

Arabic users face a problem when searching a database: the use of harakat/tashkeel. Arabic writers often add phonetic glyphs (diacritics) to words, and these usually don’t change the meaning of the word. A good example is the word اسمى, which may be written in many different ways:

1- with diacritics: “أَسْمَى”
2- with the last letter “ى” changed to “ي”, giving “أسمي”
3- with a kashida (“ـ”) somewhere in the word, for example “أسمــى”
4- with variant forms of alef hamza (أ - إ - ا - ء), giving “اسمى” or “إسمى”
5- any combination of the cases above, e.g. diacritics and kashida.

Ideally, a search for this word would find it however it is written, regardless of these variations. That is not currently the case: if you add a diacritic to the search term, the same word written without it is not returned.

  • Here’s the Unicode reference for the diacritics and variant alef hamza: http://unicode.org/L2/L2011/11069-arabic-harakat.pdf

  • Regarding the letters ي and ى, their code points are:
    0649 ى ARABIC LETTER ALEF MAKSURA
    064A ي ARABIC LETTER YEH

  • Regarding the kashida, the code point is: 0640 ـ ARABIC TATWEEL

(found in https://www.unicode.org/charts/PDF/U0600.pdf )
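
To make the request concrete, here is a minimal sketch of the kind of normalization I have in mind, applied to both the stored text and the search term before comparing. It is only an illustration in Python, not how any particular database or search engine does it. The function name normalize_arabic, the exact diacritic range (U+064B to U+0652 plus U+0670) and the choice to fold everything to bare alef and yeh are my own assumptions. The standalone hamza ء is left untouched here, since folding it into alef could merge words that really are different.

```python
import re
import unicodedata

# Assumption: treat U+064B..U+0652 (fathatan .. sukun) and U+0670
# (superscript alef) as the harakat to ignore in search.
_HARAKAT = re.compile("[\u064B-\u0652\u0670]")

# Assumption: fold letter variants to one canonical letter and drop tatweel.
_FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0640": None,      # tatweel (kashida) -> removed
})


def normalize_arabic(text: str) -> str:
    """Return a search key that ignores harakat, kashida and alef/yeh variants."""
    # NFC first, so that alef followed by a combining hamza becomes one code point.
    text = unicodedata.normalize("NFC", text)
    text = _HARAKAT.sub("", text)
    return text.translate(_FOLD)


if __name__ == "__main__":
    variants = ["أَسْمَى", "أسمي", "أسمــى", "اسمى", "إسمى"]
    keys = {normalize_arabic(v) for v in variants}
    print(len(keys))  # one shared key means all five spellings would match
```

In a database, the same idea would presumably mean storing the normalized key in a separate indexed column next to the original text, and running the search term through the same function before querying.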

Can someone help with this?

No one can help with this?