extend-arabic-query

Check the package on npm.

This npm package provides a function that converts any text written in Arabic letters into a regular expression string. That string can be passed to the RegExp constructor and used with the test or match methods to compare the text against any other string.

Normally, comparing words that contain letters with multiple forms, such as the letter "أ", can give false negatives. For example, comparing the two spellings of the name "Ahmad" ("أحمد" and "احمد") should yield true, but it doesn't, whether you use strict or abstract equality [ "أحمد" == "احمد" ]. The two forms of the letter "alef" (with and without "hamzah") are different characters with different Unicode code points (namely, U+0623 for "أ" and U+0627 for "ا").
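The mismatch is easy to reproduce without the package. The following is a minimal sketch using only built-in string methods; the variable names are illustrative.

```javascript
// First letter: alef with hamza above.
const withHamza = "أحمد";
// First letter: bare alef.
const withoutHamza = "احمد";

// Plain equality compares code units, so the two spellings differ.
const areEqual = withHamza === withoutHamza;

// Hexadecimal code point of the first letter of each spelling.
const firstCodePoints = [withHamza, withoutHamza].map((word) =>
  word.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(areEqual);        // false
console.log(firstCodePoints); // [ '0623', '0627' ]
```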

Also, every form of the letter alef should map to all of its other forms, since they are often mistakenly swapped for each other. The same holds for many other letters.

The current version of the package maps the following letters to their corresponding forms or alternatives:

| Letter Group | Group Name | Substitution |
| --- | --- | --- |
| ة - ه | haa group | same* |
| ا - أ - إ - آ - ء | alef group | same, plus all other forms of hamza (ئ - ؤ) |
| ئ | hamza on yaa group | same as for alef, plus all other forms of yaa |
| ؤ | hamza on waw group | same as for alef, plus all other forms of yaa |
| و | waw group | و - ؤ |
| ي - ى | yaa group | same, plus "ئ" |

* "same" means a RegEx character class listing all the characters in the group, i.e. if either "ه" or "ة" is found in the string, it is replaced with [ةه] in the RegEx string.
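The substitution idea can be sketched in a few lines. This is a hypothetical illustration, not the package's actual source: the names LETTER_GROUPS and toLooseArabicPattern are invented here, only four of the groups are included, and the input is assumed to contain plain Arabic letters and spaces (no RegEx metacharacters).

```javascript
// Hypothetical sketch, not the package's real implementation.
// Each entry: [letters that trigger the substitution, character class to emit].
const LETTER_GROUPS = [
  ["اأإآء", "[اأإآءئؤ]"], // alef group, plus the other forms of hamza
  ["يى", "[يئى]"],        // yaa group, plus hamza on yaa
  ["ةه", "[ةه]"],         // haa group
  ["و", "[وؤ]"],          // waw group
];

function toLooseArabicPattern(text) {
  // Replace every letter that belongs to a group with that group's class;
  // leave all other characters untouched.
  return Array.from(text, (char) => {
    const group = LETTER_GROUPS.find(([triggers]) => triggers.includes(char));
    return group ? group[1] : char;
  }).join("");
}

const loosePattern = toLooseArabicPattern("حماده");
const looseRegex = new RegExp(`^${loosePattern}$`);

// The pattern built from one spelling also accepts the other spelling.
console.log(looseRegex.test("حمادة")); // true
```

Keeping the groups in a data table rather than in branching code makes it straightforward to add or adjust a group without touching the function.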

The library also accounts for possible misspellings due to local pronunciation. Currently, the following groups are considered:

| Letter Group | Group Name | Substitution |
| --- | --- | --- |
| ز - ذ | zai group | same, since many Arabic dialects use both interchangeably |
| ث - س | seen group | same, since many Arabic dialects use both interchangeably |

Also, many people write the words "أبو" and "عبد" either with or without a trailing space. This has been taken into consideration as well!
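One way to picture the optional space is as an alternation between the joined and the separated spelling. The sketch below covers only "عبد" followed by the definite article "ال"; the helper name allowOptionalSpaceAfterAbd is hypothetical, and the package's real implementation may differ.

```javascript
// Hypothetical helper, shown for illustration only.
function allowOptionalSpaceAfterAbd(text) {
  // Find "عبد" followed by "ال", with or without a single space in between,
  // and replace it with a group that accepts both spellings.
  return text.replace(/عبد ?ال/g, "(?:عبدال|عبد ال)");
}

const abdPattern = allowOptionalSpaceAfterAbd("عبد الجيد");
const abdRegex = new RegExp(`^${abdPattern}$`);

console.log(abdRegex.test("عبدالجيد"));  // true
console.log(abdRegex.test("عبد الجيد")); // true
```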

Irregularly spelled names, like "داوود" and "يحيى", are handled too, since some people write them with a different number of vowels.

Try IT!


new RegExp(extendQuery("عبد الجيد أحمد حماده ابو ذكري"), "g").test("عبدالجيد احمد حمادة أبوذكرى")

This expression evaluates to true.
The Underlying RegEx String:

(?:عبدال|عبد ال)ج[يئى]د [اأإآءئؤ]حمد حم[اأإآءئؤ]د[ةه] (?:[اأإآ]ب[وؤ][ء-ي]|[اأإآ]ب[وؤ] [ء-ي])كر[يئى]

Clearer Syntax
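// The query to expand, and a differently spelled text to compare it against.
const text = "عبد الجيد أحمد حماده ابو ذكري";
const text_to_compare = "عبدالجيد احمد حمادة أبوذكرى";

// Build a regular expression from the expanded query string.
const regex = new RegExp(extendQuery(text), 'g');

// true: the two spellings are treated as equivalent.
const result = regex.test(text_to_compare);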
    
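One caveat when reusing a regex created with the "g" flag, as in the example above: test is stateful for global regexes. It resumes from lastIndex, so a second call on the same string can return false. The sketch below uses a short hand-written pattern in place of the extendQuery output so that it runs on its own.

```javascript
// A hand-written stand-in for a pattern produced by the package.
const source = "[اأإآءئؤ]حمد";
const candidate = "احمد";

// With the "g" flag, each successful test() advances lastIndex.
const globalRegex = new RegExp(source, "g");
const firstCall = globalRegex.test(candidate);  // true, lastIndex moves past the match
const secondCall = globalRegex.test(candidate); // false, search resumed at lastIndex

// Without the "g" flag, repeated calls give the same answer.
const plainRegex = new RegExp(source);
const plainCalls = [plainRegex.test(candidate), plainRegex.test(candidate)];

console.log(firstCall, secondCall); // true false
console.log(plainCalls);            // [ true, true ]
```

If the same regex object is tested against several strings, dropping the "g" flag (or creating a fresh RegExp per check) avoids this surprise.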

Now you should be able to use this function in any check that involves Arabic strings. You can even use its output in the pattern attribute of an input element to generate highly specific, smart patterns.
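Note that the pattern attribute matches against the entire input value, whereas RegExp test looks for a match anywhere in the string. The sketch below approximates that whole-value behavior outside the browser; the function name matchesLikePatternAttribute is hypothetical, and the pattern is a hand-written stand-in for the package's output.

```javascript
// Approximates how a browser applies the pattern attribute:
// the pattern must match the whole value, not just part of it.
function matchesLikePatternAttribute(patternSource, value) {
  return new RegExp(`^(?:${patternSource})$`, "u").test(value);
}

// A hand-written stand-in for a pattern produced by the package.
const namePattern = "[اأإآءئؤ]حمد";

// Unanchored test() finds the name inside a longer string.
const foundAnywhere = new RegExp(namePattern).test("احمد علي");

// The anchored check accepts only the name on its own.
const wholeMatch = matchesLikePatternAttribute(namePattern, "احمد");
const partialMatch = matchesLikePatternAttribute(namePattern, "احمد علي");

console.log(foundAnywhere, wholeMatch, partialMatch); // true true false
```

The same anchoring is useful in plain JavaScript whenever the goal is equality rather than a substring search.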

The sky is the limit!