extend-arabic-query
Check the package on npm. This npm package provides a function that converts any text written in Arabic letters into a regular expression string, which can be passed as the parameter of the test or match method against any other string to compare the two for equality.
Normally, comparing words that contain letters with multiple forms, like the letter "أ", can give false results. For example, comparing the two ways of writing the name "Ahmad" ("أحمد" and "احمد") against each other should return true, but it doesn't, whether you use the strict or the abstract equality operator [ "أحمد" == "احمد" ], because the two forms of the letter "alef" (with and without "hamzah") are different characters with different Unicode code points (namely, U+0623 for "أ" and U+0627 for "ا").
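The mismatch described above can be reproduced in a few lines. This is only an illustrative sketch: the variable names and the hex helper are made up for the example, and the strings are written with escape sequences so the code points (0x623 and 0x627) are visible.

```javascript
// "Ahmad" written with alef + hamzah (first letter U+0623)
const withHamza = "\u0623\u062D\u0645\u062F";
// "Ahmad" written with a bare alef (first letter U+0627)
const withoutHamza = "\u0627\u062D\u0645\u062F";

// Hypothetical helper: format the first code point of a string as 4-digit hex.
const hex = (s) => s.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");

console.log(withHamza === withoutHamza); // false
console.log(withHamza == withoutHamza);  // false
console.log(hex(withHamza), hex(withoutHamza)); // 0623 0627
```

Both equality operators compare the strings code unit by code unit, so the two spellings can never be equal without some normalisation or pattern matching step.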
Also, every form of the letter alef should map to all of its other forms, since they are often mistakenly swapped for each other. The same goes for many other letters.
The current version of the package maps the following letters to their corresponding forms or alternatives:
Letter Group | Group Name | Substitution |
---|---|---|
ة - ه | haa group | same* |
ا - أ - إ - آ - ء | alef group | same, plus all other forms of hamza (ئ - ؤ) |
ئ | hamza on yaa group | same as for alef, plus all other forms of yaa |
ؤ | hamza on waw group | same as for alef, plus all other forms of yaa |
و | waw group | و - ؤ |
ي - ى | yaa group | same, plus "ئ" |
* "same" means a RegEx character class of all the characters in the group, i.e. if either "ه" or "ة" is found in the string, it will be replaced with [ةه] in the RegEx string.
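To make the substitution idea concrete, here is a minimal sketch of how a per-letter mapping like the one in the table could be applied. It is not the package's actual source: GROUPS, CLASS_BY_LETTER and toPattern are hypothetical names, and only four of the groups are included.

```javascript
// Hypothetical sketch, not the package's implementation: each group maps
// its member letters to one character class, following the table above.
const GROUPS = [
  { letters: "ةه", pattern: "[ةه]" },         // haa group: "same"
  { letters: "اأإآء", pattern: "[اأإآءئؤ]" },  // alef group: same, plus ئ and ؤ
  { letters: "يى", pattern: "[يئى]" },         // yaa group: same, plus ئ
  { letters: "و", pattern: "[وؤ]" },           // waw group
];

// Build a lookup from each single letter to its group's character class.
const CLASS_BY_LETTER = new Map();
for (const { letters, pattern } of GROUPS) {
  for (const letter of letters) CLASS_BY_LETTER.set(letter, pattern);
}

// Replace every grouped letter with its class; leave other characters as-is.
function toPattern(text) {
  return Array.from(text, (ch) => CLASS_BY_LETTER.get(ch) ?? ch).join("");
}

const pattern = toPattern("حماده");
console.log(pattern);
console.log(new RegExp(`^${pattern}$`).test("حمادة")); // true
```

A lookup table keeps the per-letter rules declarative, so adding another group only means adding another entry. The real package also handles multi-character cases, which a character-by-character map like this one does not cover.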
The library also accounts for possible misspellings due to local pronunciation. Currently, the following groups are considered:
Letter Group | Group Name | Substitution |
---|---|---|
ز - ذ | zai group | same, since many Arabic dialects use both interchangeably |
ث - س | seen group | same, since many Arabic dialects use both interchangeably |
Also, many people write the words "أبو" and "عبد" either with or without a trailing space, so this has been taken into consideration as well!
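As a small illustration of the optional space, the following sketch uses the same alternation form that appears for "عبد" in the sample RegEx output later in this README. The surrounding ^ and $ anchors and the variable name are assumptions added for the example.

```javascript
// Accept "عبد" joined to the next word or separated from it by one space.
// The alternation is taken from the README's sample output; the anchors are
// an assumption so that the whole string has to match.
const abdPattern = new RegExp("^(?:عبدال|عبد ال)جيد$");

console.log(abdPattern.test("عبد الجيد")); // true
console.log(abdPattern.test("عبدالجيد"));  // true
console.log(abdPattern.test("عبد  الجيد")); // false (two spaces)
```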
Irregularly spelled names, like "داوود" and "يحيى", are not forgotten either: some people write them with a different number of vowels. This, too, has been taken into consideration!
Try IT!
compare("عبد الجيد أحمد حماده ابو ذكري", "عبدالجيد احمد حمادة أبوذكرى")
This expression evaluates to true.
The Underlying RegEx String:
(?:عبدال|عبد ال)ج[يئى]د [اأإآءئؤ]حمد حم[اأإآءئؤ]د[ةه] (?:[اأإآ]ب[وؤ][ء-ي]|[اأإآ]ب[وؤ] [ء-ي])كر[يئى]
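The RegEx string above can be checked by hand with the built-in RegExp constructor. This sketch does not call the package; it only copies the string shown above and tests it against both spellings. Wrapping it in ^ and $ is an assumption made here so that the whole string has to match.

```javascript
// The RegEx string shown above, copied verbatim from this README.
const regexString =
  "(?:عبدال|عبد ال)ج[يئى]د [اأإآءئؤ]حمد حم[اأإآءئؤ]د[ةه] (?:[اأإآ]ب[وؤ][ء-ي]|[اأإآ]ب[وؤ] [ء-ي])كر[يئى]";

// Assumption: anchor the pattern so partial matches do not count.
const regex = new RegExp(`^(?:${regexString})$`);

console.log(regex.test("عبدالجيد احمد حمادة أبوذكرى"));   // true
console.log(regex.test("عبد الجيد أحمد حماده ابو ذكري")); // true
console.log(regex.test("عبدالمجيد احمد"));                // false
```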
A Clearer Syntax
const text = "عبد الجيد أحمد حماده ابو ذكري";
const text_to_compare = "عبدالجيد احمد حمادة أبوذكرى";
const result = compare(text, text_to_compare);
Now you should be able to use this function in any check that involves Arabic strings.
You can even use it inside the pattern attribute of input elements to generate highly specific, smart patterns.
The sky is the limit!