extend-arabic-query
Check the package on npm. This npm package provides a function that converts any text written in Arabic letters into a regular expression string. That string can be used to build a `RegExp` for the `test` or `match` methods, so that it can be checked against any other string for equality.
Normally, comparing words that contain letters with multiple forms, like the letter "أ", can give false negatives. For example, comparing the two ways of writing the name "Ahmad" ("أحمد" and "احمد") against each other should yield `true`, but it does not, whether you use the strict or the abstract equality operator (`"أحمد" == "احمد"`). The two forms of the letter "alef" (with and without "hamzah") are different characters with different Unicode code points: U+0623 for "أ" and U+0627 for "ا".
Likewise, every form of the letter alef should match any of its other forms, since they are often mistakenly swapped for each other. The same holds for many other letters.
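The mismatch described above can be checked directly. The snippet below is a minimal sketch that uses only built-in string methods; the helper name `codePointOf` is made up for this illustration and is not part of the package.

```javascript
// The two spellings of "Ahmad", written with escapes to make the difference visible.
const withHamza = "\u0623\u062D\u0645\u062F"; // "أحمد"
const withoutHamza = "\u0627\u062D\u0645\u062F"; // "احمد"

// Hypothetical helper: format the first code point of a string as U+XXXX.
function codePointOf(str) {
  return "U+" + str.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

console.log(withHamza === withoutHamza); // false
console.log(codePointOf(withHamza)); // U+0623
console.log(codePointOf(withoutHamza)); // U+0627

// A hand-written character class accepts either form of the first letter.
const eitherAlef = /^[\u0623\u0627]\u062D\u0645\u062F$/;
console.log(eitherAlef.test(withHamza) && eitherAlef.test(withoutHamza)); // true
```

Writing such classes by hand for every letter quickly becomes tedious, which is the gap this package aims to fill.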
The current version of the package maps the following letters to their corresponding forms or alternatives:
Letter Group | Group Name | Substitution |
---|---|---|
ة - ه | haa group | same* |
ا - أ - إ - آ - ء | alef group | same, plus all other forms of hamza (ئ - ؤ) |
ئ | hamza on yaa group | same as for alef, plus all other forms of yaa |
ؤ | hamza on waw group | same as for alef, plus all other forms of yaa |
و | waw group | و - ؤ |
ي - ى | yaa group | same, plus "ئ" |

\* "same" means a RegEx character class listing all the characters in the group, i.e. if either "ه" or "ة" is found in the string, it is replaced with `[ةه]` in the RegEx string.
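The substitution idea in the table can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the package's actual implementation: the names `CLASS_OF`, `escapeRegExp` and `expandLetters` are hypothetical, only the haa, alef, waw and yaa rows are covered, and "ئ" and "ؤ" appearing in the input are left unexpanded.

```javascript
// Simplified letter groups taken from the table above (assumption: four rows only).
const HAA = "\u0629\u0647"; // ة ه
const ALEF = "\u0627\u0623\u0625\u0622\u0621\u0626\u0624"; // ا أ إ آ ء ئ ؤ
const WAW = "\u0648\u0624"; // و ؤ
const YAA = "\u064A\u0626\u0649"; // ي ئ ى

// Map each plain letter to the character class of its group.
const CLASS_OF = new Map();
for (const ch of HAA) CLASS_OF.set(ch, `[${HAA}]`);
for (const ch of "\u0627\u0623\u0625\u0622\u0621") CLASS_OF.set(ch, `[${ALEF}]`);
CLASS_OF.set("\u0648", `[${WAW}]`);
for (const ch of "\u064A\u0649") CLASS_OF.set(ch, `[${YAA}]`);

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace every grouped letter with its character class.
function expandLetters(text) {
  return Array.from(text, (ch) => CLASS_OF.get(ch) ?? escapeRegExp(ch)).join("");
}

const ahmadPattern = new RegExp(`^${expandLetters("\u0627\u062D\u0645\u062F")}$`); // from "احمد"
console.log(ahmadPattern.test("\u0623\u062D\u0645\u062F")); // true, matches "أحمد"
```

A per-letter lookup keeps the sketch easy to extend: supporting another group only means adding entries to the map.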
The library also accounts for possible misspellings due to local pronunciation. Currently, the following groups are considered:
Letter Group | Group Name | Substitution |
---|---|---|
ز - ذ | zai group | same, since many Arabic dialects use both interchangeably |
ث - س | seen group | same, since many Arabic dialects use both interchangeably |
Also, since many people write the words "أبو" and "عبد" either with or without a trailing space, both spellings have been taken into consideration as well!
Irregularly spelled names, like "داوود" and "يحيى", are not forgotten either, as some people write them with a different number of vowels. This, too, has been taken into consideration!
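The optional space after a word such as "عبد" can be illustrated with a small sketch. It assumes a hypothetical helper named `optionalSpaceAfter` and uses the `?` quantifier, which is a simpler form than the alternation the package itself emits; it also does not check where the word ends.

```javascript
// Hypothetical helper: make the space after a word-initial prefix optional.
function optionalSpaceAfter(prefix, text) {
  // Match the prefix at the start of the text or after a space,
  // together with the space that may follow it.
  const finder = new RegExp(`(^| )${prefix} ?`, "g");
  return text.replace(finder, `$1${prefix} ?`);
}

const ABD = "\u0639\u0628\u062F"; // "عبد"
const abdSource = optionalSpaceAfter(ABD, "\u0639\u0628\u062F \u0627\u0644\u0644\u0647"); // from "عبد الله"
const abdPattern = new RegExp(`^${abdSource}$`);

console.log(abdPattern.test("\u0639\u0628\u062F \u0627\u0644\u0644\u0647")); // true, with the space
console.log(abdPattern.test("\u0639\u0628\u062F\u0627\u0644\u0644\u0647")); // true, without the space
```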
Try it!
The example below checks one spelling of a full name against another and shows the RegEx string behind it.
new RegExp(extendQuery("عبد الجيد أحمد حماده ابو ذكري"), "g").test("عبدالجيد احمد حمادة أبوذكرى")
This expression evaluates to `true`.
The underlying RegEx string:
(?:عبدال|عبد ال)ج[يئى]د [اأإآءئؤ]حمد حم[اأإآءئؤ]د[ةه] (?:[اأإآ]ب[وؤ][ء-ي]|[اأإآ]ب[وؤ] [ء-ي])كر[يئى]
Clearer Syntax
const text = "عبد الجيد أحمد حماده ابو ذكري";
const text_to_compare = "عبدالجيد احمد حمادة أبوذكرى";
const regex = new RegExp(extendQuery(text), 'g');
const result = regex.test(text_to_compare); // true
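When the same pattern is checked against several strings, the `g` flag deserves some care, because `test` on a global regex resumes from `lastIndex` on the next call. The sketch below shows one way to filter a list. The function `filterMatches` is hypothetical, and `toPattern` is a parameter standing in for a function such as `extendQuery`, so that the example does not assume how the package is imported.

```javascript
// Hypothetical helper: keep the candidates that match the expanded query.
// `toPattern` stands in for a function such as extendQuery.
function filterMatches(toPattern, query, candidates) {
  const matcher = new RegExp(toPattern(query)); // no "g" flag on purpose
  return candidates.filter((candidate) => matcher.test(candidate));
}

// With the "g" flag, test() continues from lastIndex on the next call.
const globalRegex = new RegExp("a", "g");
console.log(globalRegex.test("a")); // true
console.log(globalRegex.test("a")); // false

// Stand-in for illustration only: accepts either "أ" or "ا".
const standIn = (text) => text.replace(/[\u0623\u0627]/g, "[\u0623\u0627]");
const names = [
  "\u0623\u062D\u0645\u062F", // "أحمد"
  "\u0645\u062D\u0645\u062F", // "محمد"
  "\u0627\u062D\u0645\u062F", // "احمد"
];
console.log(filterMatches(standIn, "\u0627\u062D\u0645\u062F", names).length); // 2
```

Building the regex without `g` keeps every `test` call independent, which is usually what a yes-or-no comparison needs.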
Now you should be able to use this function in any check that involves Arabic strings. You can even use it inside the `pattern` attribute of `input` elements to generate highly specific, smart patterns.
The sky is the limit!