
PHP: using preg_match to match items in an array whose values may or may not contain accented characters

The preg_match call must match any word in the $string variable (as long as it is at least 3 characters long) against any of the words in the $forbidden array, but here’s the issue:

If the $string contains the word mamíferos (with an accent char) instead of mamiferos, it should also be a match. Same applies if acompañar is in the forbidden array list, but the user decides to type acompanar instead (without the accent char).


Answer

I suggest a solution based on removing all combining Unicode characters from both the filtered string and the forbidden words. It requires the intl extension (sudo apt install php7.4-intl && sudo phpenmod intl). First, it decomposes the Unicode string into base characters and combining modifiers (normalization form D); second, it removes all the modifiers (\p{M}).
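The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the original answer's code: the function names stripAccents and containsForbidden are hypothetical, the input is assumed to be UTF-8, and the length check assumes the mbstring extension is available alongside intl.

```php
<?php
// Sketch only: helper names and sample data are hypothetical.
// Requires the intl extension (Normalizer) and mbstring (mb_strlen).

/** Decompose to NFD, then drop every combining mark (\p{M}). */
function stripAccents(string $text): string
{
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Normalization fails on invalid UTF-8; fall back to the input.
        $decomposed = $text;
    }
    // preg_replace returns null on error, so fall back to the input again.
    return preg_replace('/\p{M}/u', '', $decomposed) ?? $text;
}

/** True if $string contains any forbidden word, ignoring accents and case. */
function containsForbidden(string $string, array $forbidden): bool
{
    $words = [];
    foreach ($forbidden as $word) {
        // Skip words shorter than 3 characters, as the question requires.
        if (mb_strlen($word) >= 3) {
            $words[] = preg_quote(stripAccents($word), '/');
        }
    }
    if ($words === []) {
        return false;
    }
    // Whole-word, case-insensitive match against the accent-free string.
    $pattern = '/\b(?:' . implode('|', $words) . ')\b/iu';
    return preg_match($pattern, stripAccents($string)) === 1;
}
```

With this sketch, stripAccents('mamíferos') gives 'mamiferos', so a string containing mamíferos matches a forbidden list containing mamiferos, and acompanar matches a list containing acompañar. Stripping both sides is what makes the comparison symmetric.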


By the way, I don’t understand the purpose of {3,} in your regular expression, so I removed it from mine. If you expect it to match a string containing three or more forbidden words, it won’t: a quantifier applied to the group matches only when the forbidden words immediately follow one another.
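To illustrate the point about {3,}, here is a small sketch with a hypothetical pattern (the words foo and bar stand in for the forbidden list; the question's actual expression is not reproduced here):

```php
<?php
// Hypothetical pattern: a group of forbidden words followed by {3,}.
$pattern = '/(?:foo|bar){3,}/';

// Three alternatives back to back: the quantifier is satisfied.
$adjacent = preg_match($pattern, 'foobarfoo');

// Three forbidden words separated by other text: the quantifier is not
// satisfied, because the repetitions must be consecutive.
$separated = preg_match($pattern, 'foo and bar and foo');
```

Here $adjacent is 1 and $separated is 0. If the intent is to ignore words shorter than three characters, one option is to filter the word list by length before building the pattern.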

Further reading: https://www.php.net/manual/en/class.normalizer

User contributions licensed under: CC BY-SA