PHP: using preg_match to match items in an array whose values may or may not contain accented characters

Question

The preg_match must match any of the words in the $string variable (as long as they are at least 3 chars long) with any of words in the $forbidden array, but here's the issue: If the $string contains ...

Accepted Answer

I suggest a solution based on removing all combining Unicode characters from both the string being filtered and the forbidden words. It requires the intl extension (sudo apt install php7.4-intl && sudo phpenmod intl). First, it decomposes the Unicode string into base characters and combining marks; second, it removes all of the marks (\p{M}):

<?php
$string = 'los mamíferos corren libres y quieren acompanar a su madre';
$forbidden = ['mamiferos', 'acompañar'];

// Decompose to NFD, then drop every combining mark (\p{M}).
function strip (string $accented): string {
    $decomposed = Normalizer::normalize ($accented, Normalizer::FORM_D);
    return preg_replace ('/\p{M}/u', '', $decomposed);
}

// Stripping the whole pattern also removes the accents from the forbidden words inside it.
function filter (string $string, array $words): bool {
    $regex = '/\b(?:' . implode ('|', $words) . ')/i';
    return preg_match (strip ($regex), strip ($string)) === 1;
}

echo ((filter ($string, $forbidden) ? 'match!' : 'nope...') . "\n");

By the way, I don't understand the meaning of {3,} in your regular expression, so I removed it from mine. If you think that it will match a string containing three or more forbidden words, you are mistaken: the forbidden words will match only if they immediately follow each other.

Further reading: https://www.php.net/manual/en/class.normalizer
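The remark about {3,} can be checked with a short sketch. The word list, the variable names and the helper longEnough below are hypothetical and do not come from the question. The helper also assumes that the original intent of {3,} was a minimum word length of three characters, which is one possible reading of the question.

```php
<?php
// Hypothetical word list, used only for this illustration.
$words = ['foo', 'bar'];
$alternation = implode('|', array_map(fn ($w) => preg_quote($w, '/'), $words));

// {3,} after the group requires three or more consecutive repetitions of the group.
$repeated = '/\b(?:' . $alternation . '){3,}/i';

// Three forbidden words separated by other text: no match.
$scattered = preg_match($repeated, 'foo and bar and foo');

// Three forbidden words immediately following each other: match.
$adjacent = preg_match($repeated, 'foobarfoo');

// Assumed intent: keep only the words that are at least $min characters long.
// The /u flag makes "." count UTF-8 characters rather than bytes.
function longEnough(array $words, int $min = 3): array {
    return array_values(array_filter(
        $words,
        fn ($w) => preg_match('/^.{' . $min . ',}$/us', $w) === 1
    ));
}

$kept = longEnough(['de', 'sol', 'mamiferos']);
```

If a minimum length is what is wanted, filtering the word list before building the pattern keeps the regular expression itself simple, and the quantifier is no longer needed.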

Answer