
PHP: using preg_match to match items in an array whose values may or may not contain accented characters

The preg_match call must match any of the words in the $string variable (as long as they are at least 3 characters long) against any of the words in the $forbidden array, but here's the issue:

If $string contains the word mamíferos (with an accented character) instead of mamiferos, it should still be a match. The same applies the other way round: if acompañar is in the forbidden array but the user types acompanar instead (without the accent), it should also match.

$forbidden = array('mamiferos', 'acompañar');

$string = 'los mamíferos corren libres y quieren acompanar a su madre';

if(preg_match('/\b(?:'.implode('|', $forbidden).'){3,}/i', $string)) {
    echo 'match!';
} else {
    echo 'nope...';
}


Answer

I suggest a solution based on removing all combining Unicode characters from both the filtered string and the forbidden words. It requires the intl extension (sudo apt install php7.4-intl && sudo phpenmod intl). First, it decomposes the Unicode string into base characters and combining marks (normalization form D); second, it removes all combining marks (\p{M}):

<?php
$string = 'los mamíferos corren libres y quieren acompanar a su madre';

$forbidden = ['mamiferos', 'acompañar'];

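function strip(string $accented): string {
    // Decompose accented letters into base letter + combining mark (NFD)
    $decomposed = Normalizer::normalize($accented, Normalizer::FORM_D);
    // Remove every combining mark, e.g. U+0301 (acute) or U+0303 (tilde)
    return preg_replace('/\p{M}/u', '', $decomposed);
}

function filter(string $string, array $words): bool {
    $regex = '/\b(?:' . implode('|', $words) . ')/iu';
    // Strip accents from both the pattern and the subject before matching
    return preg_match(strip($regex), strip($string)) === 1;
}
echo (filter($string, $forbidden) ? 'match!' : 'nope...') . "\n";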
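To see why the decomposition step is needed before removing \p{M}, here is a small sketch that skips Normalizer (so it runs without intl) and writes both encodings of í by hand with PHP's "\u{00ED}"-style escapes. The decomposed string is what Normalizer::FORM_D produces for í: the letter i followed by U+0301 COMBINING ACUTE ACCENT. The precomposed í (U+00ED) is a single letter, not a mark, so \p{M} leaves it alone.

```php
<?php
// Two encodings of "mamíferos":
// precomposed: "í" is the single code point U+00ED (a letter)
$precomposed = "mam\u{00ED}feros";
// decomposed: "i" followed by U+0301 COMBINING ACUTE ACCENT (a mark),
// which is the form Normalizer::FORM_D would produce
$decomposed = "mami\u{0301}feros";

// Removing combining marks only changes the decomposed form
$fromPrecomposed = preg_replace('/\p{M}/u', '', $precomposed);
$fromDecomposed  = preg_replace('/\p{M}/u', '', $decomposed);

var_dump($fromPrecomposed === 'mamiferos'); // bool(false)
var_dump($fromDecomposed === 'mamiferos');  // bool(true)
```

This is why strip() normalizes first: user input usually arrives precomposed, and without NFD there would be no marks to remove.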

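By the way, I don't understand the purpose of {3,} in your regular expression, so I removed it from mine. The quantifier applies to the whole group (?:mamiferos|acompañar), so the pattern only matches three or more forbidden words that immediately follow each other with nothing in between, such as mamiferosacompañarmamiferos. It does not match a string that merely contains three forbidden words somewhere, and it does not enforce a minimum word length of three characters either.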
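A quick check of that behaviour, using the question's pattern with accent-free forbidden words so that no normalization is involved (the two test strings are made up for illustration):

```php
<?php
$forbidden = ['mamiferos', 'acompanar'];
// The question's pattern: {3,} repeats the whole alternation group
$pattern = '/\b(?:' . implode('|', $forbidden) . '){3,}/i';

// Two forbidden words separated by other text: the group never repeats 3 times
$sentence = 'los mamiferos corren y quieren acompanar a su madre';
// Three forbidden words glued together: the group repeats 3 times in a row
$glued = 'mamiferosacompanarmamiferos';

var_dump(preg_match($pattern, $sentence)); // int(0)
var_dump(preg_match($pattern, $glued));    // int(1)
```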

Further reading: https://www.php.net/manual/en/class.normalizer.
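If, as the question asks, only whole words of at least three characters should count, one option is to compare word by word instead of building a regex. The regex in filter() anchors only the start of each word, so a subject like mamiferosx would also match, and the forbidden words are inserted into the pattern unescaped. The following is a minimal sketch of that alternative; containsForbiddenWord() and its $minLength parameter are hypothetical names, and it assumes both the intl (Normalizer) and mbstring extensions are available.

```php
<?php
// Sketch only: needs the intl (Normalizer) and mbstring extensions.

// Remove accents: decompose to NFD, then drop all combining marks.
function strip(string $text): string
{
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    return preg_replace('/\p{M}/u', '', $decomposed);
}

// Hypothetical helper: true if any word of $string that is at least
// $minLength characters long equals a forbidden word, ignoring case and accents.
function containsForbiddenWord(string $string, array $forbidden, int $minLength = 3): bool
{
    // Normalize the forbidden list once: no accents, lower case.
    $needles = array_map(fn (string $w): string => mb_strtolower(strip($w)), $forbidden);

    // Split on runs of anything that is not a letter, combining mark or digit,
    // so already-decomposed input is not cut apart at its accents.
    $words = preg_split('/[^\p{L}\p{M}\p{N}]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        $plain = mb_strtolower(strip($word));
        // Whole-word comparison, skipping words shorter than $minLength
        if (mb_strlen($plain) >= $minLength && in_array($plain, $needles, true)) {
            return true;
        }
    }
    return false;
}

$forbidden = ['mamiferos', 'acompañar'];
var_dump(containsForbiddenWord('los mamíferos corren libres y quieren acompanar a su madre', $forbidden)); // bool(true)
var_dump(containsForbiddenWord('los mamiferosx corren', $forbidden)); // bool(false)
```

Comparing whole words avoids regex escaping entirely; the trade-off is that it cannot find forbidden words embedded inside longer words, which may or may not be what you want.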

User contributions licensed under: CC BY-SA