
This regex works on regex101 but not in my script, why?

I have this regex:

'/gr[aàâ]ce[\s]{1,5}au?[\s]{1,5}docteur[\s\S]{1,100}sans[\s\S]{1,100}(?:faire de|recevoir de) vaccin/igm';

It does not work as it should. See this code:

$text = "Grâce a docteur markus j ai reçu mon pass vaccinal  sans toute fois recevoir de vaccin";
$regex = '/gr[aâà]ce[\s]{1,5}au?[\s]{1,5}docteur[\s\S]{1,100}sans[\s\S]{1,100}(?:faire de|recevoir de) vaccin/im';
if(preg_match_all($regex, $text)){
    echo "catch";
}

I wrote my regex on regex101 and it works fine there, but it stops working once I add it to my script: https://regex101.com/r/EmebOT/1

If I replace [aàâ] with “â”, so that the regex looks like this, it works:

'/grâce[\s]{1,5}au?[\s]{1,5}docteur[\s\S]{1,100}sans[\s\S]{1,100}(?:faire de|recevoir de) vaccin/igm'


Answer

I got this working by adding the u modifier. Honestly, I haven't parsed many UTF-8 characters before and didn't know about this.

I have this code working in PHP 7 and 8:

<?php
$text = "Grâce a docteur markus j ai reçu mon pass vaccinal  sans toute fois recevoir de vaccin il sont authentiques avec un \"QR Code\" contenant les informations essentielles, ainsi qu'une signature numérique pour assurer l'authenticité du certificat Covid je sui nouveau dans se groupe alors pour ceux qui sont intéressé me laisse un message en privé je vous explique ou alors voici le lien télégramme du docteur markus pas d insulte s il vous plait tes pas intéressé tu ignores ensemble disons non au vaccin forcé : https://t.me/docteur_markus";

$regex1 = "/gr[aàâ]ce[\s]{1,5}au?[\s]{1,5}docteur[\s\S]{1,100}sans[\s\S]{1,100}(?:faire de|recevoir de) vaccin/iu";

$result = preg_match_all($regex1, $text, $m);
var_dump([$result, $m]);

u (PCRE_UTF8) This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid.

https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
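Here is why the u modifier matters. Without it, PCRE works on bytes. In UTF-8, "â" is the two bytes C3 A2, while a character class such as [aàâ] always matches exactly one byte, so it can never consume the whole character. A literal "â" in the pattern is the same two-byte sequence as in the subject, which is why the workaround in the question matched. The sketch below compares the three cases. It assumes the script file is saved as UTF-8, and the variable names are only for illustration.

```php
<?php
// Sketch: the same character class in byte mode vs. UTF-8 (u) mode.
// Assumes this file is saved as UTF-8, so "â" is the two bytes C3 A2.
$subject = "Grâce";

echo bin2hex("â"), "\n";       // c3a2: one character, two bytes
echo strlen($subject), "\n";   // 6: strlen() counts bytes, not characters

// Without u: [aàâ] is a class of single bytes (61, C3, A0, A2).
// It matches the C3 byte, then "c" fails against A2, so there is no match.
$bytes = preg_match('/^gr[aàâ]ce$/i', $subject);

// Literal "â": the pattern contains C3 A2 in sequence, just like the
// subject, so it matches even without u (the workaround from the question).
$literal = preg_match('/^grâce$/i', $subject);

// With u: pattern and subject are read as UTF-8 characters,
// so [aàâ] matches the whole "â".
$utf8 = preg_match('/^gr[aàâ]ce$/iu', $subject);

// The full pattern from the question, with and without u.
$text = "Grâce a docteur markus j ai reçu mon pass vaccinal  sans toute fois recevoir de vaccin";
$pattern = '/gr[aàâ]ce[\s]{1,5}au?[\s]{1,5}docteur[\s\S]{1,100}sans[\s\S]{1,100}(?:faire de|recevoir de) vaccin/i';
$fullBytes = preg_match($pattern, $text);
$fullUtf8  = preg_match($pattern . 'u', $text);

var_dump($bytes, $literal, $utf8, $fullBytes, $fullUtf8);
```

A side effect of u is that quantifiers such as {1,100} then count characters instead of bytes. As the manual quote above says, an invalid UTF-8 subject makes the preg_* functions match nothing.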

User contributions licensed under: CC BY-SA