
Weird array structure in preg_match_all output PHP

I have a preg_match_all call that scrapes emails, but the output is oddly structured and I can't manage to restructure it.

$str = "Service Client ouvert : du lundi au vendredi de 9h à 20h le samedi de 9h à 18h 01 75 85 83 83 hello@soshape.com So Shape France / Site créé par KL Consult";
$pattern = '/[a-z0-9_\-+.]+@[a-z0-9-]+\.([a-z]{2,4})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $str, $output);
print_r($output);

returns

Array ( [0] => Array ( [0] => hello@soshape.com ) [1] => Array ( [0] => com ) )

Two things please:

  • How can I get rid of the “com” array ?
  • How can I restructure the array in order to get : Array ( [0] => hello@soshape.com )

Any ideas? (If my question is not clear, let me know and I'll be glad to explain further.)


Answer

As Markus wrote in the comments, let's look at the PREG_PATTERN_ORDER flag:

Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.

From the documentation here
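
To make the quoted layout concrete, here is a small sketch comparing PREG_PATTERN_ORDER with the other ordering flag, PREG_SET_ORDER. The sample string with two addresses is made up for illustration, and the pattern is written with escaped dots and hyphen, which is an assumption about what the question intended.

```php
<?php
// Hypothetical sample input with two addresses, used only for illustration.
$str = 'Contact hello@soshape.com or support@example.org for help';

// Assumed form of the question's pattern, with the dots and hyphen escaped.
$pattern = '/[a-z0-9_\-+.]+@[a-z0-9-]+\.([a-z]{2,4})(?:\.[a-z]{2})?/i';

// PREG_PATTERN_ORDER: one sub-array per group.
// Index 0 holds every full match, index 1 every capture of the first group.
preg_match_all($pattern, $str, $byPattern, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: one sub-array per match.
// Each entry holds the full match followed by that match's captures.
preg_match_all($pattern, $str, $bySet, PREG_SET_ORDER);

print_r($byPattern);
print_r($bySet);
```

With PREG_PATTERN_ORDER all emails sit together in `$byPattern[0]`, which is why that ordering is the convenient one when only the full matches are needed.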

Because your pattern includes a capturing group, the () in ([a-z]{2,4}), the "com" is captured as well.

If you only want the emails, you can use PREG_PATTERN_ORDER (which is also the default when no flag is given). It puts the full matches at index 0 of the result array; since you don't want the parenthesized subpattern, you can simply ignore the indexes after that.

So you can do:

preg_match_all($pattern, $str, $output, PREG_PATTERN_ORDER);
print_r($output[0]);
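
If you would rather not have the "com" array produced at all, one option is to turn the capturing group into a non-capturing one with (?:...). The following is only a sketch: it reuses the question's string and assumes the pattern was meant to have its dots and hyphen escaped.

```php
<?php
$str = "Service Client ouvert : du lundi au vendredi de 9h à 20h le samedi de 9h à 18h 01 75 85 83 83 hello@soshape.com So Shape France / Site créé par KL Consult";

// Same pattern as in the question (assumed escaping), except the TLD group
// is written as (?:...) so it groups without capturing.
$pattern = '/[a-z0-9_\-+.]+@[a-z0-9-]+\.(?:[a-z]{2,4})(?:\.[a-z]{2})?/i';

preg_match_all($pattern, $str, $output);

// With no capturing groups left, $output only has index 0: the full matches.
$emails = $output[0];
print_r($emails);
```

Here `$emails` is the flat list of addresses the question asks for, and `$output` no longer contains a second sub-array.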
User contributions licensed under: CC BY-SA