I have a preg_match_all call that scrapes emails, but the output is oddly structured and I can't manage to restructure it.
$str = "Service Client ouvert : du lundi au vendredi de 9h à 20h le samedi de 9h à 18h 01 75 85 83 83 hello@soshape.com So Shape France / Site créé par KL Consult";
// Hyphen and dots are escaped so they are treated as literal characters
$pattern = '/[a-z0-9_\-+.]+@[a-z0-9-]+\.([a-z]{2,4})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $str, $output);
print_r($output);
returns
Array ( [0] => Array ( [0] => hello@soshape.com ) [1] => Array ( [0] => com ) )
Two things please:
- How can I get rid of the “com” array ?
- How can I restructure the array in order to get : Array ( [0] => hello@soshape.com )
Any ideas? (If my question is not clear, let me know and I'll gladly explain further.)
Answer
As Markus wrote in the comments, let's look at the PREG_PATTERN_ORDER flag:
Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.
(From the PHP documentation for preg_match_all.)
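To make that layout concrete, here is a small sketch. The input string, the two addresses, and the variable names are made up for illustration, and the pattern is a simplified variant of the one in the question.

```php
<?php
// Hypothetical input with two addresses, for illustration only.
$text = 'Contact: alice@example.com or bob@example.org';
$pattern = '/[a-z0-9_.+-]+@[a-z0-9-]+\.([a-z]{2,4})/i';

preg_match_all($pattern, $text, $matches, PREG_PATTERN_ORDER);

// $matches[0] holds every full match, $matches[1] holds what the
// first parenthesized subpattern captured for each of those matches.
$fullMatches = $matches[0];
$firstGroup  = $matches[1];

print_r($fullMatches);
print_r($firstGroup);
```

Each capturing group in the pattern adds one more sub-array, in the same order as the full matches.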
Because your pattern includes a capturing group, ([a-z]{2,4}), the "com" part is captured as well and ends up in index 1.
If you only want the emails, you can rely on PREG_PATTERN_ORDER (the default flag for preg_match_all), which puts the full matches in index 0 of the result array. Since you don't want the parenthesized subpattern, you can simply ignore the remaining indexes.
So you can do:
preg_match_all($pattern, $str, $output, PREG_PATTERN_ORDER);
print_r($output[0]);
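If you would rather not have the "com" array produced at all, one option is to turn the capturing group into a non-capturing one with (?:...). This is a minimal sketch, not the only way to do it. The input is a shortened version of the string from the question, and $emails is a name introduced here for illustration.

```php
<?php
// Shortened version of the string from the question.
$str = "le samedi de 9h à 18h 01 75 85 83 83 hello@soshape.com So Shape France";

// (?:...) groups without capturing, so no extra sub-array is created.
$pattern = '/[a-z0-9_.+-]+@[a-z0-9-]+\.(?:[a-z]{2,4})(?:\.[a-z]{2})?/i';

preg_match_all($pattern, $str, $output);

// With no capturing groups, $output only contains index 0.
$emails = $output[0];
print_r($emails);
```

With this pattern $output has a single element, and $emails is the flat array asked for in the question.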