I have the following regex code:
$str = 'word1 word2 word3 keyword word4 word5 word6 keyword word7 word8 word9 word10';
$matches = array();
preg_match_all('/(\w* ){1,3}keyword( \w*){1,3}/u', $str, $matches);
I expect the matches to include:
word1 word2 word3 keyword word4 word5 word6
word4 word5 word6 keyword word7 word8 word9
But in reality, I’m getting these:
word1 word2 word3 keyword word4 word5 word6
keyword word7 word8 word9
In other words, the second match is cropped because the first match has already consumed word4 word5 word6.
Here’s a test: https://regex101.com/r/EPp14b/1/
Answer
If you don’t want to cross the word keyword, you might use a negative lookahead when repeating 1-3 words to assert that they are not the keyword.
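After the match, you can use a positive lookahead assertion with a capture group, matching 1-3 words which are again not the keyword. A lookahead does not consume characters, so those words are still available to start the next match.
The sentence will be a concatenation of the full match and group 1.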
(?<!\S)(?:(?!keyword\b)\w+\h+){1,3}keyword\b(?=((?:\h+(?!keyword\b)\w+){1,3}))
The pattern matches:
(?<!\S)
    Assert a whitespace boundary to the left
(?:
    Non-capture group
(?!keyword\b)\w+\h+
    Negative lookahead: match a word followed by horizontal whitespace, if the word is not keyword
){1,3}
    Close the non-capture group and repeat it 1-3 times
keyword\b
    Match keyword followed by a word boundary
(?=
    Positive lookahead
(
    Capture group 1
(?:\h+(?!keyword\b)\w+){1,3}
    Match 1-3 words that are not keyword
)
    Close group 1
)
    Close the lookahead
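To make the "captured but not consumed" point concrete, here is a minimal sketch (not part of the original demo) that prints the byte offset at which each match starts, using PREG_OFFSET_CAPTURE. It uses the pattern from this answer with the sample string from the question; the variable names are only illustrative.

```php
<?php
// Pattern from this answer: the trailing words are captured inside a
// lookahead, so they are not consumed and can begin the next match.
$re = '/(?<!\S)(?:(?!keyword\b)\w+\h+){1,3}keyword\b(?=((?:\h+(?!keyword\b)\w+){1,3}))/u';
$str = 'word1 word2 word3 keyword word4 word5 word6 keyword word7 word8 word9 word10';

preg_match_all($re, $str, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

$sentences = [];
foreach ($matches as $m) {
    // With PREG_OFFSET_CAPTURE every entry is a pair: [text, byte offset].
    [$full, $start] = $m[0];   // full match, e.g. "word1 word2 word3 keyword"
    [$tail] = $m[1];           // group 1, e.g. " word4 word5 word6"
    $sentences[] = $full . $tail;
    echo $start, ': ', $full . $tail, PHP_EOL;
}
```

The second match starts at offset 26 (at word4), which lies inside the text that group 1 of the first match captured. That overlap is only possible because the lookahead leaves those words unconsumed.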
$re = '/(?<!\S)((?:(?!keyword\b)\w+\h+){1,3}keyword\b)(?=((?:\h+(?!keyword\b)\w+){1,3}))/u';
$strings = [
    "word1 word2 word3 keyword word4 word5 word6 keyword word7 word8 word9 word10",
    "word2 keyword word4 word5 word6 keyword word7 word8",
    "word2 word3 keyword word4 word5 word6 keyword word7 keyword word10",
];

foreach ($strings as $str) {
    preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
    // Group 1 is the consumed part, group 2 the words captured in the lookahead.
    $matches = array_map(function($m) {
        return $m[1] . $m[2];
    }, $matches);
    print_r($matches);
}
Output
Array
(
    [0] => word1 word2 word3 keyword word4 word5 word6
    [1] => word4 word5 word6 keyword word7 word8 word9
)
Array
(
    [0] => word2 keyword word4 word5 word6
    [1] => word4 word5 word6 keyword word7 word8
)
Array
(
    [0] => word2 word3 keyword word4 word5 word6
    [1] => word4 word5 word6 keyword word7
    [2] => word7 keyword word10
)
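If the keyword or the number of context words varies, the same approach could be wrapped in a small function. This is only a sketch: keywordContexts and its parameters are hypothetical names, and it assumes the keyword consists of word characters (so that \b behaves as intended) and that $max is at least 1.

```php
<?php
// Hypothetical helper, not from the original answer.
// Assumes $keyword is made of word characters and $max >= 1.
function keywordContexts(string $text, string $keyword, int $max = 3): array
{
    // Escape the keyword so it is matched literally inside the pattern.
    $kw = preg_quote($keyword, '/');

    // Group 1: up to $max preceding words plus the keyword (consumed).
    // Group 2: up to $max following words, captured in a lookahead (not consumed).
    $re = '/(?<!\S)((?:(?!' . $kw . '\b)\w+\h+){1,' . $max . '}' . $kw . '\b)'
        . '(?=((?:\h+(?!' . $kw . '\b)\w+){1,' . $max . '}))/u';

    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    return array_map(function ($m) {
        return $m[1] . $m[2];
    }, $matches);
}

print_r(keywordContexts('a b c d stop e f stop g', 'stop', 2));
```

With a window of 2, this call should return "c d stop e f" and "e f stop g"; the words e and f appear in both results because the lookahead left them unconsumed.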