
PHP RegEx preg_match_all Reiterate Matched Words Before the Current Match

I have the following RegEx code

$str = 'word1 word2 word3 keyword word4 word5 word6 keyword word7 word8 word9 word10';
$matches = array();
preg_match_all('/(\w* ){1,3}keyword( \w*){1,3}/u', $str, $matches);

I expect the matches to include:

word1 word2 word3 keyword word4 word5 word6

word4 word5 word6 keyword word7 word8 word9

But in reality, I’m getting these:

word1 word2 word3 keyword word4 word5 word6

keyword word7 word8 word9

In other words, the second match is cropped because the first match has already consumed word4 word5 word6.

Here’s a test: https://regex101.com/r/EPp14b/1/


Answer

If you don’t want the context to cross another occurrence of keyword, you can put a negative lookahead in front of each of the 1-3 repeated words to assert that the word is not keyword.

After keyword, use a positive lookahead containing a capture group that matches 1-3 words which are, again, not keyword. Because a lookahead does not consume characters, those words remain available as the leading words of the next match.

The sentence is then the concatenation of the full match and group 1.

(?<!\S)(?:(?!keyword\b)\w+\h+){1,3}keyword\b(?=((?:\h+(?!keyword\b)\w+){1,3}))

The pattern matches:

  • (?<!\S) Assert a whitespace boundary to the left
  • (?: Non capture group
    • (?!keyword\b)\w+\h+ Negative lookahead, match a word followed by horizontal whitespace if the word is not keyword
  • ){1,3} Close non capture group and repeat 1-3 times
  • keyword\b Match keyword followed by a word boundary
  • (?= Positive lookahead
    • ( Capture group 1
      • (?:\h+(?!keyword\b)\w+){1,3} Match 1-3 words that are not keyword, each preceded by horizontal whitespace
    • ) Close group 1
  • ) Close lookahead
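
As a minimal sketch of the "full match plus group 1" idea, the snippet below applies the pattern exactly as broken down above to the string from the question. The variable name $sentences is only illustrative. Note that here group 1 is the lookahead capture, so the sentence is $m[0] . $m[1].

```php
<?php
// Pattern as described above: the only capture group sits inside the lookahead.
$re = '/(?<!\S)(?:(?!keyword\b)\w+\h+){1,3}keyword\b(?=((?:\h+(?!keyword\b)\w+){1,3}))/u';
$str = 'word1 word2 word3 keyword word4 word5 word6 keyword word7 word8 word9 word10';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$sentences = [];
foreach ($matches as $m) {
    // $m[0]: consumed text, up to and including "keyword".
    // $m[1]: words captured by the lookahead; they are not consumed,
    //        so the next match can start with them.
    $sentences[] = $m[0] . $m[1];
}

print_r($sentences);
```

This should print the two sentences expected in the question, since word4 word5 word6 is only looked at, not consumed, by the first match.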


$re = '/(?<!\S)((?:(?!keyword\b)\w+\h+){1,3}keyword\b)(?=((?:\h+(?!keyword\b)\w+){1,3}))/u';

$strings = [
    "word1 word2 word3 keyword word4 word5 word6 keyword word7 word8 word9 word10",
    "word2 keyword word4 word5 word6 keyword word7 word8",
    "word2 word3 keyword word4 word5 word6 keyword word7 keyword word10",
];

foreach ($strings as $str) {
    preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
    $matches = array_map(function($m) {
        return $m[1] . $m[2];
    }, $matches);
    print_r($matches);
}

Output

Array
(
    [0] => word1 word2 word3 keyword word4 word5 word6
    [1] => word4 word5 word6 keyword word7 word8 word9
)
Array
(
    [0] => word2 keyword word4 word5 word6
    [1] => word4 word5 word6 keyword word7 word8
)
Array
(
    [0] => word2 word3 keyword word4 word5 word6
    [1] => word4 word5 word6 keyword word7
    [2] => word7 keyword word10
)
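
If the keyword or the number of context words varies, the same pattern can be built at runtime. The following is a sketch only: the function name keywordContexts and its parameters are hypothetical and not part of the original answer. It assumes the keyword consists of word characters, so that \b behaves as in the pattern above.

```php
<?php
// Hypothetical helper: returns each keyword occurrence with up to
// $maxWords words of context on both sides.
function keywordContexts(string $text, string $keyword, int $maxWords = 3): array
{
    // Escape the keyword so regex metacharacters in it are taken literally.
    $kw = preg_quote($keyword, '/');

    // Same structure as the pattern above: group 1 is the consumed part,
    // group 2 is captured inside the lookahead and is not consumed.
    $re = '/(?<!\S)((?:(?!' . $kw . '\b)\w+\h+){1,' . $maxWords . '}' . $kw . '\b)'
        . '(?=((?:\h+(?!' . $kw . '\b)\w+){1,' . $maxWords . '}))/u';

    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    return array_map(function ($m) {
        return $m[1] . $m[2];
    }, $matches);
}

print_r(keywordContexts('word2 keyword word4 word5 word6 keyword word7 word8', 'keyword'));
```

Like the original pattern, this requires at least one word on each side, so a keyword at the very start or end of the string is not reported.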
User contributions licensed under: CC BY-SA