
PHP RegEx preg_match_all Reiterate Matched Words Before the Current Match

I have the following RegEx code


I expect the matches to include:

word1 word2 word3 keyword word4 word5 word6

word4 word5 word6 keyword word7 word8 word9

But in reality, I’m getting these:

word1 word2 word3 keyword word4 word5 word6

keyword word7 word8 word9

In other words, the second match is cropped because the first match has already consumed word4 word5 word6.
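The cropping can be reproduced with a short sketch. The pattern and input string below are hypothetical stand-ins, not the original code from the question; they only assume a pattern that consumes up to three words on both sides of keyword. Since preg_match_all resumes searching where the previous match ended, the words after the first keyword are no longer available to the second match.

```php
<?php
// Hypothetical stand-in pattern (an assumption, not the asker's code):
// up to 3 words, then "keyword", then up to 3 words, all consumed.
$consumingPattern = '/(?:\w+\s+){0,3}keyword(?:\s+\w+){0,3}/';

// Assumed input, reconstructed from the expected matches in the question.
$sampleText = 'word1 word2 word3 keyword word4 word5 word6 keyword word7 word8 word9';

preg_match_all($consumingPattern, $sampleText, $consumed);

// $consumed[0] holds the full matches; the second one lacks word4-word6
// because the first match already consumed them.
print_r($consumed[0]);
```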

Here’s a test: https://regex101.com/r/EPp14b/1/


Answer

If you don’t want to cross the word keyword, you might use a negative lookahead when repeating 1-3 words to assert that they are not the keyword.

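After matching keyword, you can use a positive lookahead assertion with a capture group that matches 1-3 words which, again, are not the keyword. Because a lookahead does not consume characters, those words remain available for the next match.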

The sentence will be a concatenation of the full match and group 1.


The pattern matches:

  • (?<!\S) Assert a whitespace boundary to the left
  • (?: Non-capturing group
    • (?!keyword\b)\w+\h+ Negative lookahead; match a word followed by horizontal whitespace if the word is not keyword
  • ){1,3} Close the non-capturing group and repeat it 1-3 times
  • keyword\b Match keyword followed by a word boundary
  • (?= Positive lookahead
    • ( Capture group 1
      • (?:\h+(?!keyword\b)\w+){1,3} Match 1-3 words that are not keyword, each preceded by horizontal whitespace
    • ) Close group 1
  • ) Close lookahead
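Putting the pieces above together, a minimal PHP sketch might look like this. The pattern is assembled from the breakdown; the input string and variable names are assumptions for illustration, reconstructed from the expected matches in the question. Each sentence is built by concatenating the full match with group 1.

```php
<?php
// Pattern assembled from the parts listed above.
$pattern = '/(?<!\S)(?:(?!keyword\b)\w+\h+){1,3}keyword\b(?=((?:\h+(?!keyword\b)\w+){1,3}))/';

// Assumed input, reconstructed from the expected matches in the question.
$text = 'word1 word2 word3 keyword word4 word5 word6 keyword word7 word8 word9';

// PREG_SET_ORDER groups the results per match: $m[0] is the full match,
// $m[1] is capture group 1 from the lookahead.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$sentences = [];
foreach ($matches as $m) {
    // The words after keyword were only looked at, not consumed,
    // so they can also serve as the leading words of the next match.
    $sentences[] = $m[0] . $m[1];
}

print_r($sentences);
```

With this input, the two sentences are "word1 word2 word3 keyword word4 word5 word6" and "word4 word5 word6 keyword word7 word8 word9".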

User contributions licensed under: CC BY-SA