I use this regex to select up to two words on either side of a word:
((\w+\W+){0,2}WORDHERE(\W+\w+){0,2})
But it treats apostrophe-separated words as “two words”.
For example, with the input text:
you’re not WORDHERE is the best
the worst WORDHERE surely didn’t win
The result is:
you’re not WORDHERE is the best
the worst WORDHERE surely didn’t win
How can I make this code understand that words with an apostrophe should be treated as a single word?
Answer
In the pattern that you use, [^\s\r\n]+ matches any character except whitespace or a newline, so it could also match a run of apostrophes such as ''''.
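As a quick check of that point, here is a minimal Python sketch. It assumes the character class is meant to be read as `[^\s\r\n]+` (with the backslashes in place); the name `NON_WHITESPACE` is only illustrative.

```python
import re

# Assumed reading of the character class: one or more non-whitespace chars.
NON_WHITESPACE = re.compile(r"[^\s\r\n]+")

# A run of apostrophes contains no whitespace, so the class accepts it whole.
only_quotes = NON_WHITESPACE.fullmatch("''''")
print(only_quotes is not None)

# Ordinary text is simply split on whitespace; apostrophes stay inside tokens.
tokens = NON_WHITESPACE.findall("didn't win")
print(tokens)
```

So the class keeps "didn't" together, but it does nothing to require word characters around the apostrophe.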
If you want to match apostrophe-separated words where the apostrophe is not at the start or at the end, you might use:
(?:\w+(?:'\w+)? ){0,2}WORDHERE(?: \w+(?:'\w+)?){0,2}
Explanation
(?:
Non capture group
\w+(?:'\w+)? 
Match 1+ word chars, optionally match a ' and 1+ word chars, followed by a space
){0,2}
Close group and repeat 0-2 times
WORDHERE
Match literally
(?:
Non capture group
 \w+(?:'\w+)?
Same as the previous pattern, only the space is now at the beginning
){0,2}
Close group and repeat 0-2 times
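
To try the pattern out, here is a small Python sketch. It assumes the pattern is written with its backslashes (`\w`) and a straight apostrophe. The helper `context` and the variant `PATTERN_CURLY` are hypothetical additions, not part of the answer: the variant also accepts the typographic apostrophe (’, U+2019) that the sample input in the question appears to use, which the straight-apostrophe pattern would not treat as part of a word.

```python
import re

# Pattern from the answer: up to two words before and after WORDHERE,
# where a word may contain one straight apostrophe in the middle.
PATTERN = re.compile(r"(?:\w+(?:'\w+)? ){0,2}WORDHERE(?: \w+(?:'\w+)?){0,2}")

# Hypothetical variant: accept a straight or a typographic apostrophe.
PATTERN_CURLY = re.compile(
    r"(?:\w+(?:['’]\w+)? ){0,2}WORDHERE(?: \w+(?:['’]\w+)?){0,2}"
)

def context(text, pattern=PATTERN):
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

# Straight apostrophes: contractions count as one word.
print(context("you're not WORDHERE is the best"))
print(context("the worst WORDHERE surely didn't win"))

# Typographic apostrophes need the variant to be kept whole.
print(context("you’re not WORDHERE is the best", PATTERN_CURLY))
print(context("the worst WORDHERE surely didn’t win", PATTERN_CURLY))
```

Note that both patterns expect exactly one space between words; if the text may contain other separators, the literal space would need to be replaced with something like `\s+`.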