Been testing a codebase migration from 7.3 to 7.4, and the only thing that affected us is the PCRE update under PHP.
Currently this regex:
/(>)([\R\s\v\h]*)((&|#)?nbsp;|(&|#)nbsp)*([\R\s\v\h]*)(<)/
Throws a nasty warning:
Compilation failed: escape sequence is invalid in character class at offset 7
And indeed, if I remove the \R, the warning disappears, but of course the behaviour changes.
I have read the PCRE2 syntax manual, and it lists \R as a valid newline sequence character type. What’s up with it then? Why does it throw a warning for it?
Answer
Inside a character class, you can only define patterns that match a single character. That means m will denote m inside [m], \n will denote an LF character in [\n], and $ will match a $ if it is in [$].
\R is not a single-character pattern. It is roughly equivalent to \u{000D}\u{000A}|[\u{000A}\u{000B}\u{000C}\u{000D}\u{0085}\u{2028}\u{2029}], a pattern that can match two characters, CR+LF.
Hence, you cannot use it in a character class. List the characters separately, \u{000A}\u{000B}\u{000C}\u{000D}\u{0085}\u{2028}\u{2029} or, depending on the string literal type you use, \x{000A}\x{000B}\x{000C}\x{000D}\x{0085}\x{2028}\x{2029}.
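To make that concrete, here is a minimal PHP sketch of the pattern from the question with \R replaced by the explicit character list. The sample input string and the variable names ($ws, $pattern, $input) are made up for illustration, and the /u modifier is an assumption: it is added so that PCRE accepts \x{2028} and \x{2029} as code points. As far as I know, \v already covers the same vertical whitespace characters, so the explicit list may be redundant in this particular pattern; it is kept here only to mirror the suggestion above.

```php
<?php
// Sketch only: the sample input below is hypothetical.

// Whitespace class from the question, with \R swapped for the individual
// newline characters. Single quotes keep the backslashes intact, so PCRE
// (not PHP) interprets the \x{...} escapes.
$ws = '[\x{000A}\x{000B}\x{000C}\x{000D}\x{0085}\x{2028}\x{2029}\s\v\h]*';

// /u puts PCRE into UTF-8 mode, which the code points above 0xFF require.
$pattern = '/(>)(' . $ws . ')((&|#)?nbsp;|(&|#)nbsp)*(' . $ws . ')(<)/u';

// CR+LF, a space, an &nbsp; entity and a U+2028 line separator between tags.
$input = "<p>\r\n &nbsp;\u{2028}</p>";

$matched = preg_match($pattern, $input);
var_dump($matched === 1);

// Outside a character class, \R is still fine and matches CR+LF as one unit.
$crlf = preg_match('/\A\R\z/', "\r\n");
var_dump($crlf === 1);
```

Because the class is followed by *, a CR+LF pair is still consumed: each of the two characters is matched individually instead of as a single \R unit.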