PHP 7.4 PCRE2 newline character warning

Been testing a codebase migration from 7.3 to 7.4, and the only thing that affected us is the PCRE update under PHP.

Currently this regex:
/(>)([\R\s\v\h]*)((&|#)?nbsp;|(&|#)nbsp)*([\R\s\v\h]*)(<)/
Throws a nasty warning:
Compilation failed: escape sequence is invalid in character class at offset 7

And indeed, if I remove the \R, the warning disappears, but of course the behaviour changes.

I have read the PCRE2 syntax manual, and it lists \R as a valid newline sequence character type. What's up with it then? Why does it throw a warning?

Answer

Inside a character class, you can only define patterns that match a single character. That means m denotes m inside [m], \n denotes an LF symbol in [\n], and \$ matches a $ if it is in [\$].

\R is not a single-character pattern. It is roughly equivalent to \u{000D}\u{000A}|[\u{000A}\u{000B}\u{000C}\u{000D}\u{0085}\u{2028}\u{2029}], so it can match two characters at once, CR+LF.
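To make the "two characters" point concrete, here is a small sketch. It assumes a stock PHP build with the bundled PCRE2 and is only an illustration. It compares \R with a character class on a CR+LF string, using \A and \z so that the anchors cannot match before a trailing newline.

```php
<?php
// \R consumes CR+LF as one unit, so the whole two-byte string matches.
$viaR = preg_match('/\A\R\z/', "\r\n");

// A character class matches exactly one character, so it cannot cover both bytes.
$viaClass = preg_match('/\A[\r\n]\z/', "\r\n");

// Length of the text that \R actually consumed.
preg_match('/\R/', "\r\n", $m);
$matchedLength = strlen($m[0]);

var_dump($viaR, $viaClass, $matchedLength); // int(1), int(0), int(2)
```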

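Hence, you cannot use it inside a character class. List the characters separately instead: \u{000A}\u{000B}\u{000C}\u{000D}\u{0085}\u{2028}\u{2029} in a double-quoted PHP string (PHP 7.0+ expands each \u{...} escape to the character itself), or \x{000A}\x{000B}\x{000C}\x{000D}\x{0085}\x{2028}\x{2029} in a single-quoted string, where PCRE interprets the escapes. In either case, add the u modifier so that code points above 0xFF, such as U+2028, are treated as single characters.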
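Applied to the regex from the question, the fix might look like the sketch below. The variable names, the sample input and the '$1$7' replacement are assumptions made for illustration, since the original post does not show how the pattern is used.

```php
<?php
// Single-quoted string: the backslashes reach PCRE unchanged, so \x{...} is PCRE syntax.
$newlines = '\x{000A}\x{000B}\x{000C}\x{000D}\x{0085}\x{2028}\x{2029}';

// Same pattern as in the question, with \R replaced by the explicit code points.
// The u modifier enables UTF-8 mode, which PCRE needs for U+2028 and U+2029.
$pattern = '/(>)([' . $newlines . '\s\v\h]*)((&|#)?nbsp;|(&|#)nbsp)*([' . $newlines . '\s\v\h]*)(<)/u';

// Hypothetical input: CR+LF, a non-breaking space entity, and a line separator between two tags.
$html = "<p>\r\n&nbsp;\u{2028}</p>";

// Keep only the closing ">" (group 1) and the opening "<" (group 7).
$result = preg_replace($pattern, '$1$7', $html);

var_dump($result); // string(7) "<p></p>"
```

Building the class from a single $newlines string keeps the two occurrences in sync. If the subject string is not valid UTF-8, preg_replace with the u modifier returns null, so the result is worth checking in real code.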

User contributions licensed under: CC BY-SA