
How would I match all “quote blocks” in plaintext e-mail in PHP PCRE?

I’m trying to match all the quotes in the following example e-mail message:

JavaScript

That means I want to match these three strings:

JavaScript

And:

JavaScript

And:

JavaScript

I don’t understand how I can do this, since if I use the s flag to span multiple lines, which is required for this, I cannot refer to ^ and $ to mean “beginning of line” and “end of line” — instead, they mean “beginning of string” and “end of string”.

So if I do: #^(> .+?)$#us, it will match everything after/with the first quote.

And if I do: #^(> .+?)$#um, it will match only the first quote’s first line and nothing else.

This is frustrating. I really have no idea how to solve it. I’ve searched online before asking and found zero even remotely relevant pages as usual.


Answer

With preg_match_all:

JavaScript

(where \R is an alias for several newline sequences)
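As a rough illustration of the preg_match_all approach, here is a minimal sketch. The sample message in $mail and the exact pattern are assumptions made for this example and are not necessarily the ones used in the answer. The idea is to match one line starting with "> " and then any number of following lines that also start with "> ", each joined by \R.

```php
<?php
// Hypothetical sample message (not the one from the question).
$mail = "Hello,\n\n> first quoted line\n> second quoted line\n\nMy reply.\n\n> another quote\n\nBye.";

// ^> .*         a line that starts with "> " (the m flag makes ^ match at each line start)
// (?:\R> .*)*   any number of directly following lines that also start with "> "
preg_match_all('~^> .*(?:\R> .*)*~m', $mail, $matches);

// $matches[0] holds one entry per contiguous quote block.
print_r($matches[0]);
```

Because the s flag is not used, the dot stops at the end of each line, so a block only grows while the next line is also a quote line.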


With preg_split:

JavaScript

that splits the string on each line that doesn’t start with > . To trim the newline at the end of each block, you can start the pattern with an optional \R?, which gives ~\R?^(?!> ).*\R?~m, or use ~(?:\R?^(?!> ).*)+\R?~m to consume several consecutive non-quote lines at a time.
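A minimal sketch of the preg_split variant, using the last pattern quoted above. The sample message in $mail is hypothetical, and the PREG_SPLIT_NO_EMPTY flag is an addition for this example so that empty pieces at the start or end of the message are dropped.

```php
<?php
// Hypothetical sample message (not the one from the question).
$mail = "Hello,\n\n> first quoted line\n> second quoted line\n\nMy reply.\n\n> another quote\n\nBye.";

// The delimiter is a run of lines that do NOT start with "> ",
// together with the newline before and after the run.
$blocks = preg_split('~(?:\R?^(?!> ).*)+\R?~m', $mail, -1, PREG_SPLIT_NO_EMPTY);

// What remains between the delimiters are the quote blocks.
print_r($blocks);
```

With this input the split should leave the two quote blocks, without leading or trailing newlines, since the optional \R? on both sides of the delimiter absorbs them.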


About \R:
\R is by default an alias for (?>\r\n|\n|\x0b|\f|\r|\x85) (every newline sequence that can be written with 8-bit, non-UTF-8 characters). In UTF-8 mode, enabled with the u modifier or by starting the pattern with (*UTF8)(*BSR_UNICODE), two other characters outside the ASCII range are added to the list: the line separator (U+2028) and the paragraph separator (U+2029).
It’s handy when you don’t know which newline sequence is used in the string, but it is slower than writing the exact newline sequence when you do know it. You can restrict \R to (?>\r\n|\n|\r) with the directive (*BSR_ANYCRLF) at the start of the pattern.
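To see the difference between the default \R and the restricted form, here is a small sketch. The input string $text is made up for the example and mixes CRLF, LF, CR and a form feed (\x0c).

```php
<?php
// Hypothetical input mixing several newline sequences: CRLF, LF, CR, form feed.
$text = "a\r\nb\nc\rd\x0ce";

// Default \R: all four sequences act as separators.
$default = preg_split('~\R~', $text);

// With (*BSR_ANYCRLF), \R only matches CRLF, LF or CR, so the form feed stays in the data.
$crlfOnly = preg_split('~(*BSR_ANYCRLF)\R~', $text);

echo count($default), "\n";  // 5 pieces: a, b, c, d, e
echo count($crlfOnly), "\n"; // 4 pieces: a, b, c, "d\x0ce"
```

Note that the directive has to be the very first thing after the opening delimiter to take effect.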

User contributions licensed under: CC BY-SA