
PHP RegExp for nested Div tags

I need a regexp I can use with PHP’s preg_match_all() to extract the content inside div tags. The divs look like this:

[code block not preserved]

So far I’ve come up with this regexp, which matches all divs with id="t[number]":

[code block not preserved]
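As a minimal sketch, assuming flat (non-nested) divs of the form <div id="t1">…</div>, a lazy pattern with preg_match_all() captures the number and the content of each div. The sample markup and the regex below are hypothetical stand-ins, not the asker's original pattern.

```php
<?php
// Hypothetical sample markup: flat divs with id="t<number>", nothing nested.
$html = '<div id="t1">First</div><div id="t2">Second</div><div id="t3">Third</div>';

// Hypothetical pattern: group 1 = the number, group 2 = the content.
// The lazy .*? stops at the first </div>, which is correct only while
// no div is nested inside another. The s modifier lets . match newlines.
$pattern = '~<div id="t(\d+)">(.*?)</div>~s';

// PREG_SET_ORDER groups the captures of each match together.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "t{$m[1]}: {$m[2]}\n";
}
```

With this flat input it prints one line per div: t1: First, t2: Second, t3: Third.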

The problem is when the content itself contains more divs, i.e. nested divs like this:

[code block not preserved]
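A sketch of why nesting breaks this kind of pattern, using a hypothetical lazy regex on hypothetical nested markup: the lazy .*? stops at the first </div> it meets, and that tag closes the inner div, not t1.

```php
<?php
// Hypothetical nested sample: t1 contains an inner div without an id.
$html = '<div id="t1">Outer <div>Inner</div> tail</div><div id="t2">Second</div>';

// Same style of lazy pattern as for flat divs (hypothetical, not the original).
$pattern = '~<div id="t(\d+)">(.*?)</div>~s';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // t1's content is cut off at the inner </div>; " tail" is lost.
    echo "t{$m[1]}: {$m[2]}\n";
}
```

This prints "t1: Outer <div>Inner" followed by "t2: Second". A greedy (.*) does not help either: it runs to the last </div> in the string and swallows t2 into t1's match. A classic regular expression cannot count nesting depth; PCRE needs its recursion extensions for that.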

Any ideas on how I can make my regexp work with nested tags?

Thanks


Answer

Try a parser instead:

[code block not preserved]

Output:

[code block not preserved]

Download the parser here: http://simplehtmldom.sourceforge.net/
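Simple HTML DOM is a separate download, and its API is not reproduced here. As a sketch of the same parser-based approach, the block below swaps in PHP's bundled DOM extension (DOMDocument plus DOMXPath), assuming ext-dom is enabled, as it is by default in standard builds. The sample markup and the XPath expression that selects ids of the form t<number> are illustrative assumptions.

```php
<?php
// Sketch: PHP's bundled DOM extension stands in for Simple HTML DOM here.
// Hypothetical sample input with one nested, id-less div inside t1.
$html = '<div id="t1">Outer <div>Inner</div> tail</div><div id="t2">Second</div>';

$doc = new DOMDocument();
// loadHTML() emits warnings for malformed or unknown markup; collect them
// internally instead of printing them, then discard them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Divs whose id is "t" followed by one or more digits. XPath 1.0 has no
// regex support, so translate() strips the digits and we check nothing is left.
$query = '//div[starts-with(@id, "t") and string-length(@id) > 1'
       . ' and translate(substring(@id, 2), "0123456789", "") = ""]';

$results = [];
foreach ($xpath->query($query) as $div) {
    // Inner HTML = the serialised child nodes, nested divs included.
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $results[$div->getAttribute('id')] = $inner;
}

foreach ($results as $id => $inner) {
    echo "$id: $inner\n";
}
```

Because the parser builds a real tree, each matching div comes back with its nested content intact and no nesting logic has to be written by hand. A nested div that itself has a t<number> id would be returned as a separate entry as well.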

Edit: More for my own amusement, I tried to do it with a regex. Here’s what I came up with:

[code block not preserved]

Output:

[code block not preserved]

And a small explanation:

[code block not preserved]
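As a sketch of how a regex can cope with nesting at all: PCRE, the engine behind PHP's preg_* functions, supports recursion, so a named subpattern can call itself once per nested div. This is an independent illustration, not necessarily the pattern from the answer; the sample input is hypothetical, and the x modifier lets the pattern carry its own explanation as comments.

```php
<?php
// Hypothetical nested sample, two levels deep inside t1.
$html = '<div id="t1">Outer <div class="x">Inner <div>Deep</div></div> tail</div>'
      . '<div id="t2">Second</div>';

// x modifier: whitespace and # comments inside the pattern are ignored.
$pattern = <<<'REGEX'
~
<div\s+id="t(\d+)">       # opening tag of a target div; group 1 = the number
(                         # group 2 = everything up to the balancing </div>
  (?: [^<]++              #   a run of characters that cannot start a tag
    | <(?!/?div\b)        #   a '<' that neither opens nor closes a div
    | (?&nested)          #   a complete nested div, matched recursively
  )*
)
</div>                    # the closing tag that balances the opening one
(?(DEFINE)                # defines a subpattern without matching anything
  (?<nested>
    <div\b[^>]*>          # any opening div tag, with or without attributes
    (?: [^<]++ | <(?!/?div\b) | (?&nested) )*
    </div>
  )
)
~x
REGEX;

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "t{$m[1]}: {$m[2]}\n";
}
```

Both t1 (with its nested divs intact) and t2 are matched. The pattern still assumes well-formed markup with lowercase tag names, and it knows nothing about comments, script contents, or attribute values that happen to contain "<div".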

Now perhaps you understand why people try to persuade you not to use regex for this. As already noted, it will not help if the HTML is improperly formed: the regex will make a bigger mess of the output than an HTML parser will, I assure you. Also, the regex will probably make your eyes bleed, and your colleagues (or the people who will maintain your software) may come looking for you after seeing what you did. 🙂

Your best bet is to first clean up your input (using TIDY or similar), and then use a parser to get the info you want.
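A sketch of why the parser side copes better with messy input, using a hypothetical div that is never closed. It compares a compact balanced-tag regex (PCRE recursion) with libxml's HTML parser via DOMDocument::loadHTML(), whose built-in error recovery stands in here for a separate Tidy pass; if the tidy extension is installed, tidy_repair_string() could clean the string first.

```php
<?php
// Hypothetical malformed input: the t1 div is never closed.
$html = '<div id="t1">Unclosed <div>Inner</div><div id="t2">Second</div>';

// 1) Balanced-tag regex using PCRE recursion: it needs a matching </div>
//    for t1, finds none, and silently skips t1.
$pattern = '~<div\s+id="t(\d+)">((?:[^<]++|<(?!/?div\b)|(?&nested))*)</div>'
         . '(?(DEFINE)(?<nested><div\b[^>]*>(?:[^<]++|<(?!/?div\b)|(?&nested))*</div>))~';
preg_match_all($pattern, $html, $regexMatches, PREG_SET_ORDER);

// 2) libxml's HTML parser repairs the tree instead of giving up.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$domIds = [];
foreach ((new DOMXPath($doc))->query('//div[starts-with(@id, "t")]') as $div) {
    $domIds[] = $div->getAttribute('id');
}

echo 'regex found: ', count($regexMatches), " div(s)\n";
echo 'parser found: ', implode(', ', $domIds), "\n";
```

The regex finds only t2 and drops t1 without any warning, while the parser still reports both divs, although it has to guess where the unclosed t1 ends.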

User contributions licensed under: CC BY-SA