I have searched but cannot find a solution that works. I have tried using DOM, but the result is not identical to the source (different spaces and tag elements; minor differences, but I need it to be identical for further pattern searches on the source), hence I would like to try regex. Is this possible? (I know it isn’t the best solution, but I would like to try it.) For example, is it possible to return the entire div with class “want-this-entire-div-class”, including its inner content:
$html = '<div class="not-want">
  <div class="also-not-want">
    <div class="want-this-entire-div-class">
      <button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
      <div class="dropdown-menu j-dropdown">
        <div class="header-search">
          <input type="text" name="search" value="" placeholder="Search entire site here..." class="search-input" data-category_id=""/>
          <button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? route=product/search&search="></button>
        </div>
      </div>
    </div>
    <div class="not-want-this-also">
      <div class="or-this">';
The following stops after the first </div>:
preg_match('/<div class="want-this-entire-div-class"(.*?)<\/div>/s', $html, $match);
Thanks
Answer
One way to tackle this is with a state machine. You enumerate all the possible states, then take action depending on which state you are in. In this case, the states are:
1. line to ignore
2. target open div
3. line to add
4. extra open div
5. extra close div
6. target close div
I don't expect this to be robust (it assumes at most one div tag per line), but it does work for the given example:
<?php
# $html_s: HTML source, one tag per line; $cont_s: text identifying the target div
function inner_div(string $html_s, string $cont_s): string {
   $html_a = explode("\n", $html_s);
   $out_a = [];
   $div_b = false;
   $div_n = 0;
   foreach ($html_a as $tok_s) {
      # state 2: target open div
      if (str_contains($tok_s, $cont_s)) {
         $div_b = true;
      }
      # state 1: line to ignore
      if (! $div_b) {
         continue;
      }
      # state 3: line to add
      $out_a[] = $tok_s;
      # state 4: extra open div
      if (str_contains($tok_s, '<div')) {
         $div_n++;
      }
      # state 5: extra close div
      if (str_contains($tok_s, '</div>')) {
         $div_n--;
      }
      # state 6: target close div
      if ($div_n == 0) {
         break;
      }
   }
   return implode("\n", $out_a);
}
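If the source is not guaranteed to have one tag per line, the same depth-counting idea can be applied to tags instead of lines. The following is only a minimal sketch, not part of the original answer: `extract_div` is a hypothetical helper name, and it assumes the target's `class` attribute contains exactly the given class name in double quotes and that no `<div` or `</div>` text appears inside comments, scripts, or attribute values. Because it returns a substring of the input, the result is byte-for-byte identical to the source.

```php
<?php
// Hypothetical helper (an illustration, not the answer's code): returns the
// first div whose class attribute equals $class, including all nested divs,
// as an exact substring of $html. Returns null if it is missing or unclosed.
function extract_div(string $html, string $class): ?string
{
    // Find the opening tag of the target div and remember its byte offset.
    $open = '/<div\b[^>]*\bclass="' . preg_quote($class, '/') . '"[^>]*>/i';
    if (!preg_match($open, $html, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $start = $m[0][1];

    // Collect every opening or closing div tag from the target onwards.
    // Offsets reported here are relative to the start of $html.
    preg_match_all('/<div\b|<\/div\s*>/i', $html, $tags, PREG_OFFSET_CAPTURE, $start);

    $depth = 0;
    foreach ($tags[0] as [$tag, $pos]) {
        // A closing tag has "/" as its second character.
        $depth += ($tag[1] === '/') ? -1 : 1;
        if ($depth === 0) {
            // This closing tag matches the target's opening tag.
            return substr($html, $start, $pos + strlen($tag) - $start);
        }
    }
    return null; // the target div is never closed
}
```

With the `$html` from the question, `extract_div($html, 'want-this-entire-div-class')` should return the target div up to and including its matching closing tag, regardless of how the source is broken into lines.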