PHP - Recursive Regex to get complete Div Class with its inner content

I have searched but cannot find a solution that works. I have tried using DOM, but the result is not identical to the source (different spacing and tag elements; the differences are minor, but I need an exact copy for further pattern searches on the source), so I would like to try regex. Is this possible? I know it isn’t the best solution, but I would like to try it. For example, is it possible to return the entire div with class “want-this-entire-div-class”, including its inner content:

$html = '<div class="not-want">
        <div class="also-not-want">
    <div class="want-this-entire-div-class">
<button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
<div class="dropdown-menu j-dropdown">
<div class="header-search">
        <input type="text" name="search" value="" placeholder="Search entire site here..." 
class="search-input" data-category_id=""/>
  <button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
route=product/search&amp;search="></button>
</div>
</div>
</div>
<div class="not-want-this-also">
<div class="or-this">';

The following stops after the first </div>:

preg_match('/<div class="want-this-entire-div-class"(.*?)<\/div>/s', $html, $match);

Thanks

Answer

One way to tackle this is with a state machine. You enumerate all the possible states, then take action depending on what state you are in. In this case, the states are:

  1. line to ignore
  2. target open div
  3. line to add
  4. extra open div
  5. extra close div
  6. target close div

I don't expect this to be robust, but it does work for the given example:

<?php
# Requires PHP 8+ for str_contains().
# $html_s: the HTML source; $cont_s: text that identifies the target div's line.
function inner_div(string $html_s, string $cont_s): string {
   $html_a = explode("\n", $html_s);
   $out_a = [];
   $div_b = false;
   $div_n = 0;
   foreach ($html_a as $tok_s) {
      # state 2: target open div
      if (str_contains($tok_s, $cont_s)) {
         $div_b = true;
      }
      # state 1: line to ignore
      if (! $div_b) {
         continue;
      }
      # state 3: line to add
      $out_a[] = $tok_s;
      # state 4: extra open div (count every one on the line)
      $div_n += substr_count($tok_s, '<div');
      # state 5: extra close div
      $div_n -= substr_count($tok_s, '</div>');
      # state 6: target close div
      if ($div_n === 0) {
         break;
      }
   }
   return implode("\n", $out_a);
}

# $html is the string from the question
echo inner_div($html, 'want-this-entire-div-class');
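
Since the question asks specifically about a recursive regex, here is a minimal sketch of that approach using PCRE recursion. The helper name outer_div and the shortened sample markup are illustrative assumptions, not part of the original answer. It assumes the opening tag is written exactly as <div class="...">, that every nested div inside the target is closed, and that no "<div" text appears inside attributes, comments or scripts:

```php
<?php
// Hypothetical helper: returns the first <div class="$class"> element
// together with all of its nested divs, or null if nothing matches.
function outer_div(string $html, string $class): ?string {
    $cls = preg_quote($class, '~');
    // (?<body>...) matches balanced content, one piece at a time:
    //   [^<]++           plain text between tags
    //   <(?!/?div\b)     a "<" that does not start a div open/close tag
    //   <div...>(?&body)</div>   a complete nested div (recursive call)
    $re = '~<div class="' . $cls . '"[^>]*>'
        . '(?<body>(?:[^<]++|<(?!/?div\b)|<div\b[^>]*>(?&body)</div>)*)'
        . '</div>~';
    // preg_match returns 0 on no match and false on error; both give null
    return preg_match($re, $html, $m) === 1 ? $m[0] : null;
}

// Shortened version of the markup from the question (an assumption).
$html = <<<'HTML'
<div class="not-want">
    <div class="want-this-entire-div-class">
<button class="search-trigger"></button>
<div class="dropdown-menu">
<div class="header-search">
  <input type="text" name="search"/>
</div>
</div>
</div>
<div class="not-want-this-also">
HTML;

echo outer_div($html, 'want-this-entire-div-class');
```

Unlike the line-based version, this does not depend on how the tags are split across lines, and the matched text is copied from the source byte for byte. On very large or deeply nested input it may run into PCRE's backtracking or recursion limits, in which case preg_match returns false and the sketch returns null.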
User contributions licensed under: CC BY-SA