
PHP: iterate over every node of an HTML string, including text nodes split by other nodes

I am trying to change every readable part of a given HTML string using DOMDocument and DOMXPath:

$dom = new DOMDocument();
$dom->loadHTML('
    <h3> 
        TEST_1
        <b>b tag content</b>
        TEST_2
    </h3> 
    <p>p tag content </p>
');

$xpath = new DOMXPath($dom);

foreach ($xpath->evaluate('//*[count(*) = 0]') as $node) {
  $node->nodeValue = "Changed " . $node->nodeValue;
}

echo $dom->saveHTML();

This outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
    <body>
        <h3> 
            TEST_1
            <b>Changed b tag content</b>
            TEST_2
        </h3> 
        <p>Changed p tag content</p>
    </body>
</html>

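But the strings "TEST_1" and "TEST_2" are not changed, because $xpath->evaluate('//*[count(*) = 0]') returns only elements that have no child elements. In XPath, * matches element nodes only, so the text nodes holding "TEST_1" and "TEST_2" are never selected, and their parent <h3> is excluded because it contains the <b> element.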
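A small sketch to make the diagnosis concrete: it loads the same markup (collapsed onto one line) and counts what the * and text() node tests see inside the <h3>. Nothing here is from the original post beyond the HTML; the variable names are illustrative.

```php
<?php
// Sketch: compare what * and text() see inside the <h3> from the question.
// In XPath, * matches element nodes only; text() matches text nodes.
$dom = new DOMDocument();
$dom->loadHTML('<h3> TEST_1 <b>b tag content</b> TEST_2 </h3><p>p tag content </p>');
$xpath = new DOMXPath($dom);

// <h3> has one element child (<b>), so //*[count(*) = 0] rejects <h3> itself.
$elementChildren = $xpath->evaluate('count(//h3/*)');

// The two non-blank text nodes are children of <h3>, but they are not
// elements, so the original query never visits them either.
$textChildren = $xpath->evaluate('count(//h3/text()[normalize-space()])');

// prints: element children: 1, text children: 2
printf("element children: %d, text children: %d\n", $elementChildren, $textChildren);
```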

  1. How can I select all nodes, including text nodes like "TEST_1" and "TEST_2"?
  2. How can I prevent the <html> and <body> tags from being added to the result?


Answer

Unfortunately, I did not find the right XPath expression, so I solved the problem with recursion. This works:

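// Usage: rewrite_all_nodes($dom->documentElement);
function rewrite_all_nodes(DOMNode $node) {
   if ($node->hasChildNodes()) {
      // Recurse into every child, including text nodes
      foreach ($node->childNodes as $sub_node) {
         rewrite_all_nodes($sub_node);
      }
   } elseif ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
      // Leaf text node with visible content: replace only its text,
      // so sibling elements such as <b> are left intact
      $node->nodeValue = "Changed";
   }
}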

To strip the <html> and <body> tags, I used this answer: https://stackoverflow.com/a/38079328/14495402
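One way to get the fragment back without the added wrapper is sketched below; it is not necessarily what the linked answer does. The assumption is that the input is wrapped in a throwaway <div>, edited, and then each child of that <div> is serialized on its own with saveHTML($node). The helper name html_fragment_roundtrip and its callback parameter are hypothetical. The LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags for loadHTML() are another common option, but with several top-level elements (like the <h3> and <p> here) NOIMPLIED can end up nesting them incorrectly, which the wrapper avoids.

```php
<?php
// Sketch (hypothetical helper): return only the fragment's own markup,
// without the <!DOCTYPE>, <html> and <body> that loadHTML() adds.
// Assumption: a throwaway <div> wrapper whose children are serialized
// one by one after $edit has modified the DOM.
function html_fragment_roundtrip(string $fragment, callable $edit): string
{
    $dom = new DOMDocument();
    $dom->loadHTML('<div>' . $fragment . '</div>');

    // The wrapper is the first <div> in document order.
    $wrapper = $dom->getElementsByTagName('div')->item(0);

    // Let the caller change the DOM, scoped to the wrapper.
    $edit($dom, $wrapper);

    $out = '';
    foreach ($wrapper->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

For example, a closure that runs new DOMXPath($dom) with the query './/text()[normalize-space()]' and the wrapper as context node rewrites only the fragment's visible text, and the returned string contains no <html> or <body>.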

If you know an XPath-based solution, please share.
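For an XPath-style approach, a minimal sketch is to select text nodes directly with text() instead of selecting leaf elements. The helper name prefix_text_nodes is hypothetical, and excluding <script> and <style> contents is an added assumption about what counts as "readable"; adjust the predicate as needed.

```php
<?php
// Sketch (hypothetical helper name): prefix every visible text node.
// text() selects text nodes directly, so "TEST_1" and "TEST_2" are found
// even though they share their <h3> parent with the <b> element.
function prefix_text_nodes(string $html, string $prefix): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // normalize-space() is empty (false) for whitespace-only nodes, which
    // skips the indentation between tags; script/style text is left alone.
    $query = '//text()[normalize-space() and not(ancestor::script or ancestor::style)]';

    foreach ($xpath->query($query) as $text) {
        // Keep the original text and only put the prefix in front of it.
        $text->nodeValue = $prefix . $text->nodeValue;
    }

    // Note: this still returns the <!DOCTYPE>, <html> and <body> wrapper.
    return $dom->saveHTML();
}
```

Calling prefix_text_nodes() on the question's HTML with the prefix "Changed " rewrites all four visible strings, including "TEST_1" and "TEST_2". Because each text node is updated in place, the <b> element between them is preserved.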

User contributions licensed under: CC BY-SA