I am trying to change every readable part of the given HTML code using DOMDocument and DOMXPath:
$dom = new DOMDocument();
$dom->loadHTML(' <h3> TEST_1 <b>b tag content</b> TEST_2 </h3> <p>p tag content </p> ');
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//*[count(*) = 0]') as $node) {
    $node->nodeValue = "Changed " . $node->nodeValue;
}
echo $dom->saveHTML();
It gives me
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html> <body> <h3> TEST_1 <b>Changed b tag content</b> TEST_2 </h3> <p>Changed p tag content</p> </body> </html>
But the strings “TEST_1” and “TEST_2” are not changed, because $xpath->evaluate('//*[count(*) = 0]') returns only elements that have no child elements.
- How do I get all text nodes, including ones like “TEST_1” and “TEST_2”?
- How do I prevent the <html> and <body> tags from being added to the result?
Answer
Unfortunately, I did not find the correct XPath expression, so I solved the problem with recursion. This works:
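function rewrite_all_nodes($node) {
    if ($node->hasChildNodes()) {
        // Recurse into every child, even when there is only one,
        // so a single-child wrapper such as <html> is not overwritten.
        foreach ($node->childNodes as $sub_node) {
            rewrite_all_nodes($sub_node);
        }
    } elseif ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
        // Only non-blank text nodes are rewritten.
        $node->nodeValue = "Changed";
    }
}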
To cut off the <body> and <html> tags I found this: https://stackoverflow.com/a/38079328/14495402
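One way to drop the wrapper tags, shown here as a minimal sketch and not necessarily the approach used in the linked answer, is to serialize only the children of <body>. The helper name inner_body_html is hypothetical; DOMDocument::saveHTML() accepting a node argument is standard PHP.

```php
<?php
// Hypothetical helper: returns only the markup found inside <body>.
function inner_body_html(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() with a node argument serializes just that subtree.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<h3>TEST_1 <b>b tag content</b> TEST_2</h3><p>p tag content</p>');
echo inner_body_html($dom);
```

This leaves the parsing step unchanged, so the document is still built with the implied <html> and <body> elements; they are simply skipped on output.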
If you know an XPath-style solution, please share.
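A possible XPath-style approach, offered as a sketch and not as a tested drop-in, is to select text nodes directly instead of elements. The expression //text()[normalize-space()] matches every text node that contains something other than whitespace, which includes “TEST_1” and “TEST_2”.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<h3> TEST_1 <b>b tag content</b> TEST_2 </h3> <p>p tag content </p>');
$xpath = new DOMXPath($dom);

// text() selects text nodes; the [normalize-space()] predicate
// filters out nodes that contain only whitespace.
foreach ($xpath->query('//text()[normalize-space()]') as $text) {
    $text->nodeValue = 'Changed ' . $text->nodeValue;
}

echo $dom->saveHTML();
```

Because each text node is edited individually, the surrounding elements such as <b> stay intact, which is not the case when nodeValue is assigned on a parent element.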