I am trying to change every readable part of the given HTML code using DOMDocument and DOMXPath:
$dom = new DOMDocument();
$dom->loadHTML('
<h3>
TEST_1
<b>b tag content</b>
TEST_2
</h3>
<p>p tag content </p>
');
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//*[count(*) = 0]') as $node) {
$node->nodeValue = "Changed " . $node->nodeValue;
}
echo $dom->saveHTML();
It gives me
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<h3>
TEST_1
<b>Changed b tag content</b>
TEST_2
</h3>
<p>Changed p tag content</p>
</body>
</html>
But the strings “TEST_1” and “TEST_2” are not changed, because $xpath->evaluate('//*[count(*) = 0]') returns only elements that have no child elements.
- How can I get all text nodes, including ones like “TEST_1” and “TEST_2”?
- How can I prevent the <html> and <body> tags from being added to the result?
Answer
Unfortunately, I did not find the correct XPath expression, so I solved the problem with recursion. This works:
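function rewrite_all_nodes(DOMNode $node) {
    // Text nodes hold the readable content; rewrite the non-blank ones in place.
    if ($node instanceof DOMText) {
        if (trim($node->nodeValue) !== '') {
            $node->nodeValue = "Changed " . $node->nodeValue;
        }
        return;
    }
    // Any other node: recurse into its children (the call now uses this function's own name).
    foreach ($node->childNodes as $sub_node) {
        rewrite_all_nodes($sub_node);
    }
}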
To cut off the <body> and <html> tags, I found this: https://stackoverflow.com/a/38079328/14495402
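For reference, here is a minimal sketch of one common way to drop the wrapper. It is not necessarily what the linked answer does. It assumes the document was loaded with plain loadHTML(), so that a <body> element exists, and the helper name inner_body_html is made up for illustration. The idea is to serialize only the children of <body>, one at a time, by passing each node to saveHTML().

```php
<?php
// Hypothetical helper (name invented for this sketch): serialize only the
// children of <body>, so the doctype and the implied <html>/<body> wrapper
// are left out of the result.
function inner_body_html(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into a body
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() accepts a node and serializes just that subtree.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<h3>TEST_1 <b>b tag content</b> TEST_2</h3><p>p tag content</p>');

echo inner_body_html($dom);
```

This keeps the default parsing behaviour and only changes what gets printed, which avoids the quirks that parser flags can introduce when the fragment has several top-level elements.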
If you know an XPath-style solution, please share.
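One candidate expression, offered as a sketch rather than a definitive answer: select the text nodes themselves instead of the elements. In XPath 1.0, //text() matches text nodes at any depth, and the predicate [normalize-space()] keeps only those that are not pure whitespace. The sample markup below is a one-line version of the HTML from the question.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<h3>TEST_1 <b>b tag content</b> TEST_2</h3><p>p tag content</p>');

$xpath = new DOMXPath($dom);

// Select every text node that contains something other than whitespace.
foreach ($xpath->query('//text()[normalize-space()]') as $text) {
    // Only the text node is modified, so sibling elements such as <b> stay intact.
    $text->nodeValue = 'Changed ' . $text->nodeValue;
}

echo $dom->saveHTML();
```

Because each text node is rewritten separately, “TEST_1” and “TEST_2” are each prefixed on their own, and the <b> element between them is preserved.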