How to remove HTML tags as well as HTML content within a string in PHP?

Question

I have a .txt file that I read into a string in PHP. From the retrieved string I want to remove not only the HTML tags but also the content inside those tags. I found many solutions that remove the tags, but none that remove both the tags and their content.

Sample string: Hey my name is John. I am a coder!

Required output:

Accepted Answer

One way to achieve this is by using DOMDocument and DOMXPath. My solution assumes that the provided HTML string has no container node, or that the container node's contents are not meant to be stripped (as this would result in a completely empty string).

    $string = 'Hey my name is John. I am a coder!';

    // create a DOMDocument (an XML/HTML parser)
    $dom = new DOMDocument('1.0', 'UTF-8');

    // load the HTML string without adding a <!DOCTYPE> declaration or <html>/<body> tags,
    // and with error/warning reports turned off
    // if loading fails, there's something seriously wrong with the HTML
    if ($dom->loadHTML($string, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED | LIBXML_NOERROR | LIBXML_NOWARNING)) {
        // create a DOMXPath instance for the loaded document
        $xpath = new DOMXPath($dom);

        // remember the root node; DOMDocument automatically adds a <p> container if one is not present
        $rootNode = $dom->documentElement;

        // fetch all descendant elements (children, grandchildren, etc.) of the root node;
        // the leading dot makes the query relative to $rootNode
        $childNodes = $xpath->query('.//*', $rootNode);

        // with each of these descendants...
        foreach ($childNodes as $childNode) {
            // ...remove them from their parent node
            $childNode->parentNode->removeChild($childNode);
        }

        // echo the sanitized text
        echo $rootNode->nodeValue . "\n";
    }

If you do want to strip a potential container node, it's going to be a bit harder, because it's difficult to differentiate between an original container node and a container node that was automatically added by DOMDocument.

Also, an unintended unclosed tag can lead to unexpected results: everything up to the next closing tag will be stripped, because DOMDocument automatically adds a closing tag for unclosed tags.
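One possible way around the container ambiguity described above is to wrap the input in a known element yourself before parsing, so the root node is always the wrapper you added. The following is a minimal sketch of that idea, not part of the original answer. The function name stripTagsAndContent, the <div> wrapper, and the sample strings with <b> and <i> tags are all assumptions made for illustration. It also assumes the input is ASCII or that encoding is handled separately, since loadHTML does not treat input as UTF-8 by default, and that the input contains no stray </div> that would close the wrapper early.

```php
<?php
// Hypothetical helper (an illustration, not the original author's code):
// removes every element together with its content and keeps only the
// text that sits outside of any tag.
function stripTagsAndContent(string $html): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');

    // Assumption: wrapping in our own <div> makes the root node unambiguous,
    // so we never have to guess whether DOMDocument added a container.
    $wrapped = '<div>' . $html . '</div>';

    $flags = LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED
           | LIBXML_NOERROR | LIBXML_NOWARNING;

    if (!$dom->loadHTML($wrapped, $flags)) {
        return '';
    }

    $root  = $dom->documentElement;
    $xpath = new DOMXPath($dom);

    // Direct element children are enough: removing a child also removes
    // everything nested inside it.
    $children = iterator_to_array($xpath->query('./*', $root));
    foreach ($children as $child) {
        $root->removeChild($child);
    }

    // Only text nodes remain under the wrapper.
    return $root->textContent;
}

// Hypothetical sample input; the markup is invented for this example.
echo stripTagsAndContent('Hey <b>my name</b> is John. I am a <i>coder</i>!'), "\n";

// An unclosed tag swallows the rest of the string, as the answer warns.
echo stripTagsAndContent('Hello <b>world and more'), "\n";
```

With the first sample, the call should print "Hey  is John. I am a !" (note the double space left where the first element was removed). Collapsing the leftover whitespace, for example with preg_replace, is left to the caller because the right behaviour depends on the data.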


Answer