I know how to get the HTML source code via cURL, but I want to remove the comments from the HTML document (I mean everything between <!-- and -->). In addition, I would like to extract just the BODY of the HTML document. Thank you.
Answer
Try PHP DOM:
$html = '<html><body><!--a comment--><div>some content</div></body></html>'; // put your cURL result here

$dom = new DOMDocument;
$dom->loadHTML($html);

// Remove every comment node in the document
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
    $comment->parentNode->removeChild($comment);
}

// Serialize only the <body> element
$body = $xpath->query('//body')->item(0);
$newHtml = $body instanceof DOMNode ? $dom->saveXML($body) : 'something failed';
var_dump($newHtml);
Output:
string(36) "<body><div>some content</div></body>"
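For pages fetched from the wild, the same approach can be wrapped in a small helper. The sketch below is only one possible arrangement: the function names `fetchHtml` and `extractBodyWithoutComments` are hypothetical and not part of the answer above. It assumes the cURL and DOM extensions are enabled, suppresses libxml warnings (real pages are rarely well-formed), and uses `saveHTML()` rather than `saveXML()` so the result is serialized as HTML.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// fetches a URL with cURL and returns the response body, or null on failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $result = curl_exec($ch);

    return is_string($result) ? $result : null;
}

// Hypothetical helper (an assumption, not from the original answer):
// returns the <body> markup of $html with all comments removed, or null on failure.
function extractBodyWithoutComments(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // Remove every comment node, exactly as in the answer above.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//comment()') as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    $body = $xpath->query('//body')->item(0);

    // Serialize only the <body> element, using HTML rather than XML rules.
    return $body instanceof DOMNode ? $dom->saveHTML($body) : null;
}
```

Usage would look like `$body = extractBodyWithoutComments(fetchHtml($url) ?? '');`, where `$url` is whatever page you are fetching. A `null` result means either the download or the parsing failed.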