PHP Goutte Web Scraping

Question

I want to scrape this: Japan Sun Apple – Fuji 2 per pack

This is my code: use Goutte…

Accepted Answer

Looking at the HTML markup, the text you want is the first child node of the anchor. Since each $node is an instance of DOMElement, you can use ->firstChild (which targets the text node) and then ->nodeValue:

    foreach ($crawler->filter('a.pdt_title') as $node) {
        echo $node->firstChild->nodeValue . "\n";
    }

Another alternative is to use XPath via ->filterXPath(); it's covered in the docs. The XPath query targets the anchor with that class and then its text node:

    foreach ($crawler->filterXPath('//a[@class="pdt_title"]/text()') as $text) {
        echo $text->nodeValue, "\n";
    }

Related docs: https://symfony.com/doc/current/components/dom_crawler.html

Or, as a one-liner that returns an array of the extracted texts:

    $output = $crawler->filterXPath('//a[@class="pdt_title"]/text()')->extract(array('_text'));

Related DOM docs:
http://php.net/manual/en/class.domelement.php
http://php.net/manual/en/class.domnode.php
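To check that the firstChild approach and the XPath text() approach pick out the same text, here is a minimal sketch that uses PHP's bundled DOM extension (DOMDocument and DOMXPath) in place of Goutte, so it runs without Composer. The markup is an assumption: the question's actual HTML was not preserved, so the snippet is a hypothetical anchor whose title text is followed by a <span>. The helper names titlesViaFirstChild and titlesViaTextNodes are illustrative only.

```php
<?php
// Sketch only: PHP's bundled DOM extension stands in for Goutte so this runs
// without Composer. ASSUMPTION: the HTML below is a hypothetical reconstruction
// of the question's markup, which was not preserved.
$html = '<html><body>'
    . '<a class="pdt_title" href="#">Japan Sun Apple &#8211; Fuji <span>2 per pack</span></a>'
    . '</body></html>';

// Hypothetical helper: select the element, then read its first child node.
function titlesViaFirstChild(DOMXPath $xpath): array
{
    $titles = [];
    // Select the anchors themselves; each match is a DOMElement.
    foreach ($xpath->query('//a[@class="pdt_title"]') as $node) {
        // firstChild is the leading text node, so the <span> text is skipped.
        if ($node->firstChild !== null) {
            // trim() drops the trailing space before the <span>.
            $titles[] = trim($node->firstChild->nodeValue);
        }
    }
    return $titles;
}

// Hypothetical helper: let XPath select the text nodes directly.
function titlesViaTextNodes(DOMXPath $xpath): array
{
    $titles = [];
    // text() returns only the anchor's direct text-node children (DOMText).
    foreach ($xpath->query('//a[@class="pdt_title"]/text()') as $text) {
        $titles[] = trim($text->nodeValue);
    }
    return $titles;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$byFirstChild = titlesViaFirstChild($xpath);
$byTextNodes = titlesViaTextNodes($xpath);

print_r($byFirstChild);
print_r($byTextNodes);
```

Both calls print the same single title, without the "2 per pack" text from the <span>. One difference worth knowing: the CSS selector a.pdt_title in the Goutte version also matches anchors that carry pdt_title among several classes, whereas the XPath test @class="pdt_title" only matches when the class attribute is exactly that value.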


Answer