
PHP Goutte Web Scraping

I want to scrape this:


This is my code:


I only want to scrape the text directly inside the “a” tag, without the text inside its nested “span” tag. How can I get only the anchor's own text?
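To make the problem concrete, here is a minimal sketch using PHP's built-in DOM extension instead of Goutte. The question's actual HTML isn't preserved here, so the markup and the class name item are hypothetical. It shows that reading an element's full text content concatenates the text of every descendant, including the span.

```php
<?php
// Hypothetical markup: the real snippet from the question is not shown here.
$html = '<div><a class="item" href="#">Wanted text<span>unwanted text</span></a></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$anchor = $doc->getElementsByTagName('a')->item(0);

// textContent joins every descendant text node, so the span's text leaks in.
$allText = $anchor->textContent;
echo $allText, PHP_EOL; // Wanted textunwanted text
```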


Answer

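Looking at the HTML markup, the text you want is the anchor's first child: a text node that comes before the span. Since each $node is a DOMElement, you can use ->firstChild to reach that text node and then read its ->nodeValue (inside an ->each() closure the callback receives a Crawler instead, so call ->getNode(0) first to get the underlying DOMElement):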

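A minimal sketch of the firstChild approach, again with PHP's DOM extension and hypothetical markup (the item class and the list structure are assumptions). Iterating a DomCrawler Crawler with foreach should hand you DOMElement objects in the same way; here DOMDocument::getElementsByTagName() stands in for the crawler's filter.

```php
<?php
// Hypothetical markup standing in for the question's HTML (not preserved here).
$html = '<ul>'
    . '<li><a class="item" href="#">First title<span>(extra)</span></a></li>'
    . '<li><a class="item" href="#">Second title<span>(extra)</span></a></li>'
    . '</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$titles = [];
foreach ($doc->getElementsByTagName('a') as $node) {
    // $node is a DOMElement; its first child is the text node before <span>.
    $first = $node->firstChild;
    if ($first instanceof DOMText) {
        $titles[] = trim($first->nodeValue);
    }
}

print_r($titles);
```

The instanceof DOMText check skips anchors whose first child is an element rather than text, where ->nodeValue would return that element's full text instead.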

Another option is XPath, via ->filterXPath() (it's covered in the DomCrawler docs):


Related docs:

https://symfony.com/doc/current/components/dom_crawler.html

The XPath query targets the anchor with that class and then selects its text() nodes, i.e. only the text directly inside the anchor, not the text inside the nested span.
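The same idea with DOMXPath, as a sketch under the same assumptions (hypothetical markup, hypothetical class item). The query is the kind of expression you would pass to ->filterXPath(); here it runs through PHP's DOMXPath::query() directly.

```php
<?php
// Hypothetical markup: the question's real HTML is not shown here.
$html = '<div><a class="item" href="#">Wanted text<span>unwanted text</span></a></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// text() selects only the anchor's direct text-node children, so the
// text inside the nested <span> is not matched.
$nodes = $xpath->query('//a[@class="item"]/text()');

$text = $nodes->item(0)->nodeValue;
echo $text, PHP_EOL; // Wanted text
```

Note that @class="item" is an exact string match; if the anchor carries several classes, a test such as contains(concat(' ', normalize-space(@class), ' '), ' item ') is the usual workaround.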

Or, as another one-liner that returns an array of the extracted texts:

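A plain-DOM sketch of collecting every anchor's direct text into an array in a single expression (hypothetical markup and class name again). DomCrawler's ->extract(['_text']) serves a similar purpose on a Crawler; below, iterator_to_array() and array_map() do the same job on a DOMNodeList.

```php
<?php
// Hypothetical markup standing in for the question's HTML.
$html = '<ul>'
    . '<li><a class="item">First title<span>(extra)</span></a></li>'
    . '<li><a class="item">Second title<span>(extra)</span></a></li>'
    . '</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Query the anchors' direct text nodes, then map each one to its trimmed value.
// The arrow function requires PHP 7.4 or later.
$texts = array_map(
    fn (DOMNode $n): string => trim($n->nodeValue),
    iterator_to_array($xpath->query('//a[@class="item"]/text()'))
);

print_r($texts);
```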

Related DOM docs:

http://php.net/manual/en/class.domelement.php
http://php.net/manual/en/class.domnode.php

User contributions licensed under: CC BY-SA