Tag: web-crawler

In Symfony/Panther when scraping, the waitFor function throws an exception if it times out; I need it to continue if the item is not found

php symfony symfony-panther web-crawler web-scraping

I have a database of clinics, and a URL to each clinic. All clinic pages are the same in terms of HTML/CSS, with different content to scrape. However, some clinics have no content on their page, and this causes trouble for me. I have: If .facility is not present, the waitFor() will throw an exception because of…

How to set referer Header in Guzzle and get CDN Content

guzzle php symfony web-crawler web-scraping

I want to scrape a website and am using Guzzle 7.4 and the Symfony DomCrawler. I successfully retrieved the HTML data, but the website uses a CDN to host some resources, and they are not loading because the header is not sent to get those resources. Below is the code retrieving the HTML. If I access the CDN directly and se…

How to crawl page in PHP?

php web-crawler web-scraping

I get the error: "error code: 1020". The page I'm trying to crawl for form data is: https://v2.gcchmc.org/medical-status-search/. This is my code: $initial = file_get_contents('https://v2.gcchmc.org/medical-status-search/'); $check = preg_replace('/.+?input type="hidd…

How to get price value with regular expressions

php regex web-crawler web-scraping

I am trying to write a crawler for an online store, and now I need to get the price value from the webpage. Here is my try: Basically, $html holds the source code of the webpage, and the price value is stored in the document like this: <div class="c-product__seller-price-pure js-price-value">10,6…
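As an illustration of the regex approach described in this question, here is a minimal sketch. It assumes the price sits as plain text inside a div whose class list contains js-price-value, and that the comma is a thousands separator. The function name extractPrice and the sample value 1,234 are hypothetical, chosen only for the example; the real value on the page is not known from the excerpt.

```php
<?php
// Hypothetical sample markup; the value 1,234 is made up for illustration.
$html = '<div class="c-product__seller-price-pure js-price-value">1,234</div>';

/**
 * Return the price as an integer, or null if no matching element is found.
 */
function extractPrice(string $html): ?int
{
    // Match a div whose class attribute contains js-price-value and capture its text.
    $pattern = '/<div[^>]*class="[^"]*\bjs-price-value\b[^"]*"[^>]*>\s*([^<]+?)\s*<\/div>/u';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // Keep only ASCII digits (assumes the comma is a thousands separator).
    $digits = preg_replace('/\D+/', '', $m[1]);
    return $digits === '' ? null : (int) $digits;
}

var_dump(extractPrice($html));
```

A regex like this is brittle against markup changes; for anything beyond a single known element, a DOM parser with a CSS or XPath selector is usually the safer choice.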

Symfony crawler select OPTION in SELECT list without FORM

php symfony symfony-panther web-crawler

I am crawling a website that has a SELECT that is freestanding, with no FORM parent and no NAME, only an ID. I am able to select it with and will open the list, but how can I select a value in the list by value or name? Answer Try something like and see if it works.

Using php-spider, is there a standard Xpath that might discover the URIs on most web sites?

php web-crawler

I am using the wonderful script entitled php-spider with the goal of scraping the Title, Desc, H1, H2, H3, and H4 from a few web sites. As part of configuring the script, it is necessary to set an '…

PHP code for moving the cursor using the Twitter API

php twitter web-crawler

So I already have a script that collects the first 4999 follower IDs of a Twitter user using the API in XML format. I semi understand how the cursor process works, but I am confused about how to implement it to loop until it gathers all the followers. The user I am attempting to gather will take about 8 calls. Any
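The cursor loop asked about here can be sketched roughly as follows. This assumes the usual convention of Twitter's cursored endpoints: the first request uses a cursor of -1, each response carries a next_cursor, and a next_cursor of 0 means there are no more pages. The function collectAllIds, the $fetchPage callable, and the fake pages are hypothetical stand-ins; in real code $fetchPage would perform the HTTP request and parse the response.

```php
<?php
/**
 * Collect all IDs from a cursor-paginated endpoint.
 *
 * $fetchPage is a hypothetical callable: it takes a cursor and returns
 * ['ids' => int[], 'next_cursor' => int]. In real code it would wrap the API call.
 */
function collectAllIds(callable $fetchPage, int $maxCalls = 15): array
{
    $ids = [];
    $cursor = -1; // -1 requests the first page
    // Stop when the API reports no further page, or when the call budget is used up.
    for ($call = 0; $call < $maxCalls && $cursor !== 0; $call++) {
        $page = $fetchPage($cursor);
        $ids = array_merge($ids, $page['ids']);
        $cursor = (int) $page['next_cursor']; // 0 means no more pages
    }
    return $ids;
}

// Stand-in for the API: three fake pages keyed by cursor.
$fakePages = [
    -1 => ['ids' => [1, 2], 'next_cursor' => 10],
    10 => ['ids' => [3, 4], 'next_cursor' => 20],
    20 => ['ids' => [5],    'next_cursor' => 0],
];
$followers = collectAllIds(fn (int $cursor): array => $fakePages[$cursor]);
print_r($followers);
```

The $maxCalls cap is a design choice to keep the loop from running past a rate limit; the default of 15 is an arbitrary illustrative value.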

How to detect search engine bots with PHP?

bots php web-crawler

How can one detect search engine bots using PHP? Answer Here's a Search Engine Directory of Spider names. Then you use $_SERVER['HTTP_USER_AGENT']; to check if the agent is said spider.
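A minimal sketch of the check described in that answer: compare the User-Agent header against a list of crawler name fragments. The function name isSearchBot and the short token list are assumptions for illustration; the list is a small sample, not the directory the answer refers to.

```php
<?php
/**
 * Return true if the User-Agent string contains a known crawler token.
 * The token list is a small illustrative sample, not a complete directory.
 */
function isSearchBot(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        return false;
    }
    $tokens = ['Googlebot', 'bingbot', 'Slurp', 'DuckDuckBot', 'Baiduspider', 'YandexBot'];
    foreach ($tokens as $token) {
        // Case-insensitive substring match against the header value.
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// The header may be absent (for example on the CLI), so fall back to null.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? null;
echo isSearchBot($ua) ? "bot\n" : "not a bot\n";
```

Keep in mind that the User-Agent header is supplied by the client and can be spoofed, so this only identifies crawlers that announce themselves honestly.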