Tag: web-scraping

In Symfony Panther, waitFor() throws an exception when it times out; I need scraping to continue if the item is not found

php symfony symfony-panther web-crawler web-scraping

I have a database of clinics, and a URL for each clinic. All clinic pages share the same HTML/CSS structure, with different content to scrape. However, some clinics have no content on their page, and this causes trouble for me. I have: If .facility is not present, waitFor() throws an exception because of the timeout. I need to
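
One way to keep the loop going when `.facility` is missing is to wrap the wait in a try/catch and treat a timeout as "nothing to scrape here". The sketch below is a minimal illustration, not the asker's code: `waitOrSkip()` is a hypothetical helper, and `$client` and `$urls` in the commented usage are assumed to come from the surrounding script. It catches every `\Exception`; narrowing the catch to the timeout exception class that Panther's WebDriver layer throws would be stricter.

```php
<?php
// Hypothetical helper: runs a wait callback and reports whether it succeeded.
function waitOrSkip(callable $wait): bool
{
    try {
        $wait();
        return true;
    } catch (\Exception $e) {
        // A timeout ends up here; the caller can skip this page and move on.
        return false;
    }
}

// Assumed usage with Panther ($client and $urls are not defined in this sketch):
// foreach ($urls as $url) {
//     $client->request('GET', $url);
//     if (!waitOrSkip(function () use ($client) { $client->waitFor('.facility', 5); })) {
//         continue; // no .facility on this page, go to the next clinic
//     }
//     // ... scrape the page ...
// }
```

A short timeout (5 seconds in the commented usage, an arbitrary value) keeps empty pages from slowing the whole run.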

Scraping iframes from multiple servers with Guzzle and Symfony

domcrawler guzzle php symfony web-scraping

I am building a scraper using Guzzle and the Symfony DomCrawler, but I have run into an issue. The page I am scraping has multiple iframe servers. The default iframe is shown when the scraper loads the page, but to get the other servers it needs to click their buttons so that the page reflects the server

How to set the Referer header in Guzzle and get CDN content

guzzle php symfony web-crawler web-scraping

I want to scrape a website and am using Guzzle 7.4 and the Symfony DomCrawler. I successfully retrieved the HTML data, but the website uses a CDN to host some resources, and they are not loading because the header is not sent when requesting those resources. Below is the code retrieving the HTML: If I access the CDN directly and set the referer
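
Guzzle accepts per-request headers through the `headers` request option, so a Referer can be sent along with the CDN request. The sketch below only builds the options array; `withReferer()` is a hypothetical helper, and `$cdnUrl` and `$pageUrl` in the commented usage are assumed variables. Whether the CDN accepts the request with this header alone is not something the excerpt confirms.

```php
<?php
// Hypothetical helper: adds a Referer header to a Guzzle request-options array.
function withReferer(array $options, string $referer): array
{
    // Overwrites any Referer already present; other headers are kept.
    $options['headers']['Referer'] = $referer;
    return $options;
}

// Assumed usage (requires guzzlehttp/guzzle; $cdnUrl and $pageUrl are placeholders):
// $client = new \GuzzleHttp\Client();
// $response = $client->request('GET', $cdnUrl, withReferer(['timeout' => 10], $pageUrl));
```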

How to crawl page in PHP?

php web-crawler web-scraping

I get the error: "error code: 1020". The page I'm trying to crawl for form data is: https://v2.gcchmc.org/medical-status-search/. This is my code: $initial = file_get_contents('https://v2.gcchmc.org/medical-status-search/'); $check = preg_replace('/.+?input type="hidden" name="csrfmiddlewaretoken" value="(.+?)".*/sim', '$1', $initial); print $check; Can you help me find what's wrong with the code?
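
For pulling a single value out of markup, `preg_match()` with a capture group is usually easier to reason about than rewriting the whole document with `preg_replace()`. The sketch below shows that on a made-up sample string; `extractCsrfToken()` is a hypothetical helper, the token value is invented, and the pattern assumes `name` appears before `value` in the tag. It does not address the "error code: 1020" response itself, which suggests the request was refused before any form HTML was returned.

```php
<?php
// Hypothetical helper: returns the csrfmiddlewaretoken value, or null if absent.
function extractCsrfToken(string $html): ?string
{
    // Assumes the name attribute precedes the value attribute inside the tag.
    $pattern = '/<input[^>]*name="csrfmiddlewaretoken"[^>]*value="([^"]+)"/i';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Sample markup standing in for the fetched page (made-up token value).
$sample = '<form><input type="hidden" name="csrfmiddlewaretoken" value="abc123"></form>';
$token = extractCsrfToken($sample);
```

Returning `null` when the field is missing makes a blocked or empty response easy to detect before the token is used.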

How to get price value with regular expressions

php regex web-crawler web-scraping

I am trying to write a crawler for an online store, and now I need to get the price value from the webpage. Here is my attempt: Basically, $html holds the source code of the webpage, and the price value is stored in the document like this: <div class="c-product__seller-price-pure js-price-value">10,699,000</div> But when I run this, I get this as the result:
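
A minimal sketch of one way to capture that value, assuming the price always sits inside a `div` whose class list contains `js-price-value` and consists only of digits and commas. `extractPrice()` is a hypothetical helper, not the asker's code.

```php
<?php
// Hypothetical helper: extracts the price as an integer, or null if not found.
function extractPrice(string $html): ?int
{
    // Match a div whose class attribute contains js-price-value, then capture digits and commas.
    $pattern = '/<div[^>]*class="[^"]*\bjs-price-value\b[^"]*"[^>]*>\s*([\d,]+)\s*<\/div>/';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // Drop the thousands separators before converting to an integer.
    return (int) str_replace(',', '', $m[1]);
}

$html = '<div class="c-product__seller-price-pure js-price-value">10,699,000</div>';
$price = extractPrice($html);
```

For markup that varies more than this, a DOM parser is generally more robust than a regular expression.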

Is it possible to get the actual video link from an embedded iframe?

javascript php web-scraping

I tried to get the video from an embedded iframe, but the JS is obfuscated. Maybe the actual src link is hidden in the JS. I tried my best and couldn't de-obfuscate it. If I click Play I can find the source, …

Passing parameters from command line to script

php screen-scraping simple-html-dom web-scraping

I am writing a program to scrape the following website: https://filmstoon.in/ From it, I want to find several movies (Batman Begins, Iron Man, Expendables 3) and TV series (Game of Thrones) and scrape the title, the host URL and the meta URL. I managed to do this; however, it is hand-crafted for the specific titles. The code: Everything
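
In a PHP CLI script, command-line arguments arrive in `$argv`, with the script name at index 0. The sketch below shows one way to turn them into a list of titles instead of hard-coding them; `titlesFromArgs()` and the file name `scrape.php` are assumptions made for illustration.

```php
<?php
// Hypothetical helper: turns CLI arguments into a clean list of titles.
function titlesFromArgs(array $argv): array
{
    // $argv[0] is the script name; everything after it is treated as a title.
    $titles = array_map('trim', array_slice($argv, 1));
    $titles = array_filter($titles, function (string $t): bool {
        return $t !== '';
    });
    return array_values($titles);
}

// Run as: php scrape.php "Batman Begins" "Iron Man"   (scrape.php is an assumed name)
foreach (titlesFromArgs($argv ?? []) as $title) {
    echo "Would search for: {$title}\n";
}
```

Quoting each title on the command line keeps multi-word names together as a single argument.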

Extract links from a list of urls

curl hyperlink php scrape web-scraping

I am trying to extract all the links from a set list of URLs in a text file and save the extracted links in another text file. I am trying to use the script below, which was originally meant to extract emails: I changed the email-extraction part to extract links like this: Here is the full code:
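
As an alternative to adapting an email regex, links can be collected with PHP's built-in DOM extension. The sketch below is a swapped-in technique, not the script from the question: `extractLinks()` is a hypothetical helper, the file names in the commented usage are placeholders, and it assumes the `dom` extension is enabled.

```php
<?php
// Hypothetical helper: returns the unique href values of all <a> tags in an HTML string.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world markup is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Assumed usage (urls.txt and links.txt are placeholder file names):
// foreach (file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $url) {
//     $html = (string) @file_get_contents($url);
//     file_put_contents('links.txt', implode("\n", extractLinks($html)) . "\n", FILE_APPEND);
// }
```

Relative links such as `/b` are returned as written; resolving them against the page URL would be a separate step.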

Resolve “Fatal Error: Call to a member function children() on null”

php simple-html-dom web-scraping

I am using PHP HTML DOM Parser to traverse a table DOM on a third-party site and print out a particular set of values in a td element. This works for the first two columns I traverse. However, in the third column, the $e operation returns null. The HTML for that element is: The problem I have is that
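
This kind of fatal error appears when a lookup returns null and a method is then called on the result. A common guard is to check each step before chaining. The sketch below illustrates the pattern with a stand-in class: `StubNode` and `childText()` are hypothetical and only mimic a node with a `children()` method; they are not the parser's own classes.

```php
<?php
// Minimal stand-in for a parsed node; NOT the parser library's class.
final class StubNode
{
    public $text;
    private $kids;

    public function __construct(string $text, array $kids = [])
    {
        $this->text = $text;
        $this->kids = $kids;
    }

    // Returns the child at the given index, or null when there is none.
    public function children(int $index): ?StubNode
    {
        return $this->kids[$index] ?? null;
    }
}

// Hypothetical helper: reads a child's text without calling methods on null.
function childText(?StubNode $node, int $index): ?string
{
    if ($node === null) {
        return null; // the lookup that produced $node found nothing
    }
    $child = $node->children($index);
    return $child === null ? null : $child->text;
}
```

With a guard like this, a column whose markup differs yields null instead of stopping the script, which makes it easier to see which element the selector failed to match.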

Getting a nested element with PHP Simple HTML DOM

html php web-scraping

I want the commented items from an external website. I cannot edit the website. The website looks like this (I edited a lot of things out, but this is the path from the body): I am using PHP Simple HTML DOM and PHP 7.3. I am currently using this code to get the information from the website: I get an