I get the error: “error code: 1020”. The page I’m trying to crawl for form data is: https://v2.gcchmc.org/medical-status-search/.
This is my code:
$initial = file_get_contents('https://v2.gcchmc.org/medical-status-search/');
$check = preg_replace('/.+?input type="hidden" name="csrfmiddlewaretoken" value="(.+?)".*/sim', '$1', $initial);
print $check;
Can you help me find what's wrong with the code above?
Answer
The site is protected by Cloudflare, and “error code: 1020” is Cloudflare's access-denied response. You can only get past Cloudflare with a client that has JavaScript enabled, so a plain HTTP request such as file_get_contents() or a command-line client is not going to work. You can, however, automate a real browser with Puppeteer, for example, which is also available for PHP. You have to disable headless mode to make it work.
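Before reaching for a browser, it can help to confirm that the response really is the Cloudflare block and not the form page. The following is a minimal sketch, not part of the original answer: isCloudflareBlock() and fetchBody() are hypothetical helper names, and the check assumes the block body contains the same "error code: 1020" text quoted in the question.

```php
<?php
// Hypothetical helper: true when the body contains Cloudflare's
// "error code: 1020" message (the text quoted in the question).
function isCloudflareBlock(string $body): bool
{
    return preg_match('/error code:\s*1020/i', $body) === 1;
}

// Hypothetical helper: fetch a URL and return the body even for
// 4xx/5xx responses, instead of letting file_get_contents() fail.
function fetchBody(string $url): string
{
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $body = file_get_contents($url, false, $context);

    return $body === false ? '' : $body;
}
```

Calling isCloudflareBlock(fetchBody('https://v2.gcchmc.org/medical-status-search/')) would then tell you whether the plain request was blocked before you try to parse the token out of it.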
Installation
composer require nesk/puphpeteer
npm install @nesk/puphpeteer
The script (test.php)
<?php

use Nesk\Puphpeteer\Puppeteer;

require_once __DIR__ . "/vendor/autoload.php";

// Extract the CSRF token from the page HTML; returns null when it is not found.
function getToken($content)
{
    preg_match_all('/.+?input type="hidden" name="csrfmiddlewaretoken" value="(.+?)".*/sim', $content, $matches);

    return $matches[1][0] ?? null;
}

// Headless mode must be disabled for this to work.
$puppeteer = new Puppeteer;
$browser = $puppeteer->launch(['headless' => false]);

/** @var \Nesk\Puphpeteer\Resources\Page $page */
$page = $browser->newPage();
$page->goto('https://v2.gcchmc.org/medical-status-search/');

var_dump(getToken($page->content()));

$browser->close();
You probably don't need the csrfmiddlewaretoken when running the script like this, but you can take it further from here if you choose to use this approach.
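If you do end up needing the token, the regex in getToken() depends on the exact attribute order of the input tag. As an alternative, here is a sketch that reads the value through PHP's DOM extension instead. extractCsrfToken() is a hypothetical name, and the sketch assumes the DOM extension is enabled and that the field is still named csrfmiddlewaretoken.

```php
<?php
// Hypothetical helper: returns the csrfmiddlewaretoken value,
// or null when the page has no such input field.
function extractCsrfToken(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }

    // Collect libxml warnings internally so imperfect HTML does not emit them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select the value attribute of the input, regardless of attribute order.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//input[@name="csrfmiddlewaretoken"]/@value');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->nodeValue;
}
```

In the script above you could call extractCsrfToken($page->content()) in place of getToken($page->content()).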