I try to get a booking.com page from a hotel to fetch the prices afterwards with regex. The problem is the following:
I call file_get_contents with parameter like checkin and checkout (file_get_contents("/hotel/at/myhotel.html?checkin=2017-10-12&checkout=2017-10-13"
)) dates so that the prices are shown to the visitor. If I watch the source code in the browser I see the entry:
b_this_url : '/hotel/at/myhotel.html?label=gen173nr-1FCAsoDkIcbmV1ZS1wb3N0LWhvbHpnYXUtaW0tbGVjaHRhbEgHYgVub3JlZmgOiAEBmAEHuAEHyAEM2AEB6AEB-AEDkgIBeagCAw;sid=58ccf750fc4acb908e20f0f28544c903;checkin=2017-10-12;checkout=2017-10-13;dist=0;sb_price_type=total;type=total&',
If I echo the string from file_get_contents the string looks like:
b_this_url : '/hotel/at/myhotel.html',
So all parameters that I passed to the url with file_get_contents are gone and therefore I couldn’t find any prices with my regex on the page …
Does anyone have a solution for this problem?
Advertisement
Answer
The webpage is not completely generated server-side, but it relies heavily on JavaScript after the HTML part loads. If you are looking for rendering the page as it looks in browser, I think you should use php curl
instead of file_get_contents()
for this kind of web scraping thing. I generated an automatic code for you from Postman (a google chrome extension / standalone desktop app) for your given url. The response contains the full url with params. See the image and I posted the code for you also.
<?php
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => "https://www.booking.com/hotel/at/hilton-innsbruck.de.html?checkin=2017-10-10%3Bcheckout%3D2017-10-11",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "GET",
CURLOPT_HTTPHEADER => array(
"cache-control: no-cache",
"postman-token: 581a75a7-6600-6ed6-75fd-5fb09c25d927"
),
));
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
echo "cURL Error #:" . $err;
} else {
echo $response;
}