I have taken a source code snippet from this webpage: http://www.yelp.com/biz/franchino-san-francisco?start=80.
I want to scrape the date, review text, and rating for each review block on the page.
Code at: http://ideone.com/fork/Yfw2re
I am not very familiar with DOM elements, so I would appreciate it if someone could correct this.
Here is the code:
<?php
// your code goes here
$html = <<<EOF
<div class="review-wrapper">
  <div class="review-content">
    <div class="biz-rating biz-rating-very-large clearfix">
      <div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
        <div class="rating-very-large">
          <i class="star-img stars_5" title="5.0 star rating">
            <img alt="5.0 star rating" class="offscreen" height="303" src="http://s3-media3.ak.yelpcdn.com/assets/2/www/img/c2252a4cd43e/ico/stars/v2/stars_map.png" width="84">
          </i>
          <meta itemprop="ratingValue" content="5.0">
        </div>
      </div>
      <span class="rating-qualifier">
        <meta itemprop="datePublished" content="2013-10-28"> 10/28/2013
      </span>
    </div>
    <p class="review_comment ieSucks" itemprop="description" lang="en">The reason I started a yelp account, was to write a review for Franchinos. This is my favorite restaurant in the city of San Francisco, and especially, North Beach. <br><br>Where do I start... I take every friend, family member and acquaintance to Franchinos in every opportunity I can. I am a Italy-nut and have been over three times - the mood + atmosphere is almost identical. It is a 100% family-run restaurant and you can taste the expertise and 'home-cooking'. <br><br>Each time, I get a large bottle of wine (One time - they ran out of the wine I had ordered - and instead gave me a larger, more expensive bottle - same price), a wonderful pasta dish (Alfredo, carbonara.. etc.) and a Caesar salad.<br><br>Need I say more? Buenisimo. I look forward to the next time.. and the times after that again and again. <br><br>è perfetto!</p>
  </div>
  <div class="review-footer clearfix">
    <div class="rateReview ufc-feedback clearfix" data-review-id="SnZ4Q97nJdR7a-fot-Slcw">
      <p class="review-intro review-message"> Was this review …?
      </p>
EOF;

$dom = new DOMDocument();
@$dom->loadHTML($html);

$classname = 'review-content';
$finder = new DomXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$tmp_dom = new DOMDocument();

foreach ($nodes as $result) {
    // getting rate value from '<meta itemprop="ratingValue" content="5.0">'
    // getting date from <span class="rating-qualifier"> <meta itemprop="datePublished" content="2013-10-28"> 10/28/2013 </span>
    // getting review from ' <p class="review_comment ieSucks" itemprop="description" lang="en">The reason I started a yelp account, was to write a review for Franchinos. This is my favorite restaurant in the city of San Francisco, and especially, North Beach. <br><br>Where do I start... I take every friend, family member and acquaintance to Franchinos in every opportunity I can. I am a Italy-nut and have been over three times - the mood + atmosphere is almost identical. It is a 100% family-run restaurant and you can taste the expertise and 'home-cooking'. <br><br>Each time, I get a large bottle of wine (One time - they ran out of the wine I had ordered - and instead gave me a larger, more expensive bottle - same price), a wonderful pasta dish (Alfredo, carbonara.. etc.) and a Caesar salad.<br><br>Need I say more? Buenisimo. I look forward to the next time.. and the times after that again and again. <br><br>è perfetto!</p> '
}
Answer
You can look up the elements by class value or tag name like this:
// Parse the markup once and reuse the same XPath object for every lookup.
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Date: text of the element whose class attribute is exactly "rating-qualifier".
$classname = 'rating-qualifier';
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
    echo $date = $results->item(0)->nodeValue;
}

// Review text: @class='...' only matches the full attribute string, so both class names are needed.
$classname = 'review_comment ieSucks';
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
    echo $review = $results->item(0)->nodeValue;
}

// Rating: the first <meta> in this snippet is the itemprop="ratingValue" one.
$meta = $dom->documentElement->getElementsByTagName("meta");
echo $meta->item(0)->getAttribute('content');
To get all the ratings on the page rather than only the first one, wrap the rating lookup in a simple for loop over the matched nodes.
Demo here: https://eval.in/143036
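For a page with several reviews, one possible way to do the per-review loop is to select every review-content block first and then run queries relative to each block, so that the rating, date, and text of one review stay together. The following is only a sketch under assumptions: the function name extractReviews is hypothetical, and the sample markup is a shortened stand-in with a made-up second review, not the real Yelp page source.

```php
<?php
// Shortened stand-in markup with two review blocks (the second one is invented).
$html = <<<'EOF'
<div class="review-wrapper">
  <div class="review-content">
    <div itemprop="reviewRating"><meta itemprop="ratingValue" content="5.0"></div>
    <span class="rating-qualifier">
      <meta itemprop="datePublished" content="2013-10-28"> 10/28/2013
    </span>
    <p class="review_comment ieSucks" itemprop="description" lang="en">Great pasta and wine.</p>
  </div>
</div>
<div class="review-wrapper">
  <div class="review-content">
    <div itemprop="reviewRating"><meta itemprop="ratingValue" content="3.0"></div>
    <span class="rating-qualifier">
      <meta itemprop="datePublished" content="2013-09-15"> 9/15/2013
    </span>
    <p class="review_comment ieSucks" itemprop="description" lang="en">Service was slow.</p>
  </div>
</div>
EOF;

// Hypothetical helper: returns one array per review block.
function extractReviews(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "review-content" as one class among possibly several.
    $blocks = $xpath->query(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' review-content ')]"
    );

    $reviews = [];
    foreach ($blocks as $block) {
        // Passing $block as context node keeps each query inside the current review.
        $reviews[] = [
            'rating' => $xpath->evaluate("string(.//meta[@itemprop='ratingValue']/@content)", $block),
            'date'   => $xpath->evaluate("string(.//meta[@itemprop='datePublished']/@content)", $block),
            'review' => trim($xpath->evaluate("string(.//p[@itemprop='description'])", $block)),
        ];
    }
    return $reviews;
}

foreach (extractReviews($html) as $r) {
    echo $r['rating'], ' | ', $r['date'], ' | ', $r['review'], PHP_EOL;
}
```

The leading dot in `.//meta[...]` is what restricts the search to the current block; without it the query would start from the document root and return the first review's values every time. Note that string() concatenates text nodes only, so `<br>` tags inside a review contribute nothing and paragraphs run together.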