I am trying to scrape some data from this site: http://laperuanavegana.wordpress.com/. Specifically, I want the title of each recipe and its ingredients. The ingredients are located between two specific keywords. I am trying to get this data using regex and simplehtmldom, but the output is the full HTML text, not just the ingredients. Here is my code:

<?php
include_once('simple_html_dom.php');

$base_url = "http://laperuanavegana.wordpress.com/";
traverse($base_url);

function traverse($base_url)
{
    $html = file_get_html($base_url);

    // Keywords that mark the start and end of each ingredient list
    $k1 = "Ingredientes";
    $k2 = "Preparación";

    preg_match_all("/$k1(.*)$k2/s", $html->innertext, $out);
    echo $out[0][0];
}
?>
There are multiple ingredient lists on this page and I want all of them, which is why I am using preg_match_all(). It would be helpful if anybody could spot the bug in this code. Thanks in advance.
Answer
You need to add a question mark after .* in the pattern. It makes the quantifier ungreedy (lazy). Otherwise the match runs from the first $k1 to the last $k2 on the page. With the question mark, each match ends at the next $k2.
preg_match_all("/$k1(.*?)$k2/s",$html->innertext,$out);
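To see the difference without fetching the page, here is a minimal sketch that runs both patterns on a made-up string. The sample text in $text is an assumption standing in for $html->innertext, and the real page markup will look different. The sketch also wraps the keywords in preg_quote(), which is not in the original code but is a safe habit when building a pattern from variables.

```php
<?php
// Hypothetical sample text standing in for $html->innertext.
$text = "Ingredientes arroz, frijoles Preparación cocinar. "
      . "Ingredientes quinua, palta Preparación mezclar.";

// Escape the keywords so they are treated as literal text in the pattern.
$k1 = preg_quote("Ingredientes", "/");
$k2 = preg_quote("Preparación", "/");

// Greedy: (.*) runs from the first keyword pair start to the LAST $k2.
preg_match_all("/$k1(.*)$k2/s", $text, $greedy);

// Lazy: (.*?) stops at the NEXT $k2, so each recipe is matched separately.
preg_match_all("/$k1(.*?)$k2/s", $text, $lazy);

// Index 1 holds the captured group, i.e. only the text between the keywords.
$ingredients = array_map('trim', $lazy[1]);

echo count($greedy[1]), "\n"; // 1
echo count($lazy[1]), "\n";   // 2
print_r($ingredients);
```

Note that $out[0] contains the full matches including the keywords themselves, while $out[1] contains only what the parentheses captured. Echoing $out[0][0] also prints just the first match, so looping over $out[1] is probably closer to what the question asks for.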