I am trying to scrape some data from this site: http://laperuanavegana.wordpress.com/. Specifically, I want the title and the ingredients of each recipe. The ingredients are located between two specific keywords. I am trying to extract this data using regex and simplehtmldom, but the output is the full HTML text, not just the ingredients. Here is my code:
<?php
include_once('simple_html_dom.php');
$base_url = "http://laperuanavegana.wordpress.com/";
traverse($base_url);
function traverse($base_url)
{
    $html = file_get_html($base_url);
    $k1 = "Ingredientes";
    $k2 = "Preparación";
    preg_match_all("/$k1(.*)$k2/s", $html->innertext, $out);
    echo $out[0][0];
}
?>
There are multiple ingredient lists on this page and I want all of them, which is why I am using preg_match_all(). It would be helpful if anybody could spot the bug in this code. Thanks in advance.
Answer
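You need to add a question mark after the .* in your pattern, so that the group becomes (.*?). The question mark makes the quantifier ungreedy (lazy). Without it, the pattern matches everything from the first $k1 to the last $k2 on the page, which is why you get almost the full HTML back. With it, each match stops at the next $k2, so every ingredients block is matched separately.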
preg_match_all("/$k1(.*?)$k2/s",$html->innertext,$out);
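To see the difference in isolation, here is a minimal sketch that runs both patterns on a small made-up string instead of the real page. The sample text in $text is purely an assumption for illustration; on the actual site you would pass $html->innertext as before. It also reads from index 1 of the result, which holds only the captured text between the keywords (index 0 holds the full match, keywords included).

```php
<?php
// Hypothetical sample text standing in for $html->innertext (not taken from the real page).
$text = "Ingredientes: arroz, frijoles Preparación: hervir. "
      . "Ingredientes: quinua, limón Preparación: mezclar.";

$k1 = "Ingredientes";
$k2 = "Preparación";

// Greedy: (.*) runs from the first $k1 to the LAST $k2, giving one big match.
preg_match_all("/$k1(.*)$k2/s", $text, $greedy);

// Ungreedy: (.*?) stops at the NEXT $k2, giving one match per recipe.
preg_match_all("/$k1(.*?)$k2/s", $text, $lazy);

echo count($greedy[1]), "\n"; // 1
echo count($lazy[1]), "\n";   // 2

// Index 1 contains only the text captured between the two keywords.
foreach ($lazy[1] as $ingredients) {
    // Strip the leading colon and surrounding whitespace left over from the sample text.
    echo trim($ingredients, ": \n"), "\n";
}
```

On the real page the captured text will still contain HTML tags, so you will probably want to clean each entry further (for example with strip_tags()) before using it.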