Scrape data using regex and simplehtmldom

I am trying to scrape some data from this site: http://laperuanavegana.wordpress.com/. Specifically, I want the title of each recipe and its ingredients. The ingredients are located between two specific keywords. I am trying to get this data using regex and simplehtmldom, but the output is the full HTML text, not just the ingredients. Here is my code:

<?php

include_once('simple_html_dom.php');
$base_url = "http://laperuanavegana.wordpress.com/";

traverse($base_url);


function traverse($base_url)
{
    
    $html = file_get_html($base_url);
    $k1="Ingredientes";
    $k2="Preparación";
    preg_match_all("/$k1(.*)$k2/s",$html->innertext,$out);
    echo $out[0][0];
}

?>

There are multiple ingredient lists on this page and I want all of them, which is why I am using preg_match_all(). It would be helpful if anybody could spot the bug in this code. Thanks in advance.


Answer

You need to add a question mark after the .* quantifier. It makes the quantifier ungreedy (lazy). Without it, the match takes everything from the first $k1 to the last $k2 on the page. With the question mark, each match stops at the next $k2.

preg_match_all("/$k1(.*?)$k2/s",$html->innertext,$out);
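To see the difference, here is a minimal sketch that runs both patterns on a small made-up string. The $page value is a hypothetical stand-in for $html->innertext and the real page markup will differ. The preg_quote() calls and the loop over $lazy[1] are additions for illustration and are not part of the original code. Note that echo $out[0][0] prints only the first match, so looping over index 1 of the result is one way to get every captured ingredient list.

```php
<?php
// Hypothetical sample standing in for $html->innertext (assumption, not the real page).
$page = "<h2>Receta A</h2> Ingredientes: tomate, ajo Preparación: picar todo. "
      . "<h2>Receta B</h2> Ingredientes: quinua, limón Preparación: hervir.";

// preg_quote() escapes regex metacharacters in the keywords (an added precaution).
$k1 = preg_quote("Ingredientes", "/");
$k2 = preg_quote("Preparación", "/");

// Greedy: a single match running from the first $k1 to the last $k2.
preg_match_all("/$k1(.*)$k2/s", $page, $greedy);

// Lazy: one match per recipe, each stopping at the next $k2.
preg_match_all("/$k1(.*?)$k2/s", $page, $lazy);

echo count($greedy[1]), "\n"; // prints 1
echo count($lazy[1]), "\n";   // prints 2

// Index 1 holds the text captured between the keywords for every match.
foreach ($lazy[1] as $ingredients) {
    // Strip the leading colon and surrounding whitespace from each capture.
    echo trim($ingredients, ": \n\t"), "\n";
}
```

In the original function, the same loop would run over $out[1] instead of $lazy[1].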
User contributions licensed under: CC BY-SA