Website scraping with DOMDocument in PHP

I have PHP code that extracts the categories and displays them. However, I still can't extract the numbers that go along with them (without the brackets). The category and the number need to be extracted as separate values, not together. Maybe this needs another for loop using a regex, etc.

This is the code:

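<?php
    // Load the category listing page; @ suppresses warnings from malformed HTML.
    $grep = new DOMDocument();
    @$grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp");

    // Select every element whose class attribute contains "CatLevel1".
    $finder = new DOMXPath($grep);
    $class = "CatLevel1";
    $nodes = $finder->query("//*[contains(@class, '$class')]");

    // Print the first child of each match, which holds the category name.
    foreach ($nodes as $node) {
        $span = $node->childNodes;
        echo $span->item(0)->nodeValue."<br>";
    }
?>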

Is there any way I could do that? Thanks!

This is my desired output:

Arts, Antiques & Collectibles : 9768<br>
B2B & Industrial Products : 2342<br>
Baby : 3453<br>
etc...

Answer

Just output the next sibling as well: the count is the second child node, item(1). Example:

foreach ($nodes as $node) {
    $span = $node->childNodes;
    echo $span->item(0)->nodeValue . ': ' . str_replace(array('(', ')'), '', $span->item(1)->nodeValue);
    echo '<br/>';
}

EDIT: str_replace is enough for the simple job of removing the parentheses.
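
If the name and the count are needed as separate values rather than echoed straight away, the same loop can collect them into an array. The following is only a minimal sketch under stated assumptions: the inline $sampleHtml markup is hypothetical (the real page's structure is not confirmed here beyond "name in the first child, bracketed count in the second"), and extractCategories is an illustrative helper name, not something from the original code. It uses preg_replace to keep only the digits, as an alternative to str_replace.

```php
<?php
// Hypothetical sample markup standing in for the real page (an assumption).
$sampleHtml = <<<'HTML'
<div>
<div class="CatLevel1"><a href="#">Arts, Antiques &amp; Collectibles</a><span>(9768)</span></div>
<div class="CatLevel1"><a href="#">Baby</a><span>(3453)</span></div>
</div>
HTML;

// Illustrative helper: returns a list of ['name' => string, 'count' => int].
function extractCategories(string $html, string $class = 'CatLevel1'): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];

    foreach ($xpath->query("//*[contains(@class, '$class')]") as $node) {
        $children = $node->childNodes;

        // Skip elements that do not have both a name node and a count node.
        if ($children->length < 2) {
            continue;
        }

        $name = trim($children->item(0)->nodeValue);

        // Drop every non-digit character, so "(9768)" becomes "9768".
        $count = (int) preg_replace('/\D/', '', $children->item(1)->nodeValue);

        $rows[] = ['name' => $name, 'count' => $count];
    }

    return $rows;
}

// Usage: print one "name : count" line per category.
foreach (extractCategories($sampleHtml) as $row) {
    echo $row['name'] . ' : ' . $row['count'] . PHP_EOL;
}
```

For the live page, the string passed in could come from file_get_contents on the URL in the question, assuming the server allows it. Whether item(0) and item(1) are the right children depends on the actual markup, so check that against the page source.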

Side note: always declare UTF-8 encoding in your PHP script's response, before any output is sent:

header('Content-Type: text/html; charset=utf-8');
User contributions licensed under: CC BY-SA