Grabbing specific elements inside DIV from external page

I need to scrape the following elements from inside each div with class="product-grid-item" (the page contains several of them), but I have no clue how to do it… so I need help before I pull my hair out.

1 – The link and image inside the div class="product-element-top";

<a href="https://...this_link" class="product-image-link"> (just need the link)

<img width="300" height="300" src="https://...this_image_url... (just need this image URL)

2 – The title inside the h3 tag as follows;

<h3 class="wd-entities-title"><a href="https://...linkhere">The title goes here (just the title)

3 – Last but not least, I need to grab the price inside this;

<span class="price"><span class="woocommerce-Price-amount amount"><bdi><span class="woocommerce-Price-currencySymbol">€</span>20,00</bdi></span></span> (just the “€20,00”)

Here’s the full HTML:

<div class="product-grid-item" data-loop="1">

<div class="product-element-top">
    <a href="https://...linkhere" class="product-image-link">
        <img width="300" height="300" src="https://image-goes-here.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail">    </a>
    
    <div class="top-information wd-fill">

        <h3 class="wd-entities-title"><a href="https://...linkhere">The title goes here</a></h3>        
                
        
    <span class="price"><span class="woocommerce-Price-amount amount"><bdi><span class="woocommerce-Price-currencySymbol">€</span>20,00</bdi></span></span>

        <div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
            <a href="https://...linkhere" data-quantity="1" class="button product_type_variable add_to_cart_button add-to-cart-loop"><span>Options</span></a></div> 
    </div>

    <div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
                            <div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
                <a href="https://...linkhere" data-added-text="Compare Products">Buy</a>
            </div>
    <div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
                <a href="https://...linkhere" class="open-quick-view quick-view-button">quick view</a>
            </div>
                            <div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
                <a class="" href="https://linkhere/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
            </div>
            </div>
                <div class="quick-shop-wrapper wd-fill wd-scroll">
                <div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon"><a href="#" rel="nofollow noopener">Close</a></div>
                <div class="quick-shop-form wd-scroll-content">
                </div>
            </div>
        </div>
</div>

One of my clumsy attempts:

$html = file_get_contents("https://url-here.goetohere");
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$finder = new DomXPath($DOM);
$classname = 'product-grid-item';
$classname = 'product-element-top2';
$classname = 'product-element-top2';
$classname = 'wd-entities-title';
$classname = 'price';
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $node) {
    echo 'here »» ' . htmlentities($node->nodeValue) . '<br>';
}

Answer

Assuming that the HTML is being fetched correctly before any DOM processing is attempted, it is fairly straightforward to construct some basic XPath expressions to find the indicated content.

Since you mention that the "page contains several of them", the sample below contains two product-grid-item divs, as you’ll see in the output.

$html='
    <div class="product-grid-item" data-loop="1">
        <div class="product-element-top">
            <a href="https://...linkhere" class="product-image-link">
                <img width="300" height="300" src="https://image-goes-here.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail">
            </a>
            <div class="top-information wd-fill">
                <h3 class="wd-entities-title">
                    <a href="https://...linkhere">The title goes here</a>
                </h3>
                <span class="price">
                    <span class="woocommerce-Price-amount amount">
                        <bdi>
                            <span class="woocommerce-Price-currencySymbol">€</span>20,00
                        </bdi>
                    </span>
                </span>
                <div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
                    <a href="https://...linkhere" data-quantity="1" class="button product_type_variable add_to_cart_button add-to-cart-loop">
                        <span>Options</span>
                    </a>
                </div> 
            </div>

            <div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
                <div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
                    <a href="https://...linkhere" data-added-text="Compare Products">Buy</a>
                </div>
                <div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
                    <a href="https://...linkhere" class="open-quick-view quick-view-button">quick view</a>
                </div>
                <div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
                    <a class="" href="https://linkhere/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
                </div>
            </div>
            <div class="quick-shop-wrapper wd-fill wd-scroll">
                <div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon">
                    <a href="#" rel="nofollow noopener">Close</a>
                </div>
                <div class="quick-shop-form wd-scroll-content"></div>
            </div>
        </div>
    </div>
    
    <div class="product-grid-item" data-loop="1">
        <div class="product-element-top">
            <a href="https://www.example.com/banana" class="product-image-link">
                <img width="300" height="300" src="https://www.example.com/kittykat.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail">
            </a>
            <div class="top-information wd-fill">
                <h3 class="wd-entities-title">
                    <a href="https://www.example.com/womble">Oh look, another title!</a>
                </h3>
                <span class="price">
                    <span class="woocommerce-Price-amount amount">
                        <bdi>
                            <span class="woocommerce-Price-currencySymbol">€</span>540,00
                        </bdi>
                    </span>
                </span>
                <div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
                    <a href="https://www.example.com/gorilla" data-quantity="1" class="button product_type_variable add_to_cart_button add-to-cart-loop">
                        <span>Options</span>
                    </a>
                </div> 
            </div>

            <div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
                <div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
                    <a href="https://www.example.com/buy" data-added-text="Compare Products">Buy</a>
                </div>
                <div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
                    <a href="https://www.example.com/view" class="open-quick-view quick-view-button">quick view</a>
                </div>
                <div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
                    <a class="" href="https://www.example.com/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
                </div>
            </div>
            <div class="quick-shop-wrapper wd-fill wd-scroll">
                <div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon">
                    <a href="#" rel="nofollow noopener">Close</a>
                </div>
                <div class="quick-shop-form wd-scroll-content"></div>
            </div>
        </div>
    </div>';
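
The sample above is hard-coded so that the example runs on its own. For a live page, the "fetched correctly" assumption could be checked with something like the sketch below. The function name fetchHtml, the User-Agent string and the timeout are illustrative assumptions and not part of the original code.

```php
<?php
// Hypothetical helper: returns the response body, or null when the download fails.
function fetchHtml(string $url, int $timeout = 10): ?string
{
    // Some servers reject requests that carry no User-Agent; the value here is arbitrary.
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'timeout'    => $timeout,
            'user_agent' => 'Mozilla/5.0 (compatible; example-scraper)',
        ],
    ]);

    // @ silences the PHP warning; failure is reported through the return value instead.
    $body = @file_get_contents($url, false, $context);

    // Treat both a failed request and an empty body as "nothing to parse".
    return ($body === false || $body === '') ? null : $body;
}
```

It would be called as $html = fetchHtml('https://url-here.goetohere'); followed by a check for null before handing $html to loadHTML(). Note that a page which builds its product grid with JavaScript will not contain these divs in the downloaded markup, however the request is made.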

To process the downloaded HTML:

# set the libxml parameters and create new DOMDocument/XPath objects.
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->loadHTML( $html );
libxml_clear_errors();

$xp=new DOMXPath( $dom );

# some basic XPath expressions
$exprs=(object)array(
    'product-link'      =>  '//a[@class="product-image-link"]',
    'product-img-src'   =>  '//a[@class="product-image-link"]/img',
    'h3-title-text'     =>  '//h3[@class="wd-entities-title"]',
    'price'             =>  '//span[@class="price"]/span/bdi'
);
# find the keys (for convenience) to be used below
$keys=array_keys( get_object_vars( $exprs ) );

# store results here
$res=array();

# loop through all patterns and issue XPath query.
foreach( $exprs as $key => $expr ){
    # add key to output and set as an array.
    $res[ $key ]=[];
    $col=$xp->query( $expr );
    
    # find the data if the query succeeds
    if( $col && $col->length > 0 ){
        foreach( $col as $node ){
            switch( $key ){
                case $keys[0]:$res[$key][]=$node->getAttribute('href');break;
                case $keys[1]:$res[$key][]=$node->getAttribute('src');break;
                case $keys[2]:$res[$key][]=trim($node->textContent);break;
                case $keys[3]:$res[$key][]=trim($node->textContent);break;
            }
        }
    }
}
# show the result or do really interesting things with the data
printf('<pre>%s</pre>',print_r($res,true));

Which yields:

Array
(
    [product-link] => Array
        (
            [0] => https://...linkhere
            [1] => https://www.example.com/banana
        )

    [product-img-src] => Array
        (
            [0] => https://image-goes-here.jpg
            [1] => https://www.example.com/kittykat.jpg
        )

    [h3-title-text] => Array
        (
            [0] => The title goes here
            [1] => Oh look, another title!
        )

    [price] => Array
        (
            [0] => â¬20,00
            [1] => â¬540,00
        )

)
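
Two details of that output are worth a note. The garbled â¬ in the price entries is most likely there because loadHTML() falls back to ISO-8859-1 when the markup carries no charset declaration, so the UTF-8 bytes of € are misread. The four arrays are also only tied together by index, which goes out of step as soon as one product has no price or image. The sketch below is one possible way to handle both, assuming the input really is UTF-8: it prepends an XML encoding hint (a common workaround, not the only one) and runs relative XPath queries inside each product-grid-item node. The function name extractProducts and the array keys are made up for illustration.

```php
<?php
// Hypothetical helper: returns one associative array per product-grid-item div.
function extractProducts(string $html): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Assumption: $html is UTF-8. The prefix is only an encoding hint for the parser.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);

    // Match the class as a whole word so that e.g. "product-grid-item-foo" is not caught.
    $items = $xp->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " product-grid-item ")]'
    );

    $products = [];
    foreach ($items as $item) {
        // The leading "." keeps each query inside the current product.
        // string(...) gives an empty string when nothing matches.
        $products[] = [
            'link'  => $xp->evaluate('string(.//a[@class="product-image-link"]/@href)', $item),
            'image' => $xp->evaluate('string(.//a[@class="product-image-link"]/img/@src)', $item),
            'title' => trim($xp->evaluate('string(.//h3[@class="wd-entities-title"])', $item)),
            'price' => trim($xp->evaluate('string(.//span[@class="price"]//bdi)', $item)),
        ];
    }

    return $products;
}
```

Calling print_r(extractProducts($html)) on the two-product sample should give one entry per product, each with link, image, title and price keys, so a missing element shows up as an empty string in that product rather than shifting the values of the products after it.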
User contributions licensed under: CC BY-SA