Download images from html and keep the folder structure

I need to download over 100,000 pictures. The pictures are in .png, .jpg, .jpeg, and .gif format. I have approval to use these pictures; the owners have provided me with an XML file containing all of the URLs.

The URLs have the following structure:

otherdomain/productimages/code/imagename.jpg/.png/.gif

I have all the codes in a PHP array called $codes[], and I also have the full paths of all the images in an array $images[].

I need to download all of those pictures and keep the same structure:

mydomain/productimages/code/imagename.jpg/.png/.gif

What I have so far, based on my research on the internet, is:

Looping over all the pages (each hotel code):

    // Loop over all the listing pages, one per hotel code
    foreach ($codes as $code) {
        $html = get_data('http://otherdomain.com/productimages/' . $code . '/');
        getImages($html);
    }

    function getImages($html) {
        $matches = array();
        // Match every productimages URL and capture both the path and the extension
        $regex = '~http://otherdomain\.com/productimages/(.*?)\.(jpe?g|png|gif)~i';
        preg_match_all($regex, $html, $matches);
        foreach ($matches[1] as $key => $img) {
            saveImg($img . '.' . $matches[2][$key]);
        }
    }

    function saveImg($name) {
        $url = 'http://otherdomain.com/productimages/' . $name;
        $data = get_data($url);
        // Recreate the code/ subdirectory locally before writing the file
        $dir = 'photos/' . dirname($name);
        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }
        file_put_contents('photos/' . $name, $data);
    }
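
For context, get_data() is not defined in the snippet above; it is presumably a small cURL wrapper, roughly along these lines (a sketch, not the original helper):

    // Assumed helper: fetch a URL with cURL and return the response body as a string
    function get_data($url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up on slow responses
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }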

Could you help me get this working? The script doesn't work at all.

Answer

May I suggest an easier and faster approach to the task: write the complete URLs to a list.txt file, then run wget -x -i list.txt, which will download all the images and put them in the appropriate directories according to the site structure.
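
As a sketch of that approach, assuming $images already holds the full source URLs from the XML file, the list can be generated from PHP and then fed to wget:

    // Write one full image URL per line to list.txt
    file_put_contents('list.txt', implode(PHP_EOL, $images) . PHP_EOL);

    // Then, from the shell:
    //   wget -x -i list.txt
    // -x tells wget to recreate the directory hierarchy (otherdomain.com/productimages/code/...),
    // -i tells it to read the URL list from the file.
    // Adding -nH drops the otherdomain.com prefix so the local tree starts at productimages/.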
