
How to parse the HTML DOM of a URL with PHP?

This is the URL that I want to parse: http://www.tsetmc.com/Loader.aspx?ParTree=151313&Flow=0

I use simple_html_dom.php but it can’t read the HTML because the HTML is encoded.

So I think I need to fetch the page source online and parse that instead. Is there any way I can parse this website?

The source code looks like this:

JavaScript

my code:

JavaScript


Answer

The issue, as you pointed out, is the encoding: the response is gzip-encoded. You can set the CURLOPT_ENCODING option in cURL to work around that. The php-curl documentation describes it as follows:

The contents of the “Accept-Encoding: ” header. This enables decoding of the response. Supported encodings are “identity”, “deflate”, and “gzip”. If an empty string, “”, is set, a header containing all supported encoding types is sent.
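To see why a parser chokes on the undecoded body, here is a minimal sketch using PHP's zlib functions. The markup is made up for illustration and merely stands in for the real page; the point is that gzip-compressed bytes start with the gzip magic number rather than with HTML, and that decoding restores the original text.

```php
<?php
// Made-up markup standing in for the page body (an assumption, not the real page).
$html = '<html><body><p>hello</p></body></html>';

// Roughly what a server sends when it answers with Content-Encoding: gzip.
$compressed = gzencode($html);

// The compressed bytes begin with the gzip magic number 0x1f 0x8b, not with "<html>".
$looksLikeHtml = strpos($compressed, '<html>') === 0;
$hasGzipMagic  = substr($compressed, 0, 2) === "\x1f\x8b";

// Decoding restores the original markup, which is what CURLOPT_ENCODING does for you.
$decoded = gzdecode($compressed);

var_dump($looksLikeHtml, $hasGzipMagic, $decoded === $html);
```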

Use the following php-curl code to get the decoded response HTML:

JavaScript
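A minimal sketch of such a request, assuming the cURL extension is enabled. The helper name `fetch_html` is hypothetical and not taken from the original answer; the only essential part is setting CURLOPT_ENCODING to an empty string so that cURL advertises every encoding it supports and decodes the body itself.

```php
<?php
// Hypothetical helper: fetches a URL and lets cURL decode a compressed body.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_ENCODING       => '',   // accept all supported encodings and decode automatically
        CURLOPT_TIMEOUT        => 30,   // assumed timeout in seconds
    ]);

    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    return $response;
}
```

Calling `$response = fetch_html('http://www.tsetmc.com/Loader.aspx?ParTree=151313&Flow=0');` should then yield plain HTML rather than compressed bytes, provided the site is reachable.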

You can then pass the response HTML in $response directly to simple_html_dom.php to parse the DOM tree.
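Once the body is decoded, any HTML parser can walk it. As an illustration that runs without extra files, the sketch below uses PHP's bundled DOM extension instead of simple_html_dom.php, which is a swapped-in parser and not what the answer used. The helper name `extract_link_texts` and the sample markup are hypothetical; in practice you would pass `$response`.

```php
<?php
// Hypothetical helper: collects the text of every <a> element in an HTML string.
function extract_link_texts(string $html): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid; buffer parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $texts[] = trim($anchor->textContent);
    }

    return $texts;
}

// Made-up markup standing in for $response.
$sample = '<html><body><a href="/a">First</a> <a href="/b">Second</a></body></html>';
print_r(extract_link_texts($sample));
```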

Here’s a working version of the code. http://phpfiddle.org/main/code/gb66-3kzq

User contributions licensed under: CC BY-SA