I have a simple HTML file that contains data I’m trying to scrape out so that I can work with the variables.
<html>
<head>
  <link rel="stylesheet" type="text/css" href="/app/css/style.css" />
</head>
<body>
  <div class="page">
    <div class ="pane">
      <div class ="chart">
        <h1 style='float: left;'>Summary</h1>
        <div style='clear: both;'></div>
        <script type="text/javascript" src="protovis/protovis-d3.2.js"></script>
        <script type="text/javascript+protovis">var data = [
          {"label":"A (2)", 'complete': 2.0, 'pending': 0.0}
          ,{"label":" B (8)", 'complete': 8, 'pending': 0.0}
          ,{"label":"C (10)", 'complete': 10, 'pending': 0.0}
          ,{"label":"D (18)", 'complete': 18.0, 'pending': 0.0}
          ,{"label":"E (21)", 'complete': 21, 'pending': 0.0}
        ];
        </script>
      </div>
    </div>
  </div>
</body>
</html>
Using PHP, I’m trying to parse the data contained in this HTML into variables, i.e. $A = 2, $B = 8, $C = 10, $D = 18, $E = 21.
So far, I’ve been trying to use the simple_html_dom.php library to read the data, but I haven’t been able to retrieve the contents of the JSON contained within the JavaScript above.
How can I pull "label":"A (2)" out of the HTML above so that I can access the value (in this case 2) as a PHP variable?
Answer
I solved it with file_get_contents().
Note: the parsing is really just a quick and dirty solution. It only works if your file contains exactly one [ and one ], and those two brackets must delimit your JSON string. If you need to use this on a lot of files, you should definitely use another parsing method.
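One less fragile option, sketched here under assumptions rather than as a definitive parser, is to anchor on the `var data =` assignment with a regular expression instead of on the first bracket in the file. The helper name `extractDataJson` and the trimmed sample input are hypothetical. The sketch still assumes the array ends at the first `];` and that no label contains an apostrophe, because the blanket quote replacement would corrupt it.

```php
<?php
// Sample input: a trimmed-down copy of the script tag from the question.
$html = <<<'HTML'
<script type="text/javascript+protovis">var data = [ {"label":"A (2)", 'complete': 2.0, 'pending': 0.0} ,{"label":" B (8)", 'complete': 8, 'pending': 0.0} ]; </script>
HTML;

// Hypothetical helper (not part of the original answer): find the array
// literal assigned to `var data` rather than the first '[' in the file.
function extractDataJson(string $html): ?string
{
    // Non-greedy match from '[' to the first '];' after "var data =".
    // The /s flag lets '.' span line breaks.
    if (preg_match('/var\s+data\s*=\s*(\[.*?\])\s*;/s', $html, $m) !== 1) {
        return null;
    }
    // JSON requires double quotes, so swap the single-quoted keys.
    return str_replace("'", '"', $m[1]);
}

$json = extractDataJson($html);
// json_decode() returns null on invalid JSON, so check before using $rows.
$rows = $json === null ? null : json_decode($json, true);
var_dump($rows);
```

Because the pattern is tied to the variable name, other brackets elsewhere in the page (for example in inline scripts or attribute values) no longer confuse the extraction.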
- Read the HTML into a string:
$html = file_get_contents("my_file.html");
- Parse your HTML for the JSON string (as noted, this is bad practice and just a quick and dirty solution for your exact problem):
$json = substr($html, strpos($html, '[') , strpos($html, ']') - (strpos($html, '[')-1));
- Replace single quotes with double quotes (valid JSON requires double quotes around keys and strings):
$json = str_replace("'", '"', $json);
- Use json_decode to convert your JSON to a PHP array. Note: you need to set the second parameter to true in order to use “associative mode”:
$my_array = json_decode($json, true);
There you go: when you do var_dump($my_array), you can see that everything is stored in the PHP array:
array(5) {
  [0]=>
  array(3) {
    ["label"]=>
    string(5) "A (2)"
    ["complete"]=>
    float(2)
    ["pending"]=>
    float(0)
  }
  [1]=>
  array(3) {
    ["label"]=>
    string(6) " B (8)"
    ["complete"]=>
    int(8)
    ["pending"]=>
    float(0)
  }
  ...
}
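To get from that array to the per-label values the question asks for, a minimal sketch could look like the following. The `$values` map and the label-stripping regex are illustrative assumptions: the code takes the letter before the " (n)" suffix as the name and the `complete` field as the value.

```php
<?php
// Stand-in for the decoded result, using the five rows from the question.
$my_array = [
    ['label' => 'A (2)',  'complete' => 2.0,  'pending' => 0.0],
    ['label' => ' B (8)', 'complete' => 8,    'pending' => 0.0],
    ['label' => 'C (10)', 'complete' => 10,   'pending' => 0.0],
    ['label' => 'D (18)', 'complete' => 18.0, 'pending' => 0.0],
    ['label' => 'E (21)', 'complete' => 21,   'pending' => 0.0],
];

$values = [];
foreach ($my_array as $row) {
    // "A (2)" -> "A": drop the trailing " (n)" and any surrounding whitespace.
    $name = trim(preg_replace('/\s*\(.*\)\s*$/', '', $row['label']));
    // Cast so that 2.0 and 2 are both stored as the integer 2.
    $values[$name] = (int) $row['complete'];
}

echo $values['A'], "\n"; // 2
echo $values['B'], "\n"; // 8
```

If separate variables such as $A and $B are really needed, `extract($values)` would create them, though keeping the values in one array is usually easier to work with.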