I have a simple HTML file that contains data I’m trying to scrape out so that I can work with the variables.
<html>
<head>
<link rel="stylesheet" type="text/css" href="/app/css/style.css" />
</head>
<body>
<div class="page">
<div class ="pane">
<div class ="chart">
<h1 style='float: left;'>Summary</h1>
<div style='clear: both;'></div>
<script type="text/javascript" src="protovis/protovis-d3.2.js"></script>
<script type="text/javascript+protovis">var data = [
{"label":"A (2)", 'complete': 2.0, 'pending': 0.0}
,{"label":" B (8)", 'complete': 8, 'pending': 0.0}
,{"label":"C (10)", 'complete': 10, 'pending': 0.0}
,{"label":"D (18)", 'complete': 18.0, 'pending': 0.0}
,{"label":"E (21)", 'complete': 21, 'pending': 0.0}
];
</script>
</div>
</div>
</div>
</body>
</html>
Using PHP, I’m trying to parse the data contained in this HTML into variables, i.e. $A = 2, $B = 8, $C = 10, $D = 18, $E = 21.
So far, I’ve been trying to use the simple_html_dom.php library to read the data, but I haven’t been able to retrieve the contents of the JSON contained within the JavaScript above.
How can I pull "label":"A (2)" out of the HTML above so that I can access the value (in this case 2) as a PHP variable?
Answer
I solved it with file_get_contents().
Note: the parsing is really just a quick-and-dirty solution. It only works if the first [ and the first ] in the file are the ones that delimit the JSON string (for example, when they are the only ones in the file). If you need to use this on a lot of files, you should definitely use a more robust parsing method.
- Read the HTML into a string:
$html = file_get_contents("my_file.html");
- Search the HTML for the JSON string (as noted, this is bad practice and just a quick-and-dirty solution for your exact problem):
$json = substr($html, strpos($html, '[') , strpos($html, ']') - (strpos($html, '[')-1));
- Replace single quotes with double quotes (valid JSON requires double-quoted strings):
$json = str_replace("'", '"', $json);
- Use json_decode to convert the JSON to a PHP array. Note: you need to set the second parameter to true in order to get associative arrays instead of objects:
$my_array = json_decode($json, true);
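The steps above can also be rolled into one function. The sketch below is a variation on them, not the exact code from this answer: instead of searching for the first [ and ], it anchors on the "var data = [" assignment with preg_match, so unrelated brackets elsewhere in the page are less likely to break it. The function name extract_chart_data and the trimmed inline sample are assumptions made for illustration, and it still assumes the data contains no "];" sequence and no apostrophes inside values.

```php
<?php
// Hypothetical helper (not part of the original answer): pull the array
// assigned to "var data" out of an HTML string and decode it.
function extract_chart_data(string $html): ?array
{
    // Anchor on "var data = [" and capture lazily up to the first "];".
    // Assumes the data itself contains no "];" sequence.
    if (!preg_match('/var\s+data\s*=\s*(\[.*?\])\s*;/s', $html, $m)) {
        return null;
    }

    // JSON requires double quotes; assumes no apostrophes inside the values.
    $json = str_replace("'", '"', $m[1]);

    // true => nested associative arrays instead of stdClass objects.
    $decoded = json_decode($json, true);

    return is_array($decoded) ? $decoded : null;
}

// Sample input trimmed from the question's HTML; in practice this would
// come from file_get_contents() as in the first step.
$html = <<<'HTML'
<script type="text/javascript+protovis">var data = [
{"label":"A (2)", 'complete': 2.0, 'pending': 0.0}
,{"label":" B (8)", 'complete': 8, 'pending': 0.0}
];
</script>
HTML;

$my_array = extract_chart_data($html);
```

Returning null when nothing matches or the JSON is invalid lets the caller tell "no data found" apart from an empty data array.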
Here you go: when you do var_dump($my_array), you see that everything has been saved into the PHP array:
array(5) {
  [0]=>
  array(3) {
    ["label"]=>
    string(5) "A (2)"
    ["complete"]=>
    float(2)
    ["pending"]=>
    float(0)
  }
  [1]=>
  array(3) {
    ["label"]=>
    string(6) " B (8)"
    ["complete"]=>
    int(8)
    ["pending"]=>
    float(0)
  }
  ...
}
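To get from this array to the values the question asks for ($A = 2, $B = 8 and so on), one option is to key the "complete" values by the name in each label. This is only a sketch: it assumes every label has the form "<name> (<count>)", and the $values array and the hard-coded sample rows are illustrative, not part of the original answer. It collects the results in one array rather than creating separate $A, $B variables, since dynamically named variables are easy to overwrite by accident.

```php
<?php
// Sample data shaped like the var_dump() output above (first two entries only).
$my_array = [
    ['label' => 'A (2)',  'complete' => 2.0, 'pending' => 0.0],
    ['label' => ' B (8)', 'complete' => 8,   'pending' => 0.0],
];

$values = [];
foreach ($my_array as $row) {
    // Assumes labels look like "<name> (<count>)". Capture the name part and
    // ignore stray leading whitespace such as in " B (8)".
    if (preg_match('/^\s*(.+?)\s*\(\d+\)\s*$/', $row['label'], $m)) {
        $values[$m[1]] = $row['complete'];
    }
}

// The values are now available by label name.
echo $values['A'], "\n";
echo $values['B'], "\n";
```

Rows whose label does not match the assumed pattern are skipped silently here; depending on the data, raising an error instead may be the safer choice.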