
Retrieving variables from HTML with JSON data contained in JavaScript

I have a simple HTML file that contains data I’m trying to scrape out so that I can work with the variables.

<html>
<head>
    <link rel="stylesheet" type="text/css" href="/app/css/style.css" />
</head>
<body>
    <div class="page">
        <div class ="pane">
            <div class ="chart">
                <h1 style='float: left;'>Summary</h1>
                <div style='clear: both;'></div>

                <script type="text/javascript" src="protovis/protovis-d3.2.js"></script>
                <script type="text/javascript+protovis">var data = [
                {"label":"A (2)", 'complete': 2.0, 'pending': 0.0}
                ,{"label":" B (8)", 'complete': 8, 'pending': 0.0}
                ,{"label":"C (10)", 'complete': 10, 'pending': 0.0}
                ,{"label":"D (18)", 'complete': 18.0, 'pending': 0.0}
                ,{"label":"E (21)", 'complete': 21, 'pending': 0.0}
                ];
                </script>
            </div>
        </div>
    </div>  
</body>
</html>

Using PHP, I’m trying to parse the data contained in this HTML into variables, i.e. $A = 2, $B = 8, $C = 10, $D = 18, $E = 21.

So far, I’ve been trying to use the simple_html_dom.php library to read the data, but I haven’t been able to retrieve the contents of the JSON contained within the JavaScript above.

How can I pull "label":"A (2)" out of the HTML above so that I can access the value (in this case 2) as a PHP variable?


Answer

I solved it with file_get_contents().

Note: the parsing is just a quick and dirty solution. It only works if the file contains exactly one [ and one ], which must delimit the JSON string. If you need to run this on many files, you should definitely use a more robust parsing method.

  1. Read the HTML into a string:

$html = file_get_contents("my_file.html");

  2. Search the HTML for the JSON string (as noted above, this is bad practice and just a quick and dirty solution for your exact problem):

$start = strpos($html, '[');
$json = substr($html, $start, strpos($html, ']') - $start + 1);

  3. Replace single quotes with double quotes, since JSON only allows double-quoted strings:

$json = str_replace("'", '"', $json);

  4. Use json_decode to convert the JSON to a PHP array. Note: set the second parameter to true to get associative arrays instead of objects:

$my_array = json_decode($json, true);

Here you go: var_dump($my_array) shows that everything is now stored in the PHP array:

array(5) {
  [0]=>
  array(3) {
    ["label"]=>
    string(5) "A (2)"
    ["complete"]=>
    float(2)
    ["pending"]=>
    float(0)
  }
  [1]=>
  array(3) {
    ["label"]=>
    string(6) " B (8)"
    ["complete"]=>
    int(8)
    ["pending"]=>
    float(0)
  }
  ...
}
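
To get from the decoded array to the values asked for in the question (A = 2, B = 8, and so on), something like the following might work. It is only a sketch under a few assumptions: the data always sits in a `var data = [...];` statement, every label looks like "name (count)", and the labels contain no apostrophes or brackets. The variable name $values, both regular expressions, and the inline sample (standing in for file_get_contents()) are illustrative and not part of the original answer.

```php
<?php
// Inline sample mirroring the question's <script> block; in practice
// $html would come from file_get_contents("my_file.html").
$html = <<<'HTML'
<script type="text/javascript+protovis">var data = [
{"label":"A (2)", 'complete': 2.0, 'pending': 0.0}
,{"label":" B (8)", 'complete': 8, 'pending': 0.0}
,{"label":"C (10)", 'complete': 10, 'pending': 0.0}
,{"label":"D (18)", 'complete': 18.0, 'pending': 0.0}
,{"label":"E (21)", 'complete': 21, 'pending': 0.0}
];
</script>
HTML;

// Anchor on "var data = [ ... ];" rather than the first "[" in the file,
// so other brackets elsewhere in the page do not confuse the match.
if (!preg_match('/var\s+data\s*=\s*(\[.*?\])\s*;/s', $html, $m)) {
    throw new RuntimeException('Could not find the data array');
}

// Swap single quotes for double quotes so the literal becomes valid JSON,
// then decode into associative arrays.
$rows = json_decode(str_replace("'", '"', $m[1]), true);
if (!is_array($rows)) {
    throw new RuntimeException('Could not decode the data array');
}

// Split each label such as "A (2)" into its name and its count.
$values = [];
foreach ($rows as $row) {
    if (preg_match('/^\s*(\S+)\s*\((\d+)\)\s*$/', $row['label'], $p)) {
        $values[$p[1]] = (int) $p[2];
    }
}

print_r($values);
```

With the sample above, $values['A'] is 2 and $values['E'] is 21. If separate variables like $A are really needed, extract($values) would create them, though keeping everything in one array is usually easier to work with.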
User contributions licensed under: CC BY-SA