
How to convert HTML to JSON using PHP?

I can convert JSON to HTML using the JsontoHtml library. Now I need to convert existing HTML to JSON, as shown on this site. When I looked into its code, I found the following script:

[Screenshot of the JavaScript code; image not available]

Now I need to use this function in PHP. I can already get the HTML data; all I need is to convert the JavaScript function to a PHP function. Is this possible? My main doubts are as follows:

  • The primary input for the JavaScript function toTransform() is an object. Is it possible to convert HTML to an object in PHP?

  • Are all the functions used in this particular JavaScript code available in PHP?

Please suggest an approach.

When I tried to convert a <script> tag to JSON as per the answer given, I got errors. When I tried it on the json2html site, it produced this: [screenshot of the json2html output; image not available]. How can I achieve the same result?


Answer

If you are able to obtain a DOMDocument object representing your HTML, then you just need to traverse it recursively and construct the data structure that you want.

Converting your HTML document into a DOMDocument should be as simple as this:
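A minimal sketch of that step, assuming the HTML is available as a string. The function name `html_to_dom` is an illustrative choice and not part of any library; `DOMDocument::loadHTML` and `libxml_use_internal_errors` are standard PHP calls.

```php
<?php
// Hypothetical helper: parse an HTML string into a DOMDocument.
function html_to_dom(string $html): DOMDocument
{
    // Collect parser warnings instead of printing them; real-world HTML
    // is rarely valid and libxml2 would otherwise warn on every issue.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

$dom = html_to_dom('<div id="main"><p>Hello</p></div>');

// The parser wraps the fragment, so the root element is <html>.
echo $dom->documentElement->tagName, "\n";
```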


Then, a simple traversal of $dom->documentElement which gives the kind of structure you described could look like this:
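One way such a traversal might look is sketched below. The key names `tag`, `html` and `children` are assumptions modeled on json2html-style templates, and the function name `element_to_obj` is illustrative; adjust both to the structure you actually need.

```php
<?php
// Recursively convert a DOMElement into a nested PHP array.
// The key names "tag", "html" and "children" are assumptions.
function element_to_obj(DOMElement $element): array
{
    $obj = ['tag' => $element->tagName];

    // Copy every attribute as a key/value pair.
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }

    foreach ($element->childNodes as $subElement) {
        if ($subElement->nodeType === XML_TEXT_NODE) {
            // Text content; a later text node overwrites an earlier one.
            $obj['html'] = $subElement->wholeText;
        } elseif ($subElement->nodeType === XML_ELEMENT_NODE) {
            $obj['children'][] = element_to_obj($subElement);
        }
        // Other node types (comments, CDATA sections, ...) are ignored here.
    }

    return $obj;
}

$dom = new DOMDocument();
$dom->loadHTML('<div id="main"><p class="intro">Hello</p></div>');

$result = element_to_obj($dom->documentElement);
echo json_encode($result, JSON_PRETTY_PRINT), "\n";
```

Note that an attribute literally named `tag`, `html` or `children` would collide with the structural keys in this sketch; nesting attributes under their own key would avoid that.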


Test case

[Code listing not preserved.]

Output

[Output listing not preserved.]

Answer to updated question

The solution proposed above does not work with the <script> element, because its content is parsed not as a plain DOMText node, but as a CDATA section (a DOMCharacterData node whose nodeType is XML_CDATA_SECTION_NODE rather than XML_TEXT_NODE). This is because the DOM extension in PHP is based on libxml2, which parses your HTML as HTML 4.0, and in HTML 4.0 the content of <script> is of type CDATA and not #PCDATA.

You have two solutions for this problem.

  1. The simple but not very robust solution would be to add the LIBXML_NOCDATA flag to DOMDocument::loadHTML. (I am not actually 100% sure whether this works for the HTML parser.)

  2. The more difficult but, in my opinion, better solution is to add an additional test when you are testing $subElement->nodeType before the recursion. The recursive function would become:
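A sketch of the second option, under the same assumptions as before (illustrative function name `element_to_obj`, assumed key names `tag`, `html` and `children`). The only change is the extra branch for `XML_CDATA_SECTION_NODE`:

```php
<?php
// Recursive conversion with an extra branch for CDATA sections,
// which is how libxml2 may represent <script> content.
function element_to_obj(DOMElement $element): array
{
    $obj = ['tag' => $element->tagName];

    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }

    foreach ($element->childNodes as $subElement) {
        if ($subElement->nodeType === XML_TEXT_NODE) {
            $obj['html'] = $subElement->wholeText;
        } elseif ($subElement->nodeType === XML_CDATA_SECTION_NODE) {
            // CDATA content is exposed through the data property.
            $obj['html'] = $subElement->data;
        } elseif ($subElement->nodeType === XML_ELEMENT_NODE) {
            $obj['children'][] = element_to_obj($subElement);
        }
        // Remaining node types (comments, ...) are still ignored.
    }

    return $obj;
}

$dom = new DOMDocument();
$dom->loadHTML('<div><script>var a = 1 < 2;</script></div>');

$result = element_to_obj($dom->documentElement);
echo json_encode($result), "\n";
```

Because both the text branch and the CDATA branch write to the same key, the script body ends up under `html` regardless of which node type the parser produces.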


If you hit another bug of this type, the first thing you should do is check what type of node $subElement is, because there exist many other possibilities that my short example function did not deal with.

Additionally, you will notice that libxml2 has to fix mistakes in your HTML in order to build a DOM for it. This is why <html> and <head> elements appear even if you don’t specify them. You can avoid this by using the LIBXML_HTML_NOIMPLIED flag.
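The effect of the flag can be seen by comparing the root element with and without it. This is a small sketch with a made-up HTML fragment; combining the flag with `LIBXML_HTML_NODEFDTD` (which suppresses the default doctype) is an optional addition, not something the flag requires.

```php
<?php
$html = '<div id="main"><p>Hello</p></div>';

// Default behaviour: libxml2 wraps the fragment in implied elements.
$wrapped = new DOMDocument();
$wrapped->loadHTML($html);

// LIBXML_HTML_NOIMPLIED: do not add implied wrapper elements.
// LIBXML_HTML_NODEFDTD: do not add a default doctype either.
$bare = new DOMDocument();
$bare->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

echo $wrapped->documentElement->tagName, "\n"; // html
echo $bare->documentElement->tagName, "\n";    // div
```

As far as I know, this flag behaves predictably only when the fragment has a single top-level element, so it may be worth wrapping the input in one container element first.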

Test case with script

[Code listing not preserved.]

Output

[Output listing not preserved.]
User contributions licensed under: CC BY-SA