I’m trying to use the `LIBXML_*` constants for the 2nd parameter of the `SimpleXMLElement`
constructor, but they don’t change anything at all.
```php
$xml = '<root><empty_tag/><foo></foo></root>';
$simpleXml = new SimpleXMLElement($xml, LIBXML_NOENT|LIBXML_NOXMLDECL|LIBXML_NOEMPTYTAG);
$simpleXml->foo = 'Ņ';
echo $simpleXml->asXML();
```
Expected:
```xml
<root><empty_tag></empty_tag><foo>Ņ</foo></root>
```
Actual:
```xml
<?xml version="1.0"?>
<root><empty_tag/><foo>&#x145;</foo></root>
```
As you can see, not a single one of those flags does anything: the character is still escaped as an entity (even though, according to https://www.w3.org/TR/xml/#syntax, XML should only need to escape `"`, `'`, `&`, `>` and `<`), the XML declaration is still there, and the empty tag is still self-closing.
Is there a way to achieve the desired result using SimpleXML? Or, at the very least, to make it escape only those 5 special characters? `addChild()`
is not an option here, since I’m assigning existing nodes.
Answer
These constants might be a bit cryptic in their naming. So what actually is supported?
LIBXML_NOENT
This controls whether entities are kept as entity references in the document or are expanded (substituted with their replacement text). It has to be specified when loading the document:
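```php
<?php
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]> <test>&c;</test>';
// default: the entity reference &c; is kept as-is
echo (new SimpleXMLElement($xml))->asXML(), "\n";
// LIBXML_NOENT: the entity is substituted with its value
echo (new SimpleXMLElement($xml, LIBXML_NOENT))->asXML(), "\n";
```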
This shows the first output:
```xml
<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY c "TEST">
]>
<test>&c;</test>
```
The entity reference is preserved. The second `echo`, with `LIBXML_NOENT`, shows:
```xml
<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY c "TEST">
]>
<test>TEST</test>
```
The XML is borrowed from a related Q&A: What does LIBXML_NOENT do (and why isn’t it called LIBXML_ENT)?
By the way, this flag is not related to the non-US-ASCII character in your document. If you want that character in the output as-is, set the document encoding, for example to UTF-8:
```php
$xml = '<root><empty_tag/><foo></foo></root>';
$simpleXml = new SimpleXMLElement($xml);
dom_import_simplexml($simpleXml)->ownerDocument->encoding = 'UTF-8';
$simpleXml->foo = 'Ņ';
echo $simpleXml->asXML();
```
The trick here is to set the encoding on the underlying `DOMDocument`; this is the only way I know of for a `SimpleXMLElement` (and `DOMDocument`). Here is the output:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<root><empty_tag/><foo>Ņ</foo></root>
```
You can see there is no more `&#x145;` entity, just `Ņ` as a Unicode character (UTF-8 encoded). The XML declaration now also shows the encoding.
From your question I assume this is what you were looking for with `LIBXML_NOENT`.
LIBXML_NOXMLDECL
The second one in the list. I never got it to work; it’s buggy and/or has specific version requirements, and honestly I don’t even know if or where it is meant to be applied.
You can either strip the first line, which contains the XML declaration and is always terminated by `"\n"`, from the output.
Or you can again turn to the underlying `DOMDocument` and output only the document element, so it’s not the complete document and hence has no XML declaration:
```php
$dom = dom_import_simplexml($simpleXml)->ownerDocument;
echo $dom->saveXML($dom->documentElement);
```
Output:
```xml
<root><empty_tag/><foo>Ņ</foo></root>
```
This is basically what is suggested in "remove xml version tag when a xml is created in php".
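For the first option, stripping the declaration line, a minimal sketch could look like this. It assumes the declaration is the first line of the `asXML()` output and ends with `"\n"`, which holds when the whole document is serialized; the value `'bar'` is just illustrative ASCII content so the result does not depend on how non-ASCII characters get escaped.

```php
<?php
// Sketch: remove the XML declaration by dropping the first line of asXML().
// Assumption: asXML() on the root element writes the declaration on its own
// "\n"-terminated first line.
$simpleXml = new SimpleXMLElement('<root><empty_tag/><foo></foo></root>');
$simpleXml->foo = 'bar'; // illustrative ASCII value

$xml = $simpleXml->asXML();
$pos = strpos($xml, "\n");
// If there is no newline at all, leave the output untouched.
$withoutDecl = ($pos === false) ? $xml : substr($xml, $pos + 1);

echo $withoutDecl;
```

The remaining string still ends with the trailing newline that `asXML()` appends, whereas the `saveXML($dom->documentElement)` route does not produce one.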
LIBXML_NOEMPTYTAG
The third and last one in the list. I could quote from the PHP manual, but that has already been done elsewhere on the site. So how do you get this with a `SimpleXMLElement`, given that the constant isn’t available for it?
One way is to pass the option via `DOMDocument` again:
```php
$dom = dom_import_simplexml($simpleXml)->ownerDocument;
echo $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);
```
Output:
```xml
<root><empty_tag></empty_tag><foo>Ņ</foo></root>
```
Or, to do this in “pure” SimpleXML, insert an empty text node into every empty element:
```php
$xml = '<?xml version="1.0" encoding="UTF-8"?><root><empty_tag/><foo></foo></root>';
$simpleXml = new SimpleXMLElement($xml);
$simpleXml->foo = 'Ņ';
foreach ($simpleXml->xpath('//*[not(*) and string() = ""]') as $empty) {
    $empty[0] = '';
}
echo $simpleXml->asXML();
```
That is, the `foreach` obtains all empty elements via the XPath query and then sets their text content to an empty string, which inserts a text node if there isn’t (an empty) one yet. Output:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<root><empty_tag></empty_tag><foo>Ņ</foo></root>
```
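Combining the `DOMDocument`-based pieces from the three sections, a helper producing the output expected in the question might be sketched as follows. `saveDocumentElement()` is a hypothetical name, not a SimpleXML or DOM API; it changes the encoding of the underlying document as a side effect, and it always serializes the document element, even when a child element is passed in.

```php
<?php
// Hypothetical helper (not part of SimpleXML or DOM).
function saveDocumentElement(SimpleXMLElement $element): string
{
    $dom = dom_import_simplexml($element)->ownerDocument;
    // Side effect: set the document encoding so non-ASCII characters
    // are written as-is instead of as character references.
    $dom->encoding = 'UTF-8';
    // Saving only the document element omits the XML declaration;
    // LIBXML_NOEMPTYTAG writes <empty_tag></empty_tag> instead of <empty_tag/>.
    return $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);
}

$simpleXml = new SimpleXMLElement('<root><empty_tag/><foo></foo></root>');
$simpleXml->foo = 'Ņ';
echo saveDocumentElement($simpleXml), "\n";
```

Setting the encoding is not strictly needed for every PHP/libxml version when only a node is saved, but doing it first keeps the output independent of that detail.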
I hope this gives you the options you were looking for.