
PHP Split XML based on multiple nodes

I honestly tried to find a solution for PHP, but the many similar-sounding threads either don't apply to my case or target completely different languages.

I want to split an XML file based on nodes. Ideally on multiple nodes, but a single node is enough, since the split could then be applied repeatedly.

For example, I want to split this by the tags <thingy> and <othernode>:

<root>
   <stuff />
   <thingy><othernode>one</othernode></thingy>
   <thingy><othernode>two</othernode></thingy>
   <thingy>
      <othernode>three</othernode>
      <othernode>four</othernode>
   </thingy>
   <some other="data"/>
</root>

Ideally I want to end up with 4 XML strings of this form:

<root>
   <stuff />
   <thingy><othernode>CONTENT</othernode></thingy>
   <some other="data"/>
</root>

With CONTENT being one, two, three and four. Plot twist: CONTENT can also be a whole subtree. Of course, the document can also be full of various namespaces and tag prefixes (like <q1:node/>). Formatting is irrelevant to me.

  • I tried SimpleXML, but it lacks the possibility to write into the DOM easily.
  • I tried DOMDocument, but everything I do seems to destroy some parent/child relation between nodes in some way.
  • I tried XMLReader/XMLWriter, but that is extremely hard to maintain and combine (at least for me).

So far my best guess is something with DOMDocument: cloning nodes and removing everything but one node?


Answer

Interesting question.

If I understand correctly, <othernode> is always a child of <thingy>, and the document is split once per <othernode>, with each one placed at the position of the first <thingy> in the original document.

DOMDocument proved useful in this case, as it makes it easy to move nodes around, including all of their children.
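As a minimal sketch of that node-moving behaviour (the tiny sample document and the variable names are made up for illustration): calling appendChild() with a node that is already part of the document moves it, subtree included, rather than copying it.

```php
<?php
// Illustrative document, not the one from the question.
$doc = new DOMDocument();
$doc->loadXML('<root><a><child>x</child></a><b/></root>');

$a = $doc->getElementsByTagName('a')->item(0);
$b = $doc->getElementsByTagName('b')->item(0);

// <a> is detached from <root> and re-attached under <b>, together with <child>.
$b->appendChild($a);

// Serialize only the root element (no XML declaration).
$moved = $doc->saveXML($doc->documentElement);
echo $moved, "\n"; // <root><b><a><child>x</child></a></b></root>
```

The same call works with a DOMDocumentFragment as the target, which is what makes a fragment usable as a temporary holding area.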

Given the split on a node-list (from getElementsByTagName()):

echo "---\n";
foreach ($split($doc->getElementsByTagName('othernode')) as $doc) {
    echo $doc->saveXML(), "---\n";
}

The $split function moves all <othernode> elements into a DOMDocumentFragment of their own, removing each <thingy> parent element once it is emptied (except the first one, which serves as the anchor), and then temporarily puts each element back into the DOMDocument, one at a time:

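$split = static function (DOMNodeList $nodes): Generator {
    // Phase 1: drain the live node list; every move into the fragment shrinks it.
    while (($element = $nodes->item(0)) && $element instanceof DOMElement) {
        $doc ??= $element->ownerDocument;
        $basin ??= $doc->createDocumentFragment();
        $anchor ??= $element->parentNode; // the first parent stays as the insertion point
        $parent = $element->parentNode;   // remember the parent before the move
        $basin->appendChild($element);
        // Remove a parent that is now empty, unless it is the anchor.
        $parent->childElementCount || $parent === $anchor || $parent->parentNode->removeChild($parent);
    }

    if (empty($anchor)) {
        return; // nothing matched
    }

    assert(isset($basin, $doc));

    // Phase 2: put one element at a time back into the anchor and yield the document.
    while ($element = $basin->childNodes->item(0)) {
        $element = $anchor->appendChild($element);
        yield $doc; // the same DOMDocument every time, so serialize it before continuing
        $anchor->removeChild($element);
    }
};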
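One detail worth spelling out: the first loop in $split only terminates because getElementsByTagName() returns a live list, so item(0) points at the next remaining element after each move. A list from DOMXPath::query() is a static snapshot and would not shrink, so passing one to $split as written would most likely loop forever. The following sketch (sample document and variable names are illustrative) shows the difference:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><thingy><othernode>one</othernode><othernode>two</othernode></thingy></root>');

// Live view: re-evaluated against the document on every access.
$live = $doc->getElementsByTagName('othernode');
// Static snapshot: fixed at query time.
$static = (new DOMXPath($doc))->query('//othernode');

// Detach the first match from the document.
$first = $live->item(0);
$first->parentNode->removeChild($first);

$liveLength = $live->length;     // 1, the removed element is gone from the live list
$staticLength = $static->length; // 2, the snapshot still holds both elements
$nextText = $live->item(0)->textContent; // "two"
```

For namespaced elements, getElementsByTagNameNS() should behave like the live variant, although that is not exercised in this sketch.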

This results in the following split:

---
<?xml version="1.0"?>
<root>
   <stuff/>
   <thingy><othernode>one</othernode></thingy>
   
   
   <some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
   <stuff/>
   <thingy><othernode>two</othernode></thingy>
   
   
   <some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
   <stuff/>
   <thingy><othernode>three</othernode></thingy>
   
   
   <some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
   <stuff/>
   <thingy><othernode>four</othernode></thingy>
   
   
   <some other="data"/>
</root>
---
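For comparison, here is a rough sketch of the approach guessed at in the question: clone the whole document once per match and remove everything but one node. The function name splitByClone is hypothetical, and the sketch assumes PHP 8 (for childElementCount on DOMElement) and that matching elements are not nested inside each other. Unlike $split, it leaves the original document untouched and keeps each surviving <thingy> at its own position instead of the position of the first one.

```php
<?php
// Hypothetical helper: returns one XML string per element named $tagName.
function splitByClone(DOMDocument $doc, string $tagName): array
{
    $count = $doc->getElementsByTagName($tagName)->length;
    $results = [];

    for ($keep = 0; $keep < $count; $keep++) {
        // Deep copy, so the original document is never modified.
        $copy = $doc->cloneNode(true);

        // Snapshot the live list so removals do not shift the indexes.
        $nodes = iterator_to_array($copy->getElementsByTagName($tagName), false);

        foreach ($nodes as $index => $node) {
            if ($index === $keep) {
                continue; // this is the one element that survives in this copy
            }
            $parent = $node->parentNode;
            $parent->removeChild($node);

            // Drop a parent element left without element children (never the root).
            if ($parent instanceof DOMElement
                && $parent !== $copy->documentElement
                && $parent->childElementCount === 0
            ) {
                $parent->parentNode->removeChild($parent);
            }
        }

        $results[] = $copy->saveXML();
    }

    return $results;
}
```

Calling splitByClone($doc, 'othernode') on the sample document should give four strings, one per <othernode>. The cost is one full copy of the document per match, which may matter for large inputs.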
User contributions licensed under: CC BY-SA