How to edit large XML files in PHP based on a record in the XML Node

I’m trying to modify a 130 MB+ XML file via PHP so that it only contains the records where a child node has a specific value. I need to filter it because of limitations in the software we use to import the XML into our website.

Example: (mockup data)

<Items>
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>false</ShowOnWebsite>
</Item> 
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>true</ShowOnWebsite>
</Item> 
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>false</ShowOnWebsite>
</Item>
</Items>

Desired result: I want to create a new XML file with only the records where the child “ShowOnWebsite” is true.

Problems I’ve run into: Because the XML is so large, simple solutions like using SimpleXML, or loading the XML into memory and editing the nodes there, don’t work. They all read the entire file into memory, which is too slow and usually fails.

I’ve also looked at prewk/xml-string-streamer (https://github.com/prewk/xml-string-streamer), which is great for streaming large XML files because it doesn’t load them into memory, but I can’t find any way to modify the XML with it. (Other online posts say the nodes need to be in memory to be edited.)

Anyone got an idea on how to tackle this problem?

Answer

Goal

Desired result: I want to create a new XML file with only the records where the child “ShowOnWebsite” is true.

Given

test.xml

<Items>
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>false</ShowOnWebsite>
</Item> 
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>true</ShowOnWebsite>
</Item> 
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>false</ShowOnWebsite>
</Item>
</Items>

Code

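This is the implementation I wrote. getItems is a generator that yields one Item at a time, so the XML is never loaded into memory all at once. It assumes that each <Item> and </Item> tag sits on its own line, as in test.xml.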

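function getItems($fileName) {
    if ($file = fopen($fileName, "r")) {
        $buffer = "";
        $active = false;
        // Read line by line so that only one <Item> is buffered at a time.
        while (($line = fgets($file)) !== false) {
            // trim() already strips "\r" and "\n" along with other surrounding whitespace.
            $line = trim($line);
            if ($line === "<Item>") {
                // Start of a record: begin a fresh buffer.
                $buffer = $line;
                $active = true;
            } elseif ($line === "</Item>" && $active) {
                // End of a record: parse just this record and hand it to the caller.
                $buffer .= $line;
                $active = false;
                yield new SimpleXMLElement($buffer);
                $buffer = "";
            } elseif ($active) {
                $buffer .= $line;
            }
        }
        fclose($file);
    }
}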

$output = new SimpleXMLElement('<?xml version="1.0" encoding="utf-8"?><Items></Items>');
$fields = ['Barcode', 'BrandCode', 'Title', 'Content', 'ShowOnWebsite'];
foreach (getItems("test.xml") as $element) {
    // Keep only the records flagged for the website.
    if (trim((string) $element->ShowOnWebsite) === "true") {
        $item = $output->addChild('Item');
        foreach ($fields as $field) {
            // Property assignment escapes "&" in the value; addChild() with a value does not.
            $item->$field = (string) $element->$field;
        }
    }
}

// Write the filtered records to a new file next to this script.
$fileName = __DIR__ . "/test_" . rand(100, 999999) . ".xml";
$output->asXML($fileName);

Output

<?xml version="1.0" encoding="utf-8"?>
<Items><Item><Barcode>...</Barcode><BrandCode>...</BrandCode><Title>...</Title><Content>...</Content><ShowOnWebsite>true</ShowOnWebsite></Item></Items>
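
Alternative sketch

The code above depends on every <Item> and </Item> tag being on its own line, and it keeps all matching records in memory until asXML is called. If either of those is a problem, PHP's built-in XMLReader and XMLWriter classes can do the same filtering as a stream. The following is only a minimal sketch under some assumptions: the function name filterItems and the file names are made up for illustration, the Item elements are assumed not to be nested inside each other, and it has not been tried on the real 130 MB file.

```php
<?php
// Sketch: copy only the <Item> records whose <ShowOnWebsite> is "true"
// from $inputPath to $outputPath, one record at a time.
// filterItems() is a hypothetical helper name, not part of any library.
function filterItems(string $inputPath, string $outputPath): int
{
    $reader = new XMLReader();
    if (!$reader->open($inputPath)) {
        throw new RuntimeException("Cannot open $inputPath");
    }

    $writer = new XMLWriter();
    if (!$writer->openUri($outputPath)) {
        throw new RuntimeException("Cannot write $outputPath");
    }
    $writer->startDocument('1.0', 'utf-8');
    $writer->startElement('Items');

    $kept = 0;
    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Item') {
            // Serialize just this record; the rest of the file stays on disk.
            $xml = $reader->readOuterXml();
            $item = new SimpleXMLElement($xml);
            if (trim((string) $item->ShowOnWebsite) === 'true') {
                // The record is already well-formed XML, so copy it unchanged.
                $writer->writeRaw($xml);
                $kept++;
            }
            // Jump past the record's subtree to the next node.
            $more = $reader->next();
        } else {
            $more = $reader->read();
        }
    }

    $writer->endElement();
    $writer->endDocument();
    $writer->flush();
    $reader->close();

    return $kept;
}
```

It could be called as filterItems("test.xml", "filtered.xml"); the return value is the number of records that were kept. Because each matching record is copied as raw XML, any extra child elements inside an Item are carried over without having to list them by name.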