I’m trying to filter a 130 MB+ XML file with PHP so that it only contains the records where a child node has a specific value. I need to filter it because of limitations in the software we use to import the XML into our website.
Example: (mockup data)
<Items>
    <Item>
        <Barcode>...</Barcode>
        <BrandCode>...</BrandCode>
        <Title>...</Title>
        <Content>...</Content>
        <ShowOnWebsite>false</ShowOnWebsite>
    </Item>
    <Item>
        <Barcode>...</Barcode>
        <BrandCode>...</BrandCode>
        <Title>...</Title>
        <Content>...</Content>
        <ShowOnWebsite>true</ShowOnWebsite>
    </Item>
    <Item>
        <Barcode>...</Barcode>
        <BrandCode>...</BrandCode>
        <Title>...</Title>
        <Content>...</Content>
        <ShowOnWebsite>false</ShowOnWebsite>
    </Item>
</Items>
Desired result: I want to create a new XML file with only the records where the child “ShowOnWebsite” is true.
Problems I’ve run into:
Because the XML is so large, simple solutions such as SimpleXML, or loading the XML into the body and editing the nodes there, don’t work: they all read the entire file into memory, which is too slow and usually fails.
I’ve also looked at prewk/xml-string-streamer (https://github.com/prewk/xml-string-streamer) which is great for streaming large XML files because it doesn’t place them in memory, although I can’t find any way to modify the XML via that solution. (Other online posts say you need to have the nodes in memory to edit them).
Anyone got an idea on how to tackle this problem?
Answer
Goal
Desired result: I want to create a new XML file with only the records where the child “ShowOnWebsite” is true.
Given
test.xml
<Items>
    <Item>
        <Barcode>...</Barcode>
        <BrandCode>...</BrandCode>
        <Title>...</Title>
        <Content>...</Content>
        <ShowOnWebsite>false</ShowOnWebsite>
    </Item>
    <Item>
        <Barcode>...</Barcode>
        <BrandCode>...</BrandCode>
        <Title>...</Title>
        <Content>...</Content>
        <ShowOnWebsite>true</ShowOnWebsite>
    </Item>
    <Item>
        <Barcode>...</Barcode>
        <BrandCode>...</BrandCode>
        <Title>...</Title>
        <Content>...</Content>
        <ShowOnWebsite>false</ShowOnWebsite>
    </Item>
</Items>
Code
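This is the implementation I wrote. The getItems generator yields the child elements one at a time, without loading the whole XML file into memory. Note that it expects each <Item> and </Item> tag to be on its own line, as in test.xml above.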
function getItems($fileName) {
    if ($file = fopen($fileName, "r")) {
        $buffer = "";
        $active = false;
        // Read the file line by line so only one <Item> is buffered at a time.
        while (($line = fgets($file)) !== false) {
            // Strip line endings and surrounding whitespace.
            $line = trim(str_replace(["\r", "\n"], "", $line));
            if ($line == "<Item>") {
                // Start of a record: begin buffering.
                $buffer .= $line;
                $active = true;
            } elseif ($line == "</Item>") {
                // End of a record: parse the buffered fragment and hand it to the caller.
                $buffer .= $line;
                $active = false;
                yield new SimpleXMLElement($buffer);
                $buffer = "";
            } elseif ($active == true) {
                // Inside a record: keep collecting its lines.
                $buffer .= $line;
            }
        }
        fclose($file);
    }
}

// The filtered result is built in memory, so it has to be small enough to fit.
$output = new SimpleXMLElement('<?xml version="1.0" encoding="utf-8"?><Items></Items>');
foreach (getItems("test.xml") as $element) {
    // Copy only the records that are flagged for the website.
    if ($element->ShowOnWebsite == "true") {
        $item = $output->addChild('Item');
        $item->addChild('Barcode', (string) $element->Barcode);
        $item->addChild('BrandCode', (string) $element->BrandCode);
        $item->addChild('Title', (string) $element->Title);
        $item->addChild('Content', (string) $element->Content);
        $item->addChild('ShowOnWebsite', (string) $element->ShowOnWebsite);
    }
}

// Write the result to a new file with a random suffix.
$fileName = __DIR__ . "/test_" . rand(100, 999999) . ".xml";
$output->asXML($fileName);
Output
<?xml version="1.0" encoding="utf-8"?>
<Items><Item><Barcode>...</Barcode><BrandCode>...</BrandCode><Title>...</Title><Content>...</Content><ShowOnWebsite>true</ShowOnWebsite></Item></Items>
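If the real file does not put every <Item> tag on its own line, or if the filtered result is itself too large to build in memory, a variant based on PHP's built-in XMLReader and XMLWriter classes may be worth trying. The following is only a sketch, not part of the original answer: the function name filterItems and its parameters are made up for illustration, and it assumes that all <Item> elements are siblings under one root and that a single <Item> comfortably fits in memory.

```php
<?php
// Hypothetical helper (not from the original answer): streams $inputFile and
// writes every <Item> whose <ShowOnWebsite> is "true" to $outputFile.
// Returns the number of items that were kept.
function filterItems(string $inputFile, string $outputFile): int
{
    $reader = new XMLReader();
    if (!$reader->open($inputFile)) {
        throw new RuntimeException("Cannot open input file: $inputFile");
    }

    $writer = new XMLWriter();
    if (!$writer->openUri($outputFile)) {
        throw new RuntimeException("Cannot open output file: $outputFile");
    }
    $writer->startDocument('1.0', 'utf-8');
    $writer->startElement('Items');

    // Advance to the first <Item> element, if there is one.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Item') {
            $found = true;
            break;
        }
    }

    $kept = 0;
    while ($found) {
        // Only the current <Item> is materialised as a string.
        $xml = $reader->readOuterXml();
        $item = new SimpleXMLElement($xml);

        if (trim((string) $item->ShowOnWebsite) === 'true') {
            // Copy the record through unchanged and push it to disk.
            $writer->writeRaw($xml);
            $writer->flush();
            $kept++;
        }

        // Skip the rest of this record and move to the next <Item>.
        $found = $reader->next('Item');
    }

    $writer->endElement();   // </Items>
    $writer->endDocument();
    $writer->flush();
    $reader->close();

    return $kept;
}
```

It could be called as filterItems("test.xml", "filtered.xml"), where filtered.xml is just an example name. Because each kept record is written out as soon as it is read, memory use should stay roughly constant regardless of how many records match.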