I want to convert a XML to a CSV. That’s OK, but in some itens, I have extra fields or less fields than other itens.
A example of my feed is:
EDIT: The top of the XML is:
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0"> <channel> <item>
Item with less fields:
<item> <title> <![CDATA[ Resident Evil Revelations 2: Raid Mode: Throwback Map Pack ]]> </title> <link> https://www.nuuvem.com/item/resident-evil-revelations-2-raid-mode-throwback-map-pack </link> <description> <![CDATA[ Novas missões do modo raide! 4 mapas nostálgicos de locais icônicos, como o Queen Zenobia, do Resident Evil Revelations. 3 níveis de dificuldade oferecem um total de 12 novas missões. ]]> </description> <g:availability>out of stock</g:availability> <g:price currency="BRL">9.99</g:price> <g:image_link> http://dskhvldhwok3h.cloudfront.net/image/upload/t_boxshot_big/v1/products/5584854f69702d7235000025/boxshots/j6qaxbrhowfkijd5zdg8.jpg </g:image_link> <g:product_type> <![CDATA[ Action ]]> </g:product_type> <g:google_product_category>Software > Video Game Software > Computer Games</g:google_product_category> <g:condition>new</g:condition> <g:identifier_exists>FALSE</g:identifier_exists> <g:id>11985</g:id> </item>
Item with most common fields:
<item> <title> <![CDATA[ Tom Clancys Rainbow Six - SIEGE: Gemstone Bundle ]]> </title> <link> https://www.nuuvem.com/bundle/tom-clancy-s-rainbow-six-siege-gemstone-bundle </link> <description> <![CDATA[ ]]> </description> <g:availability>in stock</g:availability> <g:price currency="BRL">38.99</g:price> <g:image_link> http://dskhvldhwok3h.cloudfront.net/image/upload/t_boxshot_big/v1/products/573ded74f372803be9006b35/boxshots/l8ypqwhq48jzbxogypeh.jpg </g:image_link> <g:product_type> <![CDATA[ Bundle ]]> </g:product_type> <g:google_product_category>Software > Video Game Software > Computer Games</g:google_product_category> <g:condition>new</g:condition> <g:identifier_exists>FALSE</g:identifier_exists> <g:id>12705</g:id> </item>
Item with more fields:
<item> <title> <![CDATA[ Far Cry 4 - Gold Edition ]]> </title> <link>https://www.nuuvem.com/item/far-cry-4-gold-edition</link> <description> <![CDATA[ You are a gun for hire, trapped in a war-torn African state, stricken with malaria and forced to make deals with corrupt warlords on both sides of the conflict in order to make this country your home. You must identify and exploit your enemies' weaknesses, neutralizing their superior numbers and firepower. ]]> </description> <g:availability>in stock</g:availability> <g:price currency="BRL">129.99</g:price> <g:sale_price currency="BRL">64.99</g:sale_price> <g:sale_price_effective_date> 2017-01-26T02:00:00+00:00/2017-01-31T01:59:00+00:00 </g:sale_price_effective_date> <g:image_link> http://dskhvldhwok3h.cloudfront.net/image/upload/t_boxshot_big/v1/products/557dbc5369702d0a9c57e600/boxshots/ld6c69odlluoerzmwyga.jpg </g:image_link> <g:product_type> <![CDATA[ Action ]]> </g:product_type> <g:google_product_category>Software > Video Game Software > Computer Games</g:google_product_category> <g:condition>new</g:condition> <g:identifier_exists>FALSE</g:identifier_exists> <g:id>2246</g:id> </item>
I’m using a script to do the conversion, it works, but it gets wrong when the item have more or less fields than the most common type of item.
$filexml='file.xml'; if (file_exists($filexml)) { $xml = simplexml_load_file($filexml); $i = 1; // Position counter $values = []; // PHP array // Writing column headers $columns = array('title', 'link', 'description', 'availability', 'price', 'image_link', 'product_type', 'google_product_category', 'condition', 'identifier_exists', 'id'); $fs = fopen('nuuvem-merchant.csv', 'w'); fputcsv($fs, $columns); fclose($fs); // Iterate through each <item> node $node = $xml->xpath('//item'); foreach ($node as $n) { // Iterate through each child of <item> node $child = $xml->xpath('//item['.$i.']/*'); foreach ($child as $value) { $values[] = $value; } // Write to CSV files (appending to column headers) $fs = fopen('nuuvem-merchant.csv', 'a'); fputcsv($fs, $values); fclose($fs); $values = []; // Clean out array for next <item> (i.e., row) $i++; // Move to next <item> (i.e., node position) } }
With this script, I get always a sequential order of the fields of each item. Because of that, the data doesn’t match the header of the column many times, and some items with more fields don’t have header because of it.
Sorry, I’m not a PHP developer, and it is hard to me to solve this. I tried to search here, but didn’t find a problem like mine.
Thanks
Advertisement
Answer
Simply iterate off the columns
array you use for the CSV headers where you pass column name into the XPath
expression not all item’s children with /*
. To select the first item of returned XPath array the [0]
index is used and to remove whitespace, trim()
is used.
Also, you will need to register the namespace prefix g
to access those elements (hence why namespaces are important to always include in posted XML snippets):
$filexml = 'GoogleProductFeed.xml'; $xml = simplexml_load_file($filexml); $xml->registerXPathNamespace('g', 'http://base.google.com/ns/1.0'); if (file_exists($filexml)) { $xml = simplexml_load_file($filexml); $i = 1; // Position counter $values = []; // PHP array // Writing column headers $columns = array('title', 'link', 'description', 'g:availability', 'g:price', 'g:image_link', 'g:product_type', 'g:google_product_category', 'g:condition', 'g:identifier_exists', 'g:id'); $fs = fopen('GoogleProductFeed.csv', 'w'); fputcsv($fs, $columns); fclose($fs); // Iterate through each <item> node $node = $xml->xpath('//item'); foreach ($node as $n) { // Iterate through each child of <item> node foreach ($columns as $col) { if (count($xml->xpath('//item['.$i.']/'.$col)) > 0) { $values[] = trim($xml->xpath('//item['.$i.']/'.$col)[0]); } else { $values[] = ''; } } // Write to CSV files (appending to column headers) $fs = fopen('GoogleProductFeed.csv', 'a'); fputcsv($fs, $values); fclose($fs); $values = []; // Clean out array for next <item> (i.e., row) $i++; // Move to next <item> (i.e., node position) } }
Do note column headers will have the g:
prefix for those applicable columns. Maybe use two nearly same arrays one for headers and other for XPath calls.