Transform complex and variable XML

I have a complex XML document that I want to transform into HTML. Some tags need to be replaced with HTML tags.

The XML is this:

<root>
<div>
    <p>
        <em>bol text</em>, some normale text
    </p>
</div>
<list>
    <listitem>
        normal text inside list <em>bold inside list</em>
    </listitem>
    <listitem>
        another text in list...
    </listitem>
</list>
<p>
    A sample paragraph
</p>
</root>

The content inside the elements is variable, which means that other XML documents I parse can be completely different.

The output I want is this (for this scenario):

<root>
    <div>
        <p>
            <strong>bol text</strong>, some normale text
        </p>
    </div>
    <ul>
        <li>
            normal text inside list <strong>bold inside list</strong>
        </li>
        <li>
            another text in list...
        </li>
    </ul>
    <p>
        A sample paragraph
    </p>
</root>

I wrote a recursive function that visits every node of the XML and replaces it with the corresponding HTML tag, but it doesn’t work:

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->load('section.xml');
echo $doc->saveHTML();

function printHtml(DOMNode $node)
{
    if ($node->hasChildNodes())
    {
        foreach ($node->childNodes as $child)
        {
            printHtml($child);
        }
    }

    if ($node->nodeName == 'em')
    {
        $newNode = $node->ownerDocument->createElement('strong', $node->nodeValue);
        $node->parentNode->replaceChild($newNode, $node);
    }

    if ($node->nodeName == 'listitem')
    {
        $newNode = $node->ownerDocument->createElement('li', $node->nodeValue);
        $node->parentNode->replaceChild($newNode, $node);
    }
}

Can anyone help me?

This is an example of a complete XML document:

<root>
    <div>
        <p>
            <em>bol text</em>, some normale text
        </p>
    </div>
    <list>
        <listitem>
            normal text inside list <em>bold inside list</em>
        </listitem>
        <listitem>
            another text in list...
        </listitem>
    </list>
    <media>
        <info isVisible="false">
            <title>
                <p>Image title <em>in bold</em> not in bold</p>
            </title>
        </info>
        <file isVisible="true">
            <href>
                "path/to/file.jpg"
            </href>
        </file>
    </media>
    <p>
        A sample paragraph
    </p>
</root>

Which has to be transformed into:

<root>
    <div>
        <p>
            <strong>bol text</strong>, some normale text
        </p>
    </div>
    <ul>
        <li>
            normal text inside list <strong>bold inside list</strong>
        </li>
        <li>
            another text in list...
        </li>
    </ul>
    <!-- the media tag can be rendered in two ways: with the title hidden or with the title visible -->
    <!-- this is the case when the title is hidden -->
    <img src="path/to/file.jpg" />
    
    <!-- this is the case when the title is visible -->
    <!-- the info tag (inside the media tag) has an attribute isVisible="false", which means the title must not be shown. -->
    <!-- if the info tag has isVisible="true", the media tag must be translated into
     <div>
        <img src="path/to/file.jpg" />
        <p>Image title <strong>in bold</strong> not in bold</p>
     </div>
     -->
    <p>
        A sample paragraph
    </p>
</root>

Answer

There’s a language specially designed for this task: it’s called XSLT, and you can easily express your desired transformation in XSLT and invoke it from your PHP program. There’s a learning curve, of course, but it’s a much better solution than writing low-level DOM code.

In XSLT you write a set of template rules saying how individual elements should be handled. Many elements in your example are copied through unchanged, so you can start with a default rule that does this:

<xsl:template match="*">
  <xsl:copy><xsl:apply-templates/></xsl:copy>
</xsl:template>

The “match” part says what part of the input you are matching; the body of the rule says what output to produce. The xsl:apply-templates does a recursive descent to process the children of the current element.

Some of your elements are simply renamed, for example

<xsl:template match="listitem">
 <li><xsl:apply-templates/></li>
</xsl:template>
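
To show how the default rule and the rename rules might be invoked from PHP, here is a minimal sketch. It assumes the xsl extension (XSLTProcessor) is enabled. The helper name transformXml, the inline sample input, and the extra rules for list and em are illustrative assumptions modelled on the listitem rule above; they are not part of the original answer.

```php
<?php
// Sketch only: the stylesheet is held in a nowdoc string so the example is self-contained.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- default rule: copy the element and process its children -->
  <xsl:template match="*">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>

  <!-- rename rules (list and em are assumed by analogy with listitem) -->
  <xsl:template match="list"><ul><xsl:apply-templates/></ul></xsl:template>
  <xsl:template match="listitem"><li><xsl:apply-templates/></li></xsl:template>
  <xsl:template match="em"><strong><xsl:apply-templates/></strong></xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: applies an XSLT stylesheet string to an XML string.
function transformXml(string $xml, string $xsl): string
{
    $source = new DOMDocument();
    $source->loadXML($xml);

    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    $result = $processor->transformToXml($source);
    if ($result === false || $result === null) {
        throw new RuntimeException('XSLT transformation failed');
    }
    return $result;
}

// Shortened sample input based on the question's list element.
$xml = '<root><list><listitem>normal text <em>bold inside list</em></listitem></list></root>';

$html = transformXml($xml, $xsl);
echo $html;
```

The specific rules win over the default match="*" rule because a name test has a higher default priority than the wildcard, so no explicit priorities are needed here.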

Some of the rules are a little bit more complex, but still easily expressed:

<xsl:template match="media/file[@isVisible='true']">
  <img src="{href}"/>
</xsl:template>
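
The rule above only covers the file element, so the media and info wrappers still need rules of their own. One possible way to cover both cases from the question (title hidden, title visible) is sketched below in PHP. The two media rules, the helper name renderMedia, and the stripping of whitespace and literal quotes around the href text are assumptions made for illustration; the xsl extension is assumed to be enabled.

```php
<?php
// Sketch only: media rules added on top of the default copy rule.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="*">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>

  <xsl:template match="em"><strong><xsl:apply-templates/></strong></xsl:template>

  <!-- title visible: wrap the image and the title content in a div -->
  <xsl:template match="media[info/@isVisible='true']">
    <div>
      <xsl:apply-templates select="file[@isVisible='true']"/>
      <xsl:apply-templates select="info/title/*"/>
    </div>
  </xsl:template>

  <!-- title hidden: emit only the image -->
  <xsl:template match="media">
    <xsl:apply-templates select="file[@isVisible='true']"/>
  </xsl:template>

  <!-- assumption: trim whitespace and drop the literal quotes around the path -->
  <xsl:template match="media/file[@isVisible='true']">
    <img src="{translate(normalize-space(href), '&quot;', '')}"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: applies the stylesheet to an XML string.
function renderMedia(string $xml, string $xsl): string
{
    $source = new DOMDocument();
    $source->loadXML($xml);

    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    $result = $processor->transformToXml($source);
    if ($result === false || $result === null) {
        throw new RuntimeException('XSLT transformation failed');
    }
    return $result;
}

// Sample input modelled on the question's media element; %s is the info visibility.
$template = '<root><media><info isVisible="%s"><title><p>Image title <em>in bold</em> not in bold</p></title></info>'
    . '<file isVisible="true"><href>"path/to/file.jpg"</href></file></media></root>';

$hidden  = renderMedia(sprintf($template, 'false'), $xsl);
$visible = renderMedia(sprintf($template, 'true'), $xsl);

echo $hidden, $visible;
```

In this sketch a file whose isVisible attribute is not 'true' is never selected, so it produces no output at all. Whether that is the desired behaviour is not stated in the question.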

I hope you agree that this declarative rule-based approach is much clearer than your procedural code; it’s also much easier for someone else to change the rules in six months’ time.

User contributions licensed under: CC BY-SA