Invalid characters in XML tag name

Question

We have a huge database where users can create custom fields. Every UTF-8 character is allowed in their name. Until a few weeks ago, when they export their data in XML, only invalid characters that ...

Accepted Answer

@Quentin suggests the better approach. Dynamic node names mean that you cannot define an XSD/schema: your XML files can only be checked for well-formedness, so you will not be able to make full use of validators. A <field name="..."/> element is the better solution from a machine-readability and maintenance point of view.

However, NCNames (non-colonized names) allow quite a lot of characters. Here is what I implemented in my library for converting JSON. $nameStartChar defines letters and several Unicode ranges. $nameChar adds some more characters to that definition (such as the digits and '.'); the '-' is added directly in the first pattern.

The first RegExp removes any character that is NOT a name char. The second removes any leading characters that are NOT defined in $nameStartChar. If the result is empty, the function returns a default name.

    function normalizeString(string $string, string $default = '_'): string {
        // characters allowed at the start of an NCName
        $nameStartChar =
          'A-Z_a-z'.
          '\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}\x{370}-\x{37D}'.
          '\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}'.
          '\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}'.
          '\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}';
        // characters allowed after the first one
        $nameChar =
          $nameStartChar.
          '\.\d\x{B7}\x{300}-\x{36F}\x{203F}-\x{2040}';
        $result = preg_replace(
          [
            // remove everything that is not a name char (or '-')
            '([^'.$nameChar.'-]+)u',
            // remove leading characters that can not start a name
            '(^[^'.$nameStartChar.']+)u',
          ],
          '',
          $string
        );
        // fall back to the default if nothing is left
        return empty($result) ? $default : $result;
    }

A qualified XML node name can consist of two NCNames separated by ':'. The first part would be the namespace prefix. Because ':' is not part of an NCName, the function removes it.

    $examples = [
      '123foo',
      'foo123',
      '  foo  ',
      '  ',
      'foo:bar',
      'foo-bar'
    ];
    foreach ($examples as $example) {
        var_dump(normalizeString($example));
    }

Output:

    string(3) "foo"
    string(6) "foo123"
    string(3) "foo"
    string(1) "_"
    string(6) "foobar"
    string(7) "foo-bar"
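For comparison, the <field name="..."/> approach recommended at the start of this answer could look roughly like the sketch below. It is only an illustration using PHP's DOM extension: the function name exportFields, the root element name "fields" and the sample data are assumptions and not part of the original export. Because the user-defined name ends up in an attribute value, it does not have to be a valid NCName; it still has to be valid UTF-8 and consist of characters that are legal in XML at all (most control characters are not).

```php
<?php
// Hypothetical helper (not from the original answer): writes each custom
// field as <field name="...">value</field> instead of using the field
// name as the element name.
function exportFields(array $fields): string {
    $document = new DOMDocument('1.0', 'UTF-8');
    $document->formatOutput = true;
    // "fields" is an assumed root element name
    $root = $document->appendChild($document->createElement('fields'));
    foreach ($fields as $name => $value) {
        $field = $root->appendChild($document->createElement('field'));
        // numeric array keys arrive as integers, so cast back to string;
        // the DOM escapes the attribute value on serialization
        $field->setAttribute('name', (string)$name);
        // a text node escapes <, > and & in the field value
        $field->appendChild($document->createTextNode((string)$value));
    }
    return $document->saveXML();
}

echo exportFields(['123 foo' => 'bar', 'a<b' => 'x & y']);
```

With this structure a schema only needs to describe the fixed element names, and the original field names survive the export unchanged instead of being normalized.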

Answer