PHP: Encode UTF8-Characters to html entities

I want to encode normal characters to html-entities like

a => &#x61;
A => &#x41;
b => &#x62;
B => &#x42;

but

echo htmlentities("a");

doesn’t work. It outputs the normal characters (a A b B) in the HTML source code instead of the HTML entities.

How can I convert them?

Answer

You can build a function for this fairly easily using mb_ord or IntlChar::ord, either of which will give you the numeric value for a Unicode Code Point.

You can then convert that to a hexadecimal string using base_convert, and add the ‘&#x’ and ‘;’ around it to give an HTML entity:

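function make_entity(string $char): string {
    // Numeric value of the Unicode code point, e.g. 0x20AC for '€'
    $codePoint = mb_ord($char, 'UTF-8'); // or IntlChar::ord($char);
    // base_convert() works on strings, so cast the integer explicitly
    $hex = base_convert((string) $codePoint, 10, 16);
    return '&#x' . $hex . ';';
}
echo make_entity('a');  // &#x61;
echo make_entity('€');  // &#x20ac;
echo make_entity('🐘'); // &#x1f418;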

You then need to run that for each code point in your UTF-8 string. It is not enough to loop over the string using something like substr, because PHP’s string functions work with individual bytes, and each UTF-8 code point may be multiple bytes.
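To illustrate the byte-versus-code-point difference, here is a minimal sketch. It assumes the mbstring extension is loaded (the same assumption mb_ord already makes); the sample string is only an illustration.

```php
<?php
// '€' is one code point (U+20AC) but three bytes in UTF-8.
$text = 'a€';

var_dump(strlen($text));              // int(4): byte count
var_dump(mb_strlen($text, 'UTF-8'));  // int(2): code point count

// substr() cuts by byte offset, so this slices into the middle of '€'.
$broken = substr($text, 1, 1);
var_dump(mb_check_encoding($broken, 'UTF-8'));  // bool(false)

// mb_substr() counts code points instead.
$whole = mb_substr($text, 1, 1, 'UTF-8');
var_dump($whole === '€');  // bool(true)
```

Passing the broken fragment to mb_ord would not give a meaningful code point, which is why the string has to be split on code point boundaries first.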

One approach would be to use a regular expression replacement with a pattern of /./u:

  • The . matches each single “character”
  • The /u modifier turns on Unicode mode, so that each “character” matched by the . is a whole code point
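The effect of the /u modifier can be checked by counting matches, as in this small sketch. One caveat that goes beyond the points above: by default the . does not match a newline, so the /s modifier is added in the last line to cover that case.

```php
<?php
$text = "a€\n";

// Without /u, '.' matches one byte at a time: 'a' plus the three bytes of '€'.
$byteMatches = preg_match_all('/./', $text);

// With /u, '.' matches whole code points: 'a' and '€'.
$codePointMatches = preg_match_all('/./u', $text);

// Adding /s lets '.' match the trailing newline as well.
$withNewline = preg_match_all('/./su', $text);

var_dump($byteMatches, $codePointMatches, $withNewline);  // int(4) int(2) int(3)
```

If newlines in your input should also become entities, /./su is probably the pattern you want; with plain /./u they are left untouched.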

You can then run the above make_entity function for each match (i.e. each code point) with preg_replace_callback.

Since preg_replace_callback will pass your callback an array of matches, not just a string, you can make an arrow function which takes the array and passes element 0 to the real function:

$callback = fn($matches) => make_entity($matches[0]);

So putting it together, you have this:

echo preg_replace_callback('/./u', fn($m) => make_entity($m[0]), 'a€🐘');

Arrow functions were introduced in PHP 7.4, so if you’re stuck on an older version, you can write the same thing as a regular anonymous function:

echo preg_replace_callback('/./u', function($m) { return make_entity($m[0]); }, 'a€🐘');

Or of course, just a regular named function (or a method on a class or object; see the “callable” page in the manual for the different syntax options):

function make_entity_from_array_item(array $matches) {
    return make_entity($matches[0]);
}
echo preg_replace_callback('/./u', 'make_entity_from_array_item', 'a€🐘');
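
For reference, the pieces can be combined into one self-contained sketch. The function names to_hex_entity and encode_all_entities are hypothetical, chosen only for this example. It uses dechex in place of base_convert, adds the /s modifier so newlines are converted too, and checks for the null that preg_replace_callback returns on error (for example, input that is not valid UTF-8). It assumes PHP 7.4 or later with mbstring.

```php
<?php
// Hypothetical helper names; not part of the answer above.
function to_hex_entity(string $char): string
{
    // mb_ord() returns the code point as an int; dechex() formats it as lowercase hex.
    return '&#x' . dechex(mb_ord($char, 'UTF-8')) . ';';
}

function encode_all_entities(string $text): string
{
    // /u matches whole code points; /s lets '.' match newlines too.
    $result = preg_replace_callback(
        '/./su',
        fn(array $m): string => to_hex_entity($m[0]),
        $text
    );

    // preg_replace_callback() returns null on error, e.g. invalid UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $result;
}

echo encode_all_entities('aA€🐘'), "\n";
// &#x61;&#x41;&#x20ac;&#x1f418;
```

Throwing on invalid input is a design choice for the sketch; returning the original string or an empty string would be equally reasonable depending on where the text comes from.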
User contributions licensed under: CC BY-SA