PHP: Encode UTF-8 characters to HTML entities

I want to encode normal characters as HTML entities, like

JavaScript

but

JavaScript

doesn’t work. It outputs the normal characters (a A b B) in the HTML source code instead of the HTML entities.

How can I convert them?


Answer

You can build a function for this fairly easily using mb_ord or IntlChar::ord, either of which will give you the numeric value for a Unicode Code Point.

You can then convert that to a hexadecimal string using base_convert, and add the ‘&#x’ and ‘;’ around it to give an HTML entity:

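A minimal sketch of such a function, assuming the name make_entity that the rest of this answer refers to, and using mb_ord rather than IntlChar::ord:

```php
<?php
// Sketch only: turn one UTF-8 character into a hexadecimal HTML entity.
function make_entity(string $char): string
{
    // Numeric value of the Unicode code point, e.g. 97 for "a".
    $codePoint = mb_ord($char, 'UTF-8');

    // base_convert() works on strings, so cast the integer first.
    $hex = base_convert((string) $codePoint, 10, 16);

    return '&#x' . $hex . ';';
}

echo make_entity('a'), "\n"; // &#x61;
```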

You then need to run that for each code point in your UTF-8 string. It is not enough to loop over the string using something like substr, because PHP’s string functions work with individual bytes, and each UTF-8 code point may be multiple bytes.
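To see why byte-based functions are not enough, here is a small illustration. The sample string is an assumption chosen for the demo: it mixes a 1-byte, a 2-byte and a 3-byte UTF-8 character.

```php
<?php
// "a" (1 byte), "é" U+00E9 (2 bytes), "€" U+20AC (3 bytes)
$text = "a\u{E9}\u{20AC}";

echo strlen($text), "\n";                 // 6: strlen() counts bytes
echo mb_strlen($text, 'UTF-8'), "\n";     // 3: mb_strlen() counts code points

// substr() cuts in the middle of "é" and returns only its first byte.
echo bin2hex(substr($text, 1, 1)), "\n";  // c3
```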

One approach would be to use a regular expression replacement with a pattern of /./u:

  • The . matches each single “character”
  • The /u modifier turns on Unicode mode, so that each “character” matched by the . is a whole code point
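The effect of the /u modifier can be checked with preg_match_all. This is only an illustration; the sample string and variable names are assumptions:

```php
<?php
$text = "a\u{E9}\u{20AC}"; // "a", "é", "€"

// Without /u, "." matches one byte at a time.
$byteCount = preg_match_all('/./', $text, $byteMatches);

// With /u, "." matches one whole code point at a time.
$pointCount = preg_match_all('/./u', $text, $pointMatches);

echo $byteCount, "\n";  // 6
echo $pointCount, "\n"; // 3
```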

You can then run the above make_entity function for each match (i.e. each code point) with preg_replace_callback.


Since preg_replace_callback will pass your callback an array of matches, not just a string, you can make an arrow function which takes the array and passes element 0 to the real function:

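A sketch of such an arrow function. The variable name $callback is an assumption, and make_entity is repeated here only so the snippet runs on its own:

```php
<?php
function make_entity(string $char): string
{
    return '&#x' . base_convert((string) mb_ord($char, 'UTF-8'), 10, 16) . ';';
}

// preg_replace_callback passes an array; element 0 is the full match.
$callback = fn(array $matches): string => make_entity($matches[0]);

echo $callback(['a']), "\n"; // &#x61;
```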

So putting it together, you have this:

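A complete sketch under the same assumptions (mb_ord, base_convert, and the make_entity name), run on the "a A b B" input from the question:

```php
<?php
function make_entity(string $char): string
{
    $codePoint = mb_ord($char, 'UTF-8');
    return '&#x' . base_convert((string) $codePoint, 10, 16) . ';';
}

$input = 'a A b B';

// Replace every code point with its hexadecimal entity.
$output = preg_replace_callback(
    '/./u',
    fn(array $matches): string => make_entity($matches[0]),
    $input
);

echo $output, "\n"; // &#x61;&#x20;&#x41;&#x20;&#x62;&#x20;&#x42;
```

Note that . does not match line breaks by default, so newlines in the input would pass through unchanged unless the /s modifier is added to the pattern.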

Arrow functions were introduced in PHP 7.4, so if you’re stuck on an older version, you can write the same thing as a regular anonymous function:

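A sketch of the anonymous-function variant, again with make_entity repeated so the snippet is self-contained:

```php
<?php
function make_entity(string $char): string
{
    $codePoint = mb_ord($char, 'UTF-8');
    return '&#x' . base_convert((string) $codePoint, 10, 16) . ';';
}

$input = 'a A b B';

// Same behavior as the arrow function, written in the older closure syntax.
$output = preg_replace_callback(
    '/./u',
    function (array $matches) {
        return make_entity($matches[0]);
    },
    $input
);

echo $output, "\n";
```

Keep in mind that mb_ord itself requires PHP 7.2 or later.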

Or of course, just a regular named function (or a method on a class or object; see the “callable” page in the manual for the different syntax options):

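A sketch of the named-function variant. The wrapper name make_entity_from_match is hypothetical, chosen here for illustration:

```php
<?php
function make_entity(string $char): string
{
    $codePoint = mb_ord($char, 'UTF-8');
    return '&#x' . base_convert((string) $codePoint, 10, 16) . ';';
}

// Hypothetical wrapper: unpacks the match array for make_entity().
function make_entity_from_match(array $matches): string
{
    return make_entity($matches[0]);
}

$input = 'a A b B';

// A function name given as a string is a valid callable.
$output = preg_replace_callback('/./u', 'make_entity_from_match', $input);

echo $output, "\n";
```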
User contributions licensed under: CC BY-SA