
Searching for a good Unicode-compatible alternative to the PHP ord() function

After quite a bit of searching and testing, the simplest method I’ve found for a Unicode-compatible alternative to the PHP ord() function is this:

$utf8Character = 'Ą';
list(, $ord) = unpack('N', mb_convert_encoding($utf8Character, 'UCS-4BE', 'UTF-8'));
echo $ord; # 260

I found this here. However, it has been mentioned that this method is rather slow. Does anyone know of a more efficient method which is nearly as simple? And what does UCS-4BE mean?

Answer

You might also be able to implement this function using iconv(), but the mb_convert_encoding method you’ve got looks reasonable to me. Just make sure that $utf8Character is a single character, not a long string, and it’ll perform reasonably well.
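
As a rough illustration of the iconv() route mentioned above, here is a minimal sketch. The function name utf8_ord_iconv is hypothetical (it is not a PHP built-in), and the sketch assumes the iconv extension is available and that the underlying iconv library recognises the encoding name 'UCS-4BE'. It has not been benchmarked against the mb_convert_encoding version.

```php
<?php
// Hypothetical helper; the name is made up for this example.
function utf8_ord_iconv(string $char): int
{
    // Convert the UTF-8 bytes to 4 big-endian bytes per code point.
    // The @ suppresses the notice iconv() raises on invalid input;
    // the false/length check below handles that case instead.
    $ucs4 = @iconv('UTF-8', 'UCS-4BE', $char);
    if ($ucs4 === false || strlen($ucs4) < 4) {
        throw new InvalidArgumentException('Expected a valid, non-empty UTF-8 string');
    }
    // Only the first character is decoded. 'N' reads an unsigned 32-bit
    // big-endian integer, and unpack() returns an array indexed from 1.
    return unpack('N', substr($ucs4, 0, 4))[1];
}

echo utf8_ord_iconv('Ą'), "\n"; // 260
```

If PHP 7.2 or later with mbstring is available, the built-in mb_ord($utf8Character, 'UTF-8') returns the same code point directly and is probably the simpler choice.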

UCS-4BE is a Unicode encoding which stores each character as a 32-bit (4 byte) integer. This accounts for the “UCS-4”; the “BE” suffix indicates that the integers are stored in big-endian order (most significant byte first). The reason for using this encoding is that, unlike variable-width encodings (UTF-8 with its multi-byte sequences, or UTF-16 with its surrogate pairs), every character has the same fixed size, so the four bytes can be read directly as the code point.
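
To make the byte layout concrete, the following sketch dumps the raw bytes of the same character in each encoding. It assumes the mbstring extension is loaded and that the source file itself is saved as UTF-8; 'UCS-4LE' is included only to contrast the byte order.

```php
<?php
$utf8Character = 'Ą'; // U+0104, two bytes in UTF-8

$ucs4be = mb_convert_encoding($utf8Character, 'UCS-4BE', 'UTF-8');
$ucs4le = mb_convert_encoding($utf8Character, 'UCS-4LE', 'UTF-8');

echo bin2hex($utf8Character), "\n"; // c484     (variable-width UTF-8)
echo bin2hex($ucs4be), "\n";        // 00000104 (big-endian: high byte first)
echo bin2hex($ucs4le), "\n";        // 04010000 (little-endian: low byte first)

// The big-endian bytes, read as one 32-bit number, are the code point.
echo hexdec(bin2hex($ucs4be)), "\n"; // 260
```

This is why unpack('N', ...) works in the question's snippet: 'N' is the format code for an unsigned 32-bit big-endian integer, which matches the UCS-4BE layout exactly.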

User contributions licensed under: CC BY-SA