We have a bunch of surrogate-pair (or 2-byte UTF-8?) characters such as &#55357;&#56911;, which is the prayer hands emoji stored in UTF-8 as 2 characters. When rendered in a browser, this string renders as two ??.
I need to convert those to the hands emoji using PHP, but I simply cannot find a combination of iconv, utf8_decode, html_entity_decode, etc. to pull it off.
This site converts the &#55357;&#56911; properly:
http://www.convertstring.com/EncodeDecode/HtmlDecode
Paste the following string in there:
Please join me in this prayer. &#55357;&#56911;❤️
You will notice the surrogate pair (&#55357;&#56911;) converts to 🙏
This site claims to use HTMLDecode, but I cannot find anything in PHP to pull this off. I have tried iconv, html_entity_decode, and a few public libraries.
I admit I am no expert when it comes to converting between character encodings!
Answer
I was not able to find a function to do this, but this works:
$str = "Please join me in this prayer. &#55357;&#56911;❤️";

// Match two adjacent 5-digit decimal entities (a UTF-16 surrogate pair).
$newStr = preg_replace_callback(
    "/&#(\d{5});&#(\d{5});/",
    function ($matches) {
        return convertToEmoji($matches);
    },
    $str
);
print_r($newStr);

function convertToEmoji($matches)
{
    // Turn each decimal code unit into hex, e.g. 55357 -> d83d, 56911 -> de4f.
    $hex = dechex((int) $matches[1]) . dechex((int) $matches[2]);
    // Pack the hex digits into raw bytes, which are then big-endian UTF-16.
    $bytes = hex2bin($hex);
    // Let iconv combine the surrogate pair into a single UTF-8 character.
    return iconv("UTF-16BE", "UTF-8", $bytes);
}
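The same conversion can probably also be done with the surrogate-pair arithmetic directly, instead of going through hex strings and iconv. Below is a minimal sketch of that idea, assuming PHP 7.2+ with the mbstring extension (needed for mb_chr). The function name decodeSurrogateEntities is a hypothetical helper, not something from PHP itself. It only combines a high surrogate (55296 to 56319) followed by a low surrogate (56320 to 57343) and leaves every other entity as it found it.

```php
<?php
// Hypothetical helper: replaces decimal entity pairs that encode a UTF-16
// surrogate pair with the single UTF-8 character they stand for.
function decodeSurrogateEntities(string $text): string
{
    // Take each run of adjacent decimal entities so a pair is never split.
    return preg_replace_callback('/(?:&#\d+;)+/', function (array $m): string {
        preg_match_all('/\d+/', $m[0], $nums);
        $raw   = $nums[0];                  // digits as written in the text
        $units = array_map('intval', $raw); // same values as integers
        $out   = '';
        for ($i = 0, $n = count($units); $i < $n; $i++) {
            $high = $units[$i];
            $low  = $units[$i + 1] ?? -1;
            $isPair = $high >= 0xD800 && $high <= 0xDBFF
                   && $low  >= 0xDC00 && $low  <= 0xDFFF;
            if ($isPair) {
                // Standard UTF-16 formula for combining a surrogate pair.
                $codePoint = 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
                $out .= mb_chr($codePoint, 'UTF-8');
                $i++; // skip the low surrogate that was just consumed
            } else {
                // Not part of a surrogate pair: keep the entity unchanged.
                $out .= '&#' . $raw[$i] . ';';
            }
        }
        return $out;
    }, $text);
}

echo decodeSurrogateEntities("Please join me in this prayer. &#55357;&#56911;"), "\n";
```

For 55357 and 56911 the formula gives 0x10000 + (0x3D << 10) + 0x24F = 0x1F64F, which is the prayer hands emoji. Ordinary entities can still be passed through html_entity_decode afterwards, since this sketch deliberately leaves them alone.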