How can I clean my form inputs of characters like emojis? For example, when I post a paragraph in a textarea containing an emoji like this 🙂, the record does not get inserted into the database. I cannot change my database table to utf8mb4; it is utf8 right now. I have tried the following functions, but none of them work:
$str = iconv('UTF-8', 'UTF-8//IGNORE', $str);
$str = utf8_encode($str);
$str = mb_convert_encoding ($str, "UTF-8");
$str = htmlspecialchars ($str);
$str = htmlspecialchars ($str, ENT_SUBSTITUTE, 'UTF-8');
$str = htmlspecialchars ($str, ENT_SUBSTITUTE);
json_encode($str) does turn the emoji into escape sequences like “\uXXXX”, but it also wraps every input in double quotes, and I would have to decode it again for every input.
To be clear, if someone inputs “hello world 🙂” I want to save one of the following to the database: hello world (with the emoji removed), or hello world followed by the emoji in an escaped form such as &#128578;.
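The failed insert can be detected up front. MySQL’s “utf8” character set (utf8mb3) stores at most three bytes per character, while an emoji such as 🙂 (U+1F642) is a four-byte UTF-8 sequence. A minimal sketch, assuming the input is valid UTF-8; `has_four_byte_chars()` is a hypothetical helper name, not a built-in:

```php
<?php
// Sketch: report whether a string contains characters outside the Basic
// Multilingual Plane (U+10000..U+10FFFF). In UTF-8 these are exactly the
// 4-byte sequences that MySQL's 3-byte "utf8" (utf8mb3) cannot store.
// has_four_byte_chars() is a hypothetical helper, not part of PHP.
function has_four_byte_chars(string $s): bool
{
    // /u makes PCRE read $s as UTF-8; preg_match returns false (not 1)
    // on invalid UTF-8, so such input is also reported as false here.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

var_dump(strlen("🙂"));                          // int(4)
var_dump(has_four_byte_chars("hello world"));    // bool(false)
var_dump(has_four_byte_chars("hello world 🙂")); // bool(true)
```

A check like this could be used to reject or clean the input before the INSERT instead of letting the query fail.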
Answer
You can use iconv: the UCS-2 character set supports only the Basic Multilingual Plane (BMP), just like MySQL’s version of “utf8”, so a round trip through UCS-2 with //IGNORE drops all emojis while preserving most characters from most modern languages.
$s = "hello world 🙂";
// UTF-8 -> UCS-2 discards everything outside the BMP (//IGNORE), then convert back to UTF-8
$s = iconv("UCS-2", "UTF-8", iconv("UTF-8", "UCS-2//IGNORE", $s));
var_dump($s); // string(12) "hello world "
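If you would rather not depend on how the underlying iconv implementation handles //IGNORE, a Unicode-aware regex is a different technique that targets the same range of characters. The sketch below covers both outputs the question asks for: stripping the characters, or keeping them as HTML numeric entities. `strip_non_bmp()` and `encode_non_bmp()` are hypothetical helper names, and the entity variant assumes the mbstring extension and PHP 7.2 or later (for `mb_ord()`):

```php
<?php
// Sketch: regex-based alternatives to the iconv round trip.
// Both match code points U+10000..U+10FFFF (outside the BMP), i.e. the 4-byte
// UTF-8 sequences that MySQL's "utf8" (utf8mb3) cannot store.
// strip_non_bmp() and encode_non_bmp() are hypothetical helpers, not part of PHP.

const NON_BMP = '/[\x{10000}-\x{10FFFF}]/u';

// Option 1: drop the characters entirely.
function strip_non_bmp(string $s): string
{
    $out = preg_replace(NON_BMP, '', $s);
    if ($out === null) { // preg_* functions return null when $s is not valid UTF-8
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

// Option 2: keep them as HTML numeric entities, e.g. U+1F642 -> "&#128578;".
function encode_non_bmp(string $s): string
{
    $out = preg_replace_callback(
        NON_BMP,
        function (array $m): string {
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $s
    );
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

var_dump(strip_non_bmp("hello world 🙂"));  // string(12) "hello world "
echo encode_non_bmp("hello world 🙂"), "\n"; // hello world &#128578;
```

When reading an entity-encoded value back, html_entity_decode() turns &#128578; into the emoji again. If the value is later escaped with htmlspecialchars() for output, the & becomes &amp; unless its double_encode argument is set to false.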