How can I detect a malformed UTF-8 string in PHP?

The iconv function sometimes gives me an error.

Is there a way to detect illegal characters in a UTF-8 string before sending the data to iconv()?

Answer

First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding.

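You can make use of the UTF-8 validity check that preg_match [PHP Manual] performs when the u pattern modifier is set; it has been available since PHP 4.3.5. The call returns 1 only if the subject is valid UTF-8. For an invalid string it returns 0 or false, depending on the PHP version, with no further detail in the return value itself; since PHP 5.2.0, preg_last_error() reports PREG_BAD_UTF8_ERROR in that case.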
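
A minimal sketch of this check, assuming PHP 7 or later. The helper name is_valid_utf8_pcre is hypothetical and only serves as an illustration; the empty pattern with the u modifier is one common way to trigger the validity check without matching anything in particular.

```php
<?php
// Hypothetical helper name; not part of PHP itself.
function is_valid_utf8_pcre(string $s): bool
{
    // With the u modifier, PCRE validates the subject as UTF-8 before matching.
    // The empty pattern matches any valid subject, so only 1 means "valid";
    // 0 or false are both treated as "invalid" here.
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8_pcre("Gr\xC3\xBC\xC3\x9Fe")); // "Grüße" as UTF-8 bytes
var_dump(is_valid_utf8_pcre("Gr\xFC\xDFe"));         // same word as ISO-8859-1 bytes
```

Comparing against 1 with === avoids having to distinguish between the 0 and false return values.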

Another possibility is mb_check_encoding [PHP Manual], which returns true if the string is valid in the encoding you pass as the second argument.
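
As a short illustration, assuming the mbstring extension is loaded, the call could look like this. The variable names and sample strings are made up for the example.

```php
<?php
// Requires the mbstring extension.
$valid   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$invalid = "caf\xE9";     // "café" encoded as ISO-8859-1

// The second argument names the encoding to validate against.
$validResult   = mb_check_encoding($valid, 'UTF-8');
$invalidResult = mb_check_encoding($invalid, 'UTF-8');

var_dump($validResult, $invalidResult);
```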

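Another function you can use is mb_detect_encoding [PHP Manual]. It’s important to set the strict parameter to true; otherwise the function returns the closest matching encoding even when the string is not actually valid in it.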
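
A sketch of the strict variant, assuming the mbstring extension is loaded. The helper name detect_utf8_strict is hypothetical; passing a single candidate encoding is a choice made for this example so that the result is either 'UTF-8' or false.

```php
<?php
// Hypothetical helper name; requires the mbstring extension.
function detect_utf8_strict(string $s): bool
{
    // Third argument is the strict flag. With strict = true the function
    // returns false when the string is not valid in any listed encoding.
    return mb_detect_encoding($s, 'UTF-8', true) === 'UTF-8';
}

var_dump(detect_utf8_strict("caf\xC3\xA9")); // "café" as UTF-8 bytes
var_dump(detect_utf8_strict("caf\xE9"));     // "café" as ISO-8859-1 bytes
```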

Additionally, iconv [PHP Manual] lets you transliterate or discard problematic characters on the fly by appending //TRANSLIT or //IGNORE to the target encoding. (However, if iconv encounters an illegal sequence, it raises an E_NOTICE; this behavior cannot be changed.)

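You can suppress the notice with the @ operator and compare the length of the returned string with the length of the input. If the result is false or shorter than the input, the string contained invalid sequences.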
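
The length comparison could be sketched as follows, assuming the iconv extension is loaded and that the underlying iconv implementation supports the //IGNORE suffix (glibc does; other implementations may not). The helper name is_valid_utf8_iconv is hypothetical.

```php
<?php
// Hypothetical helper name; requires the iconv extension.
function is_valid_utf8_iconv(string $s): bool
{
    // @ suppresses the notice that iconv raises on an illegal sequence.
    $converted = @iconv('UTF-8', 'UTF-8//IGNORE', $s);

    // Depending on the PHP version and iconv implementation, invalid input
    // yields either false or a shorter string with the bad bytes dropped.
    // Both cases are treated as "invalid" here.
    return $converted !== false && strlen($converted) === strlen($s);
}

var_dump(is_valid_utf8_iconv("caf\xC3\xA9")); // "café" as UTF-8 bytes
var_dump(is_valid_utf8_iconv("caf\xE9"));     // "café" as ISO-8859-1 bytes
```

Because the check accepts both failure modes, it does not depend on whether a given setup drops the bad bytes or rejects the whole string.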

Check the examples on the iconv manual page as well.

User contributions licensed under: CC BY-SA