The iconv function sometimes gives me an error:
Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in [...]
Is there a way to detect illegal characters in a UTF-8 string before sending the data to iconv()?
Answer
First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding.
You can make use of the UTF-8 validity check that is available in preg_match [PHP Manual] since PHP 4.3.5. It returns a falsy value (0 or false, depending on the PHP version), with no additional information in the return value itself, if an invalid string is given:
$isUTF8 = (preg_match('//u', $string) === 1);
Another possibility is mb_check_encoding [PHP Manual]:
$validUTF8 = mb_check_encoding($string, 'UTF-8');
Another function you can use is mb_detect_encoding [PHP Manual]:
$validUTF8 = (false !== mb_detect_encoding($string, 'UTF-8', true));
It’s important to set the strict parameter to true.
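The checks above can be wrapped in one small helper. This is only a minimal sketch: the function name is_valid_utf8 is made up for illustration (it is not a PHP built-in), and it assumes you want mbstring when the extension is loaded and the PCRE check otherwise.

```php
<?php
// Hypothetical helper (not part of PHP): true when $string is valid UTF-8.
function is_valid_utf8(string $string): bool
{
    // Prefer mbstring when the extension is available.
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($string, 'UTF-8');
    }
    // Fallback: with the /u modifier an invalid subject never matches,
    // so anything other than 1 means the string is not valid UTF-8.
    return preg_match('//u', $string) === 1;
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // complete two-byte sequence for "é"
var_dump(is_valid_utf8("caf\xC3"));     // truncated multibyte character
```

The second call uses a string that ends in the middle of a multibyte character, which is the situation the notice in the question describes.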
Additionally, iconv [PHP Manual] allows you to transliterate or drop problematic sequences on the fly. (However, if iconv encounters such a sequence, it still raises a notice; this behavior cannot be changed.)
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string), PHP_EOL;
echo 'IGNORE : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $string), PHP_EOL;
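You can suppress the notice with the @ operator and compare the length of the returned string with the original; the lengths match only if nothing was dropped: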
strlen($string) === strlen(@iconv('UTF-8', 'UTF-8//IGNORE', $string));
Check the examples on the iconv manual page as well.
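To tie this back to the question, one option is to validate first and only call iconv when the input is valid, so the notice is never triggered. The following is a rough sketch under a few assumptions: the function name utf8_to_latin1 is hypothetical, ISO-8859-1 is just the target charset used in the examples above, and returning null for bad input is one possible design choice among several.

```php
<?php
// Hypothetical wrapper (name and null-on-failure behaviour are assumptions).
function utf8_to_latin1(string $string): ?string
{
    // Validate before converting: invalid UTF-8 never reaches iconv(),
    // so the "incomplete multibyte character" notice is not raised.
    if (preg_match('//u', $string) !== 1) {
        return null;
    }
    // //TRANSLIT asks iconv to approximate characters that
    // ISO-8859-1 cannot represent; results depend on the iconv library.
    $converted = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
    return $converted === false ? null : $converted;
}

var_dump(utf8_to_latin1("caf\xC3\xA9") === "caf\xE9"); // "é" becomes one byte
var_dump(utf8_to_latin1("caf\xC3"));                   // invalid input
```

Callers can then decide what to do with a null result, for example logging the bad input or falling back to the //IGNORE approach.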