How to iterate a UTF-8 string character by character using indexing?
When you access a UTF-8 string with the bracket operator, e.g. $str[0], you get individual bytes,
so a multibyte UTF-8 character is spread across 2 or more elements.
For example:
$str = "Kąt"; $str[0] = "K"; $str[1] = "�"; $str[2] = "�"; $str[3] = "t";
but I would like to have:
$str[0] = "K"; $str[1] = "ą"; $str[2] = "t";
This is possible with mb_substr,
but it is extremely slow, i.e.
mb_substr($str, 0, 1) = "K"; mb_substr($str, 1, 1) = "ą"; mb_substr($str, 2, 1) = "t";
Is there another way to iterate the string character by character without using mb_substr?
Answer
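Use preg_split. With the “u” modifier, the pattern and the subject are treated as UTF-8, so splitting on the empty pattern breaks the string between characters rather than between bytes.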
$chrArray = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
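As a minimal sketch of how the resulting array could be used, the snippet below splits the question's sample string and loops over it by index. The failure check and the mb_str_split alternative are additions that are not part of the original answer. mb_str_split assumes PHP 7.4 or later with the mbstring extension loaded. Both approaches split on code points, so a letter followed by a combining mark would come out as two elements.

```php
<?php
// Sample string from the question; "ą" occupies two bytes in UTF-8.
$str = "Kąt";

// Split on the empty pattern. The "u" modifier makes PCRE treat the
// subject as UTF-8, so split points fall between code points, not bytes.
$chrArray = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

// preg_split returns false on failure, e.g. when the input is not valid UTF-8.
if ($chrArray === false) {
    throw new RuntimeException('Input could not be split (invalid UTF-8?)');
}

// Index-based iteration now works per character.
foreach ($chrArray as $i => $chr) {
    echo $i, ': ', $chr, PHP_EOL;
}

// Alternative (assumption: PHP 7.4+ with mbstring available).
$altArray = function_exists('mb_str_split')
    ? mb_str_split($str, 1, 'UTF-8')
    : $chrArray;
```

For "Kąt" the loop prints three lines (K, ą, t), while strlen($str) still reports 4 because it counts bytes.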