I have a string in Arabic, like:
على احمد يوسف
Now I need to cut this string and output it like:
على احمد يو
I tried this function:
function short_name($str, $limit) {
    if ($limit < 3) {
        $limit = 3;
    }
    if (strlen($str) > $limit) {
        if (preg_match('/\p{Arabic}/u', $str)) {
            return substr($str, 0, $limit - 3) . '...';
        }
        else {
            return '...' . substr($str, 0, $limit - 3);
        }
    }
    else {
        return $str;
    }
}
The problem is that sometimes it displays a symbol like this at the end of the string:
�على احمد يو
Why does this happen?
Answer
The symbol displayed after the cut is the result of substr() cutting in the middle of a multibyte character, which leaves an invalid byte sequence behind.
You need to use the Multibyte String Functions to handle Arabic strings, such as mb_strlen() and mb_substr().
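To make the difference concrete, here is a small sketch comparing a byte-based cut with a character-based cut. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded. The variable names are only illustrative, and the encoding is passed explicitly so the result does not depend on any global setting.

```php
<?php
// Assumption: this file is saved as UTF-8, so each Arabic letter is 2 bytes
// and each space is 1 byte.
$str = 'على احمد يوسف';

// Byte-based cut: 21 bytes stops after the first byte of a 2-byte letter.
$byteCut = substr($str, 0, 21);
$byteCutValid = mb_check_encoding($byteCut, 'UTF-8');

// Character-based cut: 11 whole characters, never splitting a letter.
$charCut = mb_substr($str, 0, 11, 'UTF-8');
$charCutValid = mb_check_encoding($charCut, 'UTF-8');

var_dump($byteCutValid); // bool(false): the last byte is an incomplete character
var_dump($charCutValid); // bool(true)
```

The dangling byte at the end of the byte-based cut is what the browser renders as the replacement symbol.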
You also need to make sure the internal encoding for those functions is set to UTF-8. You can set this globally at the top of your script:
mb_internal_encoding('UTF-8');
Which leads to this:
strlen('على احمد يوسف') returns 24, the size in octets
mb_strlen('على احمد يوسف') returns 13, the size in characters
Note that mb_strlen('على احمد يوسف') would also return 24 if the internal encoding were still set to ISO-8859-1, which was the default in older PHP versions.
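Putting this together, the function from the question could be rewritten roughly as follows. This is only a sketch: the name short_name_mb() is hypothetical, the 'UTF-8' argument is passed explicitly as an assumption about the input encoding, and the placement of the '...' simply mirrors the original code.

```php
<?php
// Hypothetical multibyte-safe variant of short_name(); assumes UTF-8 input
// and the mbstring extension.
function short_name_mb(string $str, int $limit): string
{
    if ($limit < 3) {
        $limit = 3;
    }

    // Compare lengths in characters, not bytes.
    if (mb_strlen($str, 'UTF-8') <= $limit) {
        return $str;
    }

    // Cut on character boundaries, leaving room for the three dots.
    $cut = mb_substr($str, 0, $limit - 3, 'UTF-8');

    // Same branching as the original: dots appended for Arabic text,
    // prepended otherwise.
    if (preg_match('/\p{Arabic}/u', $str)) {
        return $cut . '...';
    }
    return '...' . $cut;
}

$short = short_name_mb('على احمد يوسف', 10);
$same  = short_name_mb('على احمد يوسف', 13);
```

Here $short is 10 characters long (7 characters of the name plus the three dots) and is always valid UTF-8, while $same is returned unchanged because the name is exactly 13 characters.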