I have a string in Arabic, like:
على احمد يوسف
Now I need to cut this string and output it like:
...على احمد يو
I tried this function:
function short_name($str, $limit) {
    if ($limit < 3) {
        $limit = 3;
    }
    if (strlen($str) > $limit) {
        if (preg_match('/\p{Arabic}/u', $str)) {
            return substr($str, 0, $limit - 3) . '...';
        } else {
            return '...' . substr($str, 0, $limit - 3);
        }
    } else {
        return $str;
    }
}
The problem is that sometimes it displays a symbol like this at the end of the string:
...�على احمد يو
Why does this happen?
Answer
The symbol displayed after the cut is the result of substr() cutting in the middle of a multibyte character: substr() counts bytes, not characters, so it can leave an incomplete byte sequence that is rendered as an invalid character.
You need to use the Multibyte String Functions to handle Arabic strings, such as mb_strlen() and mb_substr().
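To see the effect in isolation, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative. The encoding is passed explicitly so the result does not depend on any global setting.

```php
<?php
$str = 'على احمد يوسف';

// Each Arabic letter in this string takes 2 bytes in UTF-8, so cutting
// at an odd byte offset lands in the middle of a letter.
$byteCut = substr($str, 0, 3);              // 3 bytes: one letter plus half of the next
$charCut = mb_substr($str, 0, 3, 'UTF-8');  // 3 characters: the first word

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

This is also why the symbol only shows up sometimes: whether the cut falls on a character boundary depends on $limit and on how many single-byte characters (such as spaces) come before the cut.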
You also need to make sure the internal encoding for those functions is set to UTF-8. You can set this globally at the top of your script:
mb_internal_encoding('UTF-8');
Which leads to this:
strlen('على احمد يوسف') returns 24, the size in octets
mb_strlen('على احمد يوسف') returns 13, the size in characters
Note that mb_strlen('على احمد يوسف') would also return 24 if the internal encoding were still set to ISO-8859-1, which was the default in older PHP versions (since PHP 5.6 the default is UTF-8).
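Putting this together, a multibyte-safe version of the function from the question might look like the sketch below. The name short_name_mb is hypothetical, and the sketch assumes UTF-8 input and the mbstring extension. It keeps the original's two branches and only swaps the byte-based calls for their mb_ counterparts, passing the encoding explicitly instead of relying on mb_internal_encoding().

```php
<?php
// Hypothetical multibyte-safe variant of short_name() from the question.
function short_name_mb(string $str, int $limit): string
{
    if ($limit < 3) {
        $limit = 3;
    }

    // Compare the length in characters, not bytes.
    if (mb_strlen($str, 'UTF-8') <= $limit) {
        return $str;
    }

    // Cut on a character boundary, leaving room for the 3-dot ellipsis.
    $cut = mb_substr($str, 0, $limit - 3, 'UTF-8');

    // Same branching as the original: the ellipsis follows Arabic text
    // in logical order and precedes anything else.
    if (preg_match('/\p{Arabic}/u', $str)) {
        return $cut . '...';
    }
    return '...' . $cut;
}

echo short_name_mb('على احمد يوسف', 12), "\n";
```

With a limit of 12 the function keeps the first 9 characters and appends the ellipsis, so the result is always valid UTF-8 regardless of where the cut falls.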