Cut an Arabic string

I have a string in Arabic like:

على احمد يوسف

Now I need to cut this string and output it like:

...على احمد يو

I tried this function:

function short_name($str, $limit) {
    if ($limit < 3) {
        $limit = 3;
    }

    if (strlen($str) > $limit) {
        if (preg_match('/\p{Arabic}/u', $str)) {
            return substr($str, 0, $limit - 3) . '...';
        }
        else {
            return '...'.substr($str, 0, $limit - 3);
        }
    }
    else {
        return $str;
    }
}

The problem is that sometimes it displays a symbol like this at the end of the string:

...�على احمد يو

Why does this happen?

Answer

The symbol appears because substr() counts bytes, not characters. In UTF-8 each Arabic letter takes two bytes, so the cut can land in the middle of a character and leave an incomplete byte sequence, which is displayed as �.
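
To see the effect in isolation, here is a minimal sketch. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$name = 'على احمد يوسف';

// Byte 21 is the first half of the two-byte letter "س",
// so this cut ends in the middle of a character.
$cut = substr($name, 0, 21);

var_dump(strlen($cut));                                      // int(21)
var_dump(mb_check_encoding($cut, 'UTF-8'));                  // bool(false)

// Cutting one byte earlier ends on a character boundary.
var_dump(mb_check_encoding(substr($name, 0, 20), 'UTF-8'));  // bool(true)
```

Whether a given byte offset splits a character depends on where it falls, which is why the symbol only shows up sometimes.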

You need to use the Multibyte String functions, such as mb_strlen() and mb_substr(), to handle Arabic strings.
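
As a rough sketch, the function from the question could be rewritten along these lines. The name short_name_mb() is hypothetical, the encoding is passed explicitly so the result does not depend on any global setting, and the ellipsis is always appended, which simplifies the original Arabic/non-Arabic branching.

```php
<?php
// Hypothetical multibyte-safe variant of short_name(); the name is illustrative.
// Assumption: $str is valid UTF-8 and the mbstring extension is loaded.
function short_name_mb(string $str, int $limit): string
{
    // Always leave room for the three-character ellipsis.
    $limit = max($limit, 3);

    // Compare the length in characters, not bytes.
    if (mb_strlen($str, 'UTF-8') <= $limit) {
        return $str;
    }

    // Cut on a character boundary, then append the ellipsis.
    return mb_substr($str, 0, $limit - 3, 'UTF-8') . '...';
}
```

For example, short_name_mb('على احمد يوسف', 10) keeps the first seven characters and appends '...', so the result is ten characters long and still valid UTF-8.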

You also need to make sure the internal encoding for those functions is set to UTF-8. You can set this globally at the top of your script:

mb_internal_encoding('UTF-8');

Which leads to this:

  • strlen('على احمد يوسف') returns 24, the size in octets
  • mb_strlen('على احمد يوسف') returns 13, the size in characters

Note that mb_strlen('على احمد يوسف') would also return 24 if the internal encoding were still a single-byte encoding such as ISO-8859-1, which was the default on older PHP versions (before 5.6).
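
The difference can be checked without touching the global setting, since the mb_* functions accept the encoding as an optional last argument. A small sketch, again assuming the file is saved as UTF-8:

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$name = 'على احمد يوسف';

var_dump(strlen($name));                    // int(24): bytes
var_dump(mb_strlen($name, 'UTF-8'));        // int(13): characters
var_dump(mb_strlen($name, 'ISO-8859-1'));   // int(24): every byte counted as one character
```

Passing the encoding on each call is more verbose than mb_internal_encoding('UTF-8'), but it keeps a helper function correct even if another part of the application changes the internal encoding.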

User contributions licensed under: CC BY-SA