Counting special characters with PHP

I want to count the number of characters in a text field on my website. The text field accepts any type of input from a user, including ASCII art and other special characters. If the user types in normal characters, I can use strlen($message) to return the value, but if the user uses special characters (such as ©), the count is incorrect.

Is there a simple way to count everything without having to do any heavy lifting?

Answer

If your input is UTF-8 encoded and you want to count Unicode graphemes, you can do this:

$count = preg_match_all('/\X/u', $text);

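Here is some explanation. A Unicode grapheme (more precisely, an extended grapheme cluster) is what a reader perceives as a single "character": a base Unicode code point together with any "combining marks" that follow it. In a PCRE pattern, \X matches one such cluster, and the u modifier tells PCRE to treat the pattern and the subject as UTF-8.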

By contrast, mb_strlen($text, 'UTF-8') counts code points, so each combining mark is counted as a separate character, and strlen($text) gives you the total byte count.
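To make the difference concrete, here is a minimal sketch that applies the three counts to the same string. The sample string (an "e" followed by a combining acute accent) is my own illustration, not part of the question. It assumes PHP 7 or later for the \u{...} escape and that the mbstring extension is loaded.

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character.
// Illustrative input only; the \u{...} escape needs PHP 7+.
$text = "e\u{0301}";

// Bytes: "e" is 1 byte and U+0301 is 2 bytes in UTF-8.
$bytes = strlen($text);

// Code points: the base letter and the combining mark are counted separately.
$codepoints = mb_strlen($text, 'UTF-8');

// Grapheme clusters: \X consumes the base letter together with its mark.
$graphemes = preg_match_all('/\X/u', $text);

printf("bytes=%d codepoints=%d graphemes=%d\n", $bytes, $codepoints, $graphemes);
```

For this input the three values should be 3, 2 and 1 respectively, which is why only the grapheme count matches what the user sees.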

Since, judging by a comment of yours, some characters in your input may have been converted to their HTML entity equivalents, you should run the text through html_entity_decode() first:

$count = preg_match_all('/\X/u', html_entity_decode($text, ENT_QUOTES, 'UTF-8'));
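As a rough illustration of why the decoding step matters, the sketch below counts the same text before and after decoding. The sample string is hypothetical and only there to show the effect.

```php
<?php
// Hypothetical input in which two characters arrived as HTML entities.
$raw = 'caf&eacute; &copy;';

// Without decoding, every character of each entity is counted on its own.
$rawCount = preg_match_all('/\X/u', $raw);

// After decoding, "&eacute;" and "&copy;" each become a single character.
$decoded      = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');
$decodedCount = preg_match_all('/\X/u', $decoded);

printf("raw=%d decoded=%d\n", $rawCount, $decodedCount);
```

Here the raw count is 18 while the decoded count is 6, the number of characters a reader would see in "café ©".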

UPDATE

The intl extension (bundled with PHP since 5.3 and previously available through PECL) now provides grapheme_strlen() and other grapheme_*() functions, but only if the extension is installed and enabled, of course.
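One way to combine both approaches is to prefer grapheme_strlen() when it exists and fall back to the \X pattern otherwise. The sketch below assumes PHP 7 or later; the helper name count_graphemes() is made up for this example, and throwing on invalid UTF-8 is my own choice rather than anything the answer prescribes.

```php
<?php
// Hypothetical helper (name is illustrative, not a PHP built-in).
function count_graphemes(string $text): int
{
    // Use intl when available; grapheme_strlen() returns a non-int on failure.
    if (function_exists('grapheme_strlen')) {
        $n = grapheme_strlen($text);
        if (is_int($n)) {
            return $n;
        }
    }

    // Fallback: count extended grapheme clusters with PCRE.
    $n = preg_match_all('/\X/u', $text);
    if ($n === false) {
        // With the u modifier, PCRE fails on input that is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $n;
}

echo count_graphemes("e\u{0301}"), "\n";
```

Wrapping the check in one function keeps the calling code the same whether or not intl is installed on the server.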

User contributions licensed under: CC BY-SA