I have UTF-8 strings such as those below:
21st century
Other languages
General collections
Ancient languages
Medieval languages
Several authors (Two or more languages)
As you can see, the strings contain alphanumeric characters as well as leading and trailing spaces.
I’d like to use PHP to retrieve the number of leading spaces (not trailing spaces) in each string. Note that the spaces might not be standard ASCII spaces. I tried:
var_dump(mb_ord($space_char, "UTF-8"));
where $space_char contains a sample space character I copied from one of the above strings, and I got 160 rather than 32.
I have tried:
strspn($string,$cmask); // $cmask contains a string with two space characters with 160 and 32 as their Unicode code points.
but I get a very unpredictable value.
The values should be:
(1) 12 (2) 6 (3) 9 (4) 9 (5) 9 (6) 12
What am I doing wrong?
Answer
I would go the regular expression route:
<?php
function count_leading_spaces($str)
{
    // \p{Zs} will match a whitespace character that is invisible,
    // but does take up space
    if (mb_ereg('^\p{Zs}+', $str, $regs) === false) {
        return 0;
    }
    return mb_strlen($regs[0]);
}

$samples = [
    ' 21st century ',
    ' Other languages ',
    ' General collections ',
    ' Ancient languages ',
    ' Medieval languages ',
    ' Several authors (Two or more languages) ',
];

foreach ($samples as $i => $sample) {
    printf("(%d) %d\n", $i + 1, count_leading_spaces($sample));
}
Output:
(1) 12
(2) 6
(3) 9
(4) 9
(5) 9
(6) 12
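As for why strspn() looked unpredictable: it works on bytes, not characters. In UTF-8, code point 160 (U+00A0) is stored as the two bytes 0xC2 0xA0, so every non-breaking space adds 2 to the result while an ordinary space adds 1. The sketch below illustrates that, and also shows the same regular-expression idea written with preg_match() and the /u modifier. The function name count_leading_spaces_preg and the test string are made up for illustration, and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: counts leading Unicode space separators in characters.
function count_leading_spaces_preg(string $str): int
{
    // \p{Zs} is the Unicode "space separator" category (U+0020, U+00A0, ...).
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8.
    if (preg_match('/^\p{Zs}+/u', $str, $m) !== 1) {
        return 0;
    }
    return mb_strlen($m[0], 'UTF-8');
}

$nbsp = "\u{00A0}"; // stored as the two bytes 0xC2 0xA0

// Illustrative input: three non-breaking spaces, one ASCII space, then text.
$s = str_repeat($nbsp, 3) . ' ' . 'Other languages ';

$byteCount = strspn($s, $nbsp . ' ');          // counts bytes: 3 * 2 + 1
$charCount = count_leading_spaces_preg($s);    // counts characters

var_dump($byteCount); // int(7)
var_dump($charCount); // int(4)
```

So a mix of both kinds of space gives a strspn() result that depends on how many of each are present, which is probably what made the value look random. Counting with a UTF-8-aware regular expression and mb_strlen() avoids that.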