
What is the best way to split a string into an array of Unicode characters in PHP?

In PHP, what is the best way to split a string into an array of Unicode characters? What if the input is not necessarily UTF-8?

I want to know whether the set of Unicode characters in an input string is a subset of another set of Unicode characters.

Why not run straight for the mb_ family of functions, as the first couple of answers didn’t?


Answer

You could use the ‘u’ modifier with a PCRE regex; see Pattern Modifiers (quoting):

u (PCRE8)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

For instance, consider this code:

JavaScript

You’ll get an unusable result:

JavaScript

But with this code:

JavaScript

(Notice the ‘u’ at the end of the regex.)

You get what you want:

JavaScript
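As a minimal sketch of the comparison described above (not the answer's original listing), the following assumes a hypothetical sample string `$str` that is already valid UTF-8, and an illustrative `$allowed` set for the subset check mentioned in the question. It matches every character once without and once with the ‘u’ modifier:

```php
<?php
// Hypothetical sample input: "abc", a space, and two kanji.
// Each kanji occupies three bytes in UTF-8, so the string is 10 bytes long.
$str = "abc 文字";

// Without the 'u' modifier, '.' matches one byte at a time,
// so multibyte characters are cut into unusable fragments.
preg_match_all('/./', $str, $bytes);

// With the 'u' modifier, pattern and subject are treated as UTF-8,
// so '.' matches one whole character at a time.
preg_match_all('/./u', $str, $chars);

var_dump(count($bytes[0])); // int(10)
var_dump(count($chars[0])); // int(6)

// Subset check: every character of $str must appear in $allowed.
// $allowed is an illustrative set chosen for this sketch.
$allowed = array('a', 'b', 'c', ' ', '文', '字', '化');
$extra = array_diff($chars[0], $allowed);
var_dump(count($extra) === 0); // bool(true)
```

Note that `.` does not match a newline unless the ‘s’ modifier is added as well (`/./su`). If the input is in a known encoding other than UTF-8, it would have to be converted first, for example with `mb_convert_encoding($str, 'UTF-8', $sourceEncoding)`, since the ‘u’ modifier expects valid UTF-8.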

Hope this helps 🙂

User contributions licensed under: CC BY-SA