How can I get the single bytes from a multibyte PHP string variable in a binary-safe way?

Let’s say (for simplicity’s sake) that I have a multibyte, UTF-8 encoded string variable with 3 letters (consisting of 4 bytes):

Since it’s UTF-8, the bytes’ hex values are (excluding the BOM):

As the $original variable is user-defined, I will need to handle two things:

  1. Get the exact number of bytes (not UTF-8 characters) used in the string, and
  2. A way to access each individual byte (not UTF-8 character).

I would tend to use strlen() to handle “1.”, and access the $original variable’s bytes with a simple `$original[$byteposition]`, like this:

This proves my initial idea is not working:

  1. var_dump shows 3 bytes
  2. printf fails too since “ord” only works on ASCII chars

How can I get the single bytes from a multibyte PHP string variable in a binary-safe way?

What I am looking for is a binary-safe way to convert UTF-8 string(s) into byte-array(s).

Answer

You can get a byte array by unpacking the UTF-8 encoded string $a:
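
A minimal sketch of the unpacking step, assuming a hypothetical sample value ("aöb", 3 letters in 4 bytes) that is not taken from the question. The bytes are written as hex escapes so the result does not depend on the encoding the PHP file is saved in:

```php
<?php
// Hypothetical stand-in string: "aöb" is 3 letters but 4 bytes in UTF-8.
// Hex escapes keep the bytes independent of the source file's encoding.
$a = "a\xC3\xB6b";

// 'C*' reads every byte of $a as an unsigned char (0-255).
$bytes = unpack('C*', $a);

// Number of bytes, not characters.
var_dump(count($bytes));

// Note that unpack() numbers the resulting elements starting at 1.
foreach ($bytes as $position => $byte) {
    printf("byte %d: 0x%02X\n", $position, $byte);
}
```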

This uses the format C*: “C” stands for “unsigned char”, and “*” repeats it for every byte in the string.
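
To illustrate what the format yields, here is a small sketch, again with a hypothetical sample string rather than the asker's data. It re-indexes the result from 0 so the offsets line up with `$original[$i]`, and packs the bytes back to check that nothing was lost:

```php
<?php
// Hypothetical sample: "aöb" written as explicit UTF-8 bytes.
$original = "a\xC3\xB6b";

// unpack() numbers the 'C*' results from 1; array_values() re-indexes
// them from 0 so that offsets match $original[$i].
$byteArray = array_values(unpack('C*', $original));

// Round trip: pack() with the same format rebuilds the identical byte string.
$restored = pack('C*', ...$byteArray);

var_dump($byteArray === [0x61, 0xC3, 0xB6, 0x62]);  // bool(true)
var_dump($restored === $original);                  // bool(true)
```

In current PHP versions, `count($byteArray)` here equals `strlen($original)`, since `strlen()` counts bytes rather than characters.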

User contributions licensed under: CC BY-SA