I have a piece of code for which I need either reassurance or a “no no no!”: am I thinking about this the right way, or entirely the wrong way?
This has to do with cutting a variable of binary data at a specific spot while also dealing with multi-byte overloaded functions. For example, substr is actually mb_substr, strlen is mb_strlen, etc.
Our server is set to UTF-8 internal encoding, so there’s this weird little thing I do to circumvent it for this binary data manipulation:
// $binary_data is the incoming variable with binary
// $clip_size is generally 16, 32 or 64 etc
$curenc = mb_internal_encoding(); // this should be "UTF-8"
mb_internal_encoding('ISO-8859-1'); // change so mb_ overloading doesnt screw this up
if (strlen($binary_data) >= $clip_size) {
    $first_hunk = substr($binary_data, 0, $clip_size);
    $rest_of_it = substr($binary_data, $clip_size);
} else {
    // skip since its shorter than expected
}
mb_internal_encoding($curenc); // put this back now
I can’t really show input and output results, since it’s binary data. But tests using the above appear to work just fine, and nothing is breaking…
However, parts of my brain are screaming “what are you doing… this can’t be the way to handle this”!
Notes:
- The incoming binary data is a concatenation of those two parts to begin with.
- The first part’s size is always known (but changes).
- The second part’s size is entirely unknown.
- This is pretty darn close to encryption and stuffing the IV on front and ripping it off again (which oddly, I found some old code which does this same thing lol ugh).
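For readers comparing options, here is a minimal sketch of the same split done without touching the global internal encoding at all. It assumes the mbstring extension is loaded and relies on passing '8bit' as the explicit encoding argument, which makes mb_strlen and mb_substr count raw bytes. The function name split_prefix and the sample data are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): split a binary string
// into a fixed-size prefix and the remainder, counting bytes explicitly.
function split_prefix(string $binary_data, int $clip_size): ?array
{
    // '8bit' makes the mb_ functions work on raw bytes, whatever the
    // internal encoding happens to be set to.
    if (mb_strlen($binary_data, '8bit') < $clip_size) {
        return null; // shorter than expected, nothing to split
    }
    $first_hunk = mb_substr($binary_data, 0, $clip_size, '8bit');
    $rest_of_it = mb_substr($binary_data, $clip_size, null, '8bit');
    return [$first_hunk, $rest_of_it];
}

// Stand-in data: a 16-byte prefix followed by a payload of unknown length.
// The payload deliberately contains bytes that are not valid UTF-8.
$prefix  = random_bytes(16);
$payload = "\xC3\x28\x00\xFF payload bytes";

[$first_hunk, $rest_of_it] = split_prefix($prefix . $payload, 16);
var_dump($first_hunk === $prefix, $rest_of_it === $payload);
```

Because the encoding is passed per call, nothing global has to be saved and restored, so an early return or exception cannot leave the encoding in the wrong state.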
So, I guess my question is:
- Is this actually fine to be doing?
- Or is there something super obvious I’m overlooking?
Answer
MY SOLUTION TO THE WORRY
I dislike answering my own questions… but I wanted to share what I have decided on nonetheless.
Although what I had “worked”, I still wanted to get rid of the hack-job altering of the charset encoding. It was old code, I admit, but for some reason I never looked at hex2bin and bin2hex for doing this. So I decided to change it to use those.
The resulting new code:
// $clip_size remains the same value for continuity later,
// only spot-adjusted here... which is why the *2.
$hex_data = bin2hex( $binary_data );
$first_hunk = hex2bin( substr($hex_data,0,($clip_size*2)) );
$rest_of_it = hex2bin( substr($hex_data,($clip_size*2)) );
if ( !empty($rest_of_it) ) {
    /* process the result for reasons */
}
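Using the hex functions turns the mess into something the mb_ functions will not screw with either way, since the hex string is plain ASCII with exactly two characters per byte. A benchmark loop of 1 million iterations showed the overhead wasn’t anything to worry about (and it’s safer to run in parallel with itself than the mb_internal_encoding mangle method, because it never changes a global setting).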
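A quick way to convince yourself the hex round trip is lossless is to wrap it in a function and compare the pieces against known input. The sketch below is only an illustration: the function name split_via_hex and the sample bytes are made up for this example.

```php
<?php
// Hypothetical wrapper around the hex approach described above.
function split_via_hex(string $binary_data, int $clip_size): array
{
    $hex_data = bin2hex($binary_data); // ASCII only: two hex digits per byte
    $cut = $clip_size * 2;             // byte offset translated to hex digits

    // The (string) casts cover older PHP versions, where substr() can
    // return false when the offset lies past the end of the string.
    $first_hex = (string) substr($hex_data, 0, $cut);
    $rest_hex  = (string) substr($hex_data, $cut);

    return [hex2bin($first_hex), hex2bin($rest_hex)];
}

// Stand-in data: four arbitrary bytes followed by a short tail.
[$first, $rest] = split_via_hex("\x00\x01\xFE\xFF" . "rest", 4);

echo bin2hex($first), PHP_EOL; // 0001feff
var_dump($rest === "rest");    // bool(true)

// Strict comparison, so a remainder consisting of the single byte "0"
// still counts as data.
if ($rest !== '') {
    /* process the result */
}
```

One detail worth knowing: empty() treats the string "0" as empty, so a remainder that is exactly the byte "0" would be skipped by an !empty() check. The strict comparison against '' in the sketch avoids that edge case.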
So I’m going with this. It sits better in my mind, and resolves my question for now… until I revisit this old code again in a few years and go “what was I thinking ?!”.