I am being sent a csv file that is tab delimited. Here is a sample of what I see:
Invoice: Invoice Date	Account: Name	Bill To: First Name	Bill To: Last Name	Bill To: Work Email	Rate Plan Charge: Name	Subscription: Device Serial Number
2021-03-10	Test Company	Wally	Kolcz	test@test.com	Sample plan	A0H1234567890A
I wrote a script to open the file, read it, and loop over the values, but the parsed values come out garbled:
if (($handle = fopen($user_file, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, "\t")) !== FALSE) {
        if ($line > 1 && isset($data[1])) {
            $user = [
                'EmailAddress' => $data[4],
                'Name' => $data[2].' '.$data[3],
            ];
        }
        $line++;
    }
    fclose($handle);
}
Here is what I get when I dump the first line.
array:7 [▼
  0 => b"ÿþI\x00n\x00v\x00o\x00i\x00c\x00e\x00:\x00 \x00I\x00n\x00v\x00o\x00i\x00c\x00e\x00 \x00D\x00a\x00t\x00e\x00"
  1 => "\x00A\x00c\x00c\x00o\x00u\x00n\x00t\x00:\x00 \x00N\x00a\x00m\x00e\x00"
  2 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00F\x00i\x00r\x00s\x00t\x00 \x00N\x00a\x00m\x00e\x00"
  3 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00L\x00a\x00s\x00t\x00 \x00N\x00a\x00m\x00e\x00"
  4 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00W\x00o\x00r\x00k\x00 \x00E\x00m\x00a\x00i\x00l\x00"
  5 => "\x00R\x00a\x00t\x00e\x00 \x00P\x00l\x00a\x00n\x00 \x00C\x00h\x00a\x00r\x00g\x00e\x00:\x00 \x00N\x00a\x00m\x00e\x00"
  6 => "\x00S\x00u\x00b\x00s\x00c\x00r\x00i\x00p\x00t\x00i\x00o\x00n\x00:\x00 \x00D\x00e\x00v\x00i\x00c\x00e\x00 \x00S\x00e\x00r\x00i\x00a\x00l\x00 \x00N\x00u\x00m\x00b\x00e\x00r\x00 ◀"
]
I tried adding:
header('Content-Type: text/html; charset=UTF-8');
$data = array_map("utf8_encode", $data);
setlocale(LC_ALL, 'en_US.UTF-8');
And when I dump mb_detect_encoding($data[2]), I get ‘ASCII’…
Any way to fix this so I don’t have to manually update the file each time I receive it? Thanks!
Answer
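Looks like the file is in UTF-16: every other byte is null, and the leading ÿþ is the byte order mark FF FE, which indicates little-endian byte order.
You probably need to convert the whole file contents before parsing, with something like mb_convert_encoding($data, "UTF-8", "UTF-16"); where $data holds the raw contents of the file rather than a parsed row.
But you can’t really use fgetcsv() on the original file handle in that case, because it splits on raw bytes and would still see the UTF-16 data. Parse the converted string instead, for example line by line with str_getcsv().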
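As a rough sketch of that approach, the following reads the whole file, converts it to UTF-8, and parses each line with str_getcsv(). The function name readUtf16Tsv is hypothetical, and the code assumes the mbstring extension is enabled, the file is UTF-16LE (as the FF FE byte order mark suggests), it fits in memory, and no quoted field contains a line break.

```php
<?php
// Hypothetical helper, not from the original post: reads a UTF-16LE,
// tab-delimited file and returns its rows as arrays of UTF-8 strings.
function readUtf16Tsv(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // Assumption: little-endian UTF-16, based on the FF FE bytes in the dump.
    $text = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');

    // If the byte order mark survived conversion (EF BB BF in UTF-8), drop it
    // so it does not end up glued to the first header name.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        $text = substr($text, 3);
    }

    $rows = [];
    // Split on any common line ending; this does not handle quoted fields
    // that themselves contain line breaks.
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if ($line === '') {
            continue; // skip blank lines, including the one after the last row
        }
        // Separator, enclosure, and escape are passed explicitly.
        $rows[] = str_getcsv($line, "\t", '"', '\\');
    }

    return $rows;
}
```

With the rows in hand, the loop from the question could skip the header via array_slice(readUtf16Tsv($user_file), 1) and read $data[4], $data[2], and $data[3] as before.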