
fgetcsv encoding issue (PHP)

I am being sent a csv file that is tab delimited. Here is a sample of what I see:

Invoice: Invoice Date   Account: Name   Bill To: First Name Bill To: Last Name  Bill To: Work Email Rate Plan Charge: Name  Subscription: Device Serial Number
2021-03-10  Test Company    Wally   Kolcz   test@test.com   Sample plan A0H1234567890A

I wrote a script to open the file, read it, and loop over the values, but the values come back garbled:

if (($handle = fopen($user_file, "r")) !== FALSE) {
    // "\t" is the tab delimiter
    while (($data = fgetcsv($handle, 1000, "\t")) !== FALSE) {
        // Skip the header row, then pick out the email and name columns
        if ($line > 1 && isset($data[1])) {
            $user = [
                'EmailAddress' => $data[4],
                'Name' => $data[2].' '.$data[3],
            ];
        }

        $line++;
    }
    fclose($handle);
}

Here is what I get when I dump the first line.

array:7 [▼
  0 => b"ÿþI\x00n\x00v\x00o\x00i\x00c\x00e\x00:\x00 \x00I\x00n\x00v\x00o\x00i\x00c\x00e\x00 \x00D\x00a\x00t\x00e\x00"
  1 => "\x00A\x00c\x00c\x00o\x00u\x00n\x00t\x00:\x00 \x00N\x00a\x00m\x00e\x00"
  2 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00F\x00i\x00r\x00s\x00t\x00 \x00N\x00a\x00m\x00e\x00"
  3 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00L\x00a\x00s\x00t\x00 \x00N\x00a\x00m\x00e\x00"
  4 => "\x00B\x00i\x00l\x00l\x00 \x00T\x00o\x00:\x00 \x00W\x00o\x00r\x00k\x00 \x00E\x00m\x00a\x00i\x00l\x00"
  5 => "\x00R\x00a\x00t\x00e\x00 \x00P\x00l\x00a\x00n\x00 \x00C\x00h\x00a\x00r\x00g\x00e\x00:\x00 \x00N\x00a\x00m\x00e\x00"
  6 => "\x00S\x00u\x00b\x00s\x00c\x00r\x00i\x00p\x00t\x00i\x00o\x00n\x00:\x00 \x00D\x00e\x00v\x00i\x00c\x00e\x00 \x00S\x00e\x00r\x00i\x00a\x00l\x00 \x00N\x00u\x00m\x00b\x00e\x00r\x00 ◀"
]

I tried adding:

header('Content-Type: text/html; charset=UTF-8');
$data = array_map("utf8_encode", $data);
setlocale(LC_ALL, 'en_US.UTF-8');

And when I dump mb_detect_encoding($data[2]), I get ‘ASCII’…

Any way to fix this so I don’t have to manually update the file each time I receive it? Thanks!


Answer

The file looks like it is encoded in UTF-16: every other byte is null, and the leading ÿþ is the byte-order mark (bytes FF FE) of little-endian UTF-16.

You probably need to convert the whole file's contents to UTF-8 before parsing, with something like mb_convert_encoding($contents, "UTF-8", "UTF-16"), where $contents is the raw file read into a single string.
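
A minimal sketch of that conversion step, under a few assumptions: the helper name utf16ToUtf8() is made up for illustration, and the little-endian fallback for input without a byte-order mark is a guess based on the dump in the question, not something the file's sender has confirmed. It picks the byte order from the BOM and strips it so it does not end up glued to the first column header.

```php
<?php
// Hypothetical helper (not part of the original post): converts raw
// UTF-16 bytes to UTF-8, using the byte-order mark to pick the byte order.
function utf16ToUtf8(string $bytes, string $fallback = 'UTF-16LE'): string
{
    $bom = substr($bytes, 0, 2);

    if ($bom === "\xFF\xFE") {
        // FF FE: little-endian, which matches the "ÿþ" seen in the dump
        $encoding = 'UTF-16LE';
        $bytes = substr($bytes, 2);
    } elseif ($bom === "\xFE\xFF") {
        // FE FF: big-endian
        $encoding = 'UTF-16BE';
        $bytes = substr($bytes, 2);
    } else {
        // Assumption: without a BOM, fall back to the caller's choice
        $encoding = $fallback;
    }

    return mb_convert_encoding($bytes, 'UTF-8', $encoding);
}
```

It would be called on the whole file at once, for example utf16ToUtf8(file_get_contents($user_file)), which is fine for small exports but reads the entire file into memory.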

But you can't really use fgetcsv() directly on the original file in that case, because it would still be reading the raw UTF-16 bytes…
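
One possible workaround, offered as a sketch and not as the only approach: fgetcsv() only needs a stream to read from, so text that has already been converted to UTF-8 can be written to PHP's built-in php://temp stream and parsed from there. The function name parseTabDelimited() is hypothetical, and the code assumes its input is already UTF-8.

```php
<?php
// Hypothetical helper (not part of the original post): parses
// tab-delimited text that has ALREADY been converted to UTF-8.
function parseTabDelimited(string $utf8Text): array
{
    // php://temp keeps data in memory and spills to a temp file if it grows
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $utf8Text);
    rewind($handle);

    $rows = [];
    // Length 0 means no line-length limit; "\t" is the tab delimiter
    while (($data = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $rows[] = $data;
    }
    fclose($handle);

    return $rows;
}
```

With the file from the question, this could be combined with the conversion suggested above, for instance parseTabDelimited(mb_convert_encoding(file_get_contents($user_file), 'UTF-8', 'UTF-16')); the first returned row would then be the header line, which the loop in the question skips.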

User contributions licensed under: CC BY-SA