utf8 not showing hyphens correctly in echoed text

My MySQL database is set to utf8_unicode_ci and I have $pdo->exec('SET NAMES "utf8"') as part of the following PHP code, yet when I echo text from the query a hyphen – looks like this: â€“. What am I doing wrong? Why is the hyphen not displaying correctly?

<?php    
    try {
        $pdo = new PDO('mysql:host=localhost;dbname=danville_tpf', 'danville_dan', 'password');
        $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $pdo->exec('SET NAMES "utf8"');
    } catch (PDOException $e) {
        $output = 'Unable to connect to the database server.';
        include 'output.html.php';
        exit();
    }

    $output = 'Theme Park Database initialized';
    //include 'output.html.php';//

    try {
        $park_id = $_GET['park_id'];
        $query = "SELECT * FROM tpf_parks WHERE park_id = $park_id";
        $result = $pdo->query($query);
    } catch (PDOException $e) {
        $output = 'Unable to connect to the database server.';
        //include 'output.html.php';//
    }

    $output = 'Successfully pulled park';
    //include 'output.html.php';//

    foreach ($result as $row) {
        $parkdetails[] = array(
            'name' => $row['name'],
            'blurb' => $row['blurb'],
            'website' => $row['website'],
            'address' => $row['address'],
            'logo' => $row['logo']
        );    
    }
?>

Please help.

Answer

â€“ is common mojibake for an en dash (–), which is a different character from a hyphen.

It is the result of taking the UTF-8–encoded form of the dash (0xe2 0x80 0x93) and incorrectly assuming that it is actually encoded using Windows-1252.

Interpreted as Windows-1252, the three bytes 0xe2, 0x80 and 0x93 each represent a separate character: â, € and “.
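The misinterpretation can be reproduced in a few lines of PHP. This is only an illustrative sketch: it uses iconv to read the dash's UTF-8 bytes as if they were Windows-1252 and re-encode the result as UTF-8, which is roughly what a misconfigured client or server ends up doing. The variable names are made up for the example.

```php
<?php
// UTF-8 bytes of U+2013 EN DASH.
$enDash = "\xE2\x80\x93";

// Misread those bytes as Windows-1252 and re-encode the result as UTF-8.
$mojibake = iconv('Windows-1252', 'UTF-8', $enDash);

echo bin2hex($enDash), "\n";   // e28093
echo bin2hex($mojibake), "\n"; // c3a2e282ace2809c
echo $mojibake, "\n";          // shows â€“ on a UTF-8 terminal
```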

Assuming the offending character is in the blurb field, if you query SELECT HEX(blurb) FROM tpf_parks (with a suitable WHERE clause), you will see the hex encoding of the offending bytes.
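From PHP, that check might look like the sketch below. The helper name fetchBlurbHex is hypothetical; the table and column names are taken from the question. It uses a prepared statement, which also avoids placing $_GET['park_id'] directly into the SQL string.

```php
<?php
// Hypothetical helper: return the stored bytes of one park's blurb as hex,
// or null when the row does not exist or the blurb is NULL.
function fetchBlurbHex(PDO $pdo, int $parkId): ?string
{
    $stmt = $pdo->prepare('SELECT HEX(blurb) FROM tpf_parks WHERE park_id = ?');
    $stmt->execute([$parkId]);
    $hex = $stmt->fetchColumn();

    return ($hex === false || $hex === null) ? null : (string) $hex;
}

// Usage, with $pdo connected as in the question:
// echo fetchBlurbHex($pdo, (int) $_GET['park_id']);
```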

If you see E28093 in there, then the database value is correctly encoded as UTF-8 and there will be a character encoding mismatch in your client or server configuration (e.g. you’re reading it from the DB or displaying it to the browser with mismatched encodings).
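When the stored bytes are correct, two common places to look are the connection charset and the charset the page declares to the browser. The following is a rough sketch rather than a definitive fix: the helper names are hypothetical, the credentials are the ones from the question, and utf8mb4 is an assumption (the question uses utf8). The PDO MySQL DSN accepts a charset parameter, which takes the place of a separate SET NAMES call.

```php
<?php
// Hypothetical helper: build a MySQL DSN that sets the connection charset.
function buildUtf8Dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Hypothetical helper: tell the browser the response body is UTF-8.
// It must run before any output is echoed.
function sendUtf8Header(): void
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}

$dsn = buildUtf8Dsn('localhost', 'danville_tpf');
// $pdo = new PDO($dsn, 'danville_dan', 'password');
// sendUtf8Header();
```

If the output template already contains a meta charset tag, it should declare the same encoding as the header.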

If, however, you see C3A2E282ACE2809C, then the character has already been encoded incorrectly in the database — i.e. interpreted incorrectly, then saved as the UTF-8 representation of those 3 characters. If this is the case you’ll need to update the data to fix the issue. You could do this using iconv:

$fixedData = iconv("utf-8", "windows-1252", $badData);

Converting the doubly-encoded text from UTF-8 "to Windows-1252" emits the original bytes 0xe2 0x80 0x93, which are the correct UTF-8 encoding of the en dash.
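Running that conversion over data that is already correct would damage it, so a guarded version may be safer. The sketch below wraps the same iconv call in a hypothetical helper, repairDoubleEncoded, and only accepts the result when it is valid UTF-8 and converts back to exactly the input. It is a heuristic: text that legitimately contains a sequence such as â€“ would still be rewritten.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 read as Windows-1252" damage.
// Returns the input unchanged when the result does not look like a repair.
function repairDoubleEncoded(string $text): string
{
    // @ suppresses the notice iconv raises for characters that have no
    // Windows-1252 equivalent; iconv returns false in that case.
    $candidate = @iconv('UTF-8', 'Windows-1252', $text);
    if ($candidate === false || $candidate === $text) {
        return $text;
    }

    // preg_match with the /u flag does not match invalid UTF-8 input.
    if (preg_match('//u', $candidate) !== 1) {
        return $text;
    }

    // Accept only if re-applying the damage reproduces the input exactly.
    $roundTrip = @iconv('Windows-1252', 'UTF-8', $candidate);

    return $roundTrip === $text ? $candidate : $text;
}

$bad = "\xC3\xA2\xE2\x82\xAC\xE2\x80\x9C"; // "â€“" stored as UTF-8
echo bin2hex(repairDoubleEncoded($bad)), "\n"; // e28093
```

It is worth trying this on a copy of a few rows and checking the output before updating the table.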

User contributions licensed under: CC BY-SA