My MySQL database is set to utf8_unicode_ci and I have $pdo->exec('SET NAMES "utf8"') as part of the following PHP code, yet when I echo text from the query, a hyphen (–) looks like this: â€“. What am I doing wrong? Why is the hyphen not displaying correctly?
<?php
try {
    $pdo = new PDO('mysql:host=localhost;dbname=danville_tpf', 'danville_dan', 'password');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('SET NAMES "utf8"');
} catch (PDOException $e) {
    $output = 'Unable to connect to the database server.';
    include 'output.html.php';
    exit();
}
$output = 'Theme Park Database initialized';
//include 'output.html.php';//

try {
    $park_id = $_GET['park_id'];
    $query = "SELECT * FROM tpf_parks WHERE park_id = $park_id";
    $result = $pdo->query($query);
} catch (PDOException $e) {
    $output = 'Unable to connect to the database server.';
    //include 'output.html.php';//
}
$output = 'Sucessfully pulled park';
//include 'output.html.php';//

foreach ($result as $row) {
    $parkdetails[] = array(
        'name' => $row['name'],
        'blurb' => $row['blurb'],
        'website' => $row['website'],
        'address' => $row['address'],
        'logo' => $row['logo']
    );
}
?>
Please help.
Answer
â€“ is common mojibake for an en dash (–), which is a different character from a hyphen.
It is the result of taking the UTF-8-encoded form of the dash (0xe2 0x80 0x93) and incorrectly assuming that it is actually encoded using Windows-1252.
Interpreting those three bytes as Windows-1252: 0xe2, 0x80 and 0x93 separately represent â, € and “.
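The misreading described above can be reproduced in a few lines of PHP. This is only a sketch, assuming the iconv extension is available; the variable names are illustrative and not part of the question's code.

```php
<?php
// UTF-8 bytes of the en dash (U+2013), as they would be stored correctly.
$enDash = "\xE2\x80\x93";

// Treat each of those bytes as a Windows-1252 character and re-encode as UTF-8.
// This is the mistake that produces the mojibake.
$mojibake = iconv('WINDOWS-1252', 'UTF-8', $enDash);

echo $mojibake, "\n";                       // â€“
echo strtoupper(bin2hex($mojibake)), "\n";  // C3A2E282ACE2809C
```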
Assuming the offending character is in the blurb field, if you query SELECT HEX(blurb) FROM tpf_parks (with a suitable WHERE clause), you will see the hex encoding of the offending bytes.
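One way to run that check from PHP is sketched here. The table and column names come from the question; the function name fetchBlurbHex is hypothetical, and the prepared statement is a suggestion for passing the park id safely rather than something the original code does.

```php
<?php
// Hypothetical helper: return the uppercase hex dump of one park's blurb,
// or null if the row is missing or the blurb is NULL.
function fetchBlurbHex(PDO $pdo, int $parkId): ?string
{
    // A prepared statement keeps the request value out of the SQL text.
    $stmt = $pdo->prepare('SELECT HEX(blurb) FROM tpf_parks WHERE park_id = ?');
    $stmt->execute([$parkId]);

    $hex = $stmt->fetchColumn();
    if ($hex === false || $hex === null) {
        return null;
    }
    return strtoupper((string) $hex);
}
```

With the $pdo connection from the question, echo fetchBlurbHex($pdo, (int) $_GET['park_id']); would print the stored bytes for that park.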
If you see E28093 in there, then the database value is correctly encoded as UTF-8 and the character encoding mismatch is in your client or server configuration (e.g. you're reading it from the DB or displaying it to the browser with mismatched encodings).
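If the stored bytes are fine, two places worth checking are the connection character set and the charset the page declares to the browser. The sketch below is a suggestion under assumptions, not a confirmed fix for this setup: utf8Dsn and sendUtf8Header are hypothetical helper names, and it relies on the charset option of the PDO MySQL DSN (supported since PHP 5.3.6).

```php
<?php
// Hypothetical helper: build a DSN that asks the MySQL driver for a UTF-8 connection.
function utf8Dsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8";
}

// Hypothetical helper: declare the page as UTF-8 before any output is sent.
function sendUtf8Header(): void
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}

// Usage, with the placeholder credentials from the question:
// $pdo = new PDO(utf8Dsn('localhost', 'danville_tpf'), 'danville_dan', 'password');
// sendUtf8Header();
```

The HTML template can also declare the same encoding with a meta charset tag, so that the browser does not fall back to guessing.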
If, however, you see C3A2E282ACE2809C, then the character has already been encoded incorrectly in the database, i.e. interpreted incorrectly, then saved as the UTF-8 representation of those 3 characters. If this is the case you'll need to update the data to fix the issue. You could do this using iconv:
$fixedData = iconv("utf-8", "windows-1252", $badData);
This converts the doubly encoded bytes back to the original UTF-8 encoding, so C3A2E282ACE2809C becomes E28093 again.
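A slightly more defensive version of that conversion is sketched below. It assumes the iconv extension; the function name repairDoubleEncoded is hypothetical. The validity check is a heuristic that leaves already-correct text alone, so it is worth trying on a copy of the data first.

```php
<?php
// Hypothetical helper: undo one round of UTF-8 -> Windows-1252 -> UTF-8 double encoding.
function repairDoubleEncoded(string $text): string
{
    // Map each character back to its single Windows-1252 byte.
    // iconv() returns false (and raises a notice) if a character has no such byte.
    $candidate = @iconv('UTF-8', 'WINDOWS-1252', $text);

    // Accept the result only if the recovered bytes are themselves valid UTF-8;
    // otherwise assume the input was not double encoded and return it unchanged.
    if ($candidate === false || preg_match('//u', $candidate) !== 1) {
        return $text;
    }
    return $candidate;
}

$bad = "\xC3\xA2\xE2\x82\xAC\xE2\x80\x9C";   // hex C3A2E282ACE2809C
$fixed = repairDoubleEncoded($bad);
echo strtoupper(bin2hex($fixed)), "\n";      // E28093
```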