I am taking data from an XML feed from Readability, inserting it into a database, and then outputting it. The charset of the XML feed is UTF-8, my HTML page headers are also UTF-8, I even saved the code through my text editor as UTF-8, and my DB is set to utf8_unicode_ci. I can’t figure out why this is happening.
Code:
$xml = simplexml_load_file( "http://readability.com/christopherburton/latest/feed" );
$json = json_encode( $xml );
$array = json_decode( $json,TRUE );
$items = $array['channel']['item'];

$DB = new mysqli('localhost', 'secret', 'secret', 'secret' );
if( $DB->connect_errno ){
    print "failed to connect to DB: {$DB->connect_error}";
    exit( 1 );
}

$match = "#^(?:[^?]*?url=)(https?://)(?:m(?:obile)?.)?(.*)$#ui";
$replace = '$1$2';

foreach( $items as $item ){
    $title = $item['title'];
    $url = preg_replace( $match,$replace,$item['link'] );
    $title_url[] = array( $title,$url );
    $sql_values[] = "('{$DB->real_escape_string( $title )}','{$DB->real_escape_string( $url )}')";
}

$SQL = "INSERT IGNORE INTO `read`(`title`,`url`) VALUES\n ".implode( "\n,",array_reverse( $sql_values ) );
if( $DB->query( $SQL ) ){
} else {
    print "failed to INSERT: [{$DB->errno}] {$DB->error}";
}
$DB->set_charset('utf8');
Answer
Your problem is where you put $DB->set_charset('utf8');. You need to tell the database which charset you are sending, or want to receive, before you run the query. Because $DB->set_charset('utf8'); comes after your queries, it has no effect on them.
If no charset is defined for the connection, the DBMS uses the charset that is set as the default in its configuration. For MySQL this may be, e.g., latin1. Because of that, MySQL thinks the data it received is encoded in, e.g., latin1 and converts it to utf8; that's why you see these strange symbols.
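To see the effect without a database, here is a minimal sketch that reinterprets UTF-8 bytes as a single-byte charset and converts them to UTF-8 again. It assumes the mbstring extension is available and uses ISO-8859-1 as a stand-in for MySQL's latin1 (which is really cp1252, though the two agree for the character used here); the variable names are only illustrative.

```php
<?php
// "é" stored as UTF-8 is two bytes: 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// Treat those two bytes as two single-byte characters and convert them
// to UTF-8, roughly what happens when the connection charset is latin1
// but the client actually sends UTF-8.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $misread, "\n";          // two characters instead of one
echo bin2hex($misread), "\n"; // c383c2a9: four bytes where two went in
```

Each original byte becomes its own character, which is the kind of garbage that ends up stored and displayed.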
To solve the problem, just ensure that $DB->set_charset('utf8'); is called before any query that sends or expects to receive data in utf8. In your example you could place it right after the if( $DB->connect_errno ){} block, because at that point the connection has been successfully established.
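As a rough sketch of that ordering, the connection setup could be wrapped so the charset is always set before anything else runs. The helper name connectUtf8 and its parameters are hypothetical and not part of your code; the check on the return value of set_charset is an addition for illustration.

```php
<?php
// Hypothetical helper: open the connection and set its charset
// before any query has a chance to run.
function connectUtf8(string $host, string $user, string $pass, string $name): mysqli
{
    $db = new mysqli($host, $user, $pass, $name);
    if ($db->connect_errno) {
        throw new RuntimeException("failed to connect to DB: {$db->connect_error}");
    }

    // Must come before real_escape_string() and query() calls.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException("failed to set charset: {$db->error}");
    }

    return $db;
}
```

With something like this, $DB = connectUtf8('localhost', 'secret', 'secret', 'secret'); would replace the new mysqli(...) line, and the trailing $DB->set_charset('utf8'); at the end of the script can be dropped.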