Skip to content
Advertisement

Convert latin1 characters on a UTF8 table into UTF8

Only today I realized that I was missing this in my PHP scripts:

JavaScript

All my tables are InnoDB, collation “utf8_unicode_ci”, and all my VARCHAR columns are “utf8_unicode_ci” as well. I have mb_internal_encoding('UTF-8'); on my PHP scripts, and all my PHP files are encoded as UTF-8.

So, until now, every time I “INSERT” something with diacritics, example:

JavaScript

The ‘name’ contents would be, in this case: Jáuò Iñe.

Since I fixed the charset between PHP and MySQL, new INSERTs are now storing correctly. However, I want to fix all the older rows that are “messed” at the moment. I tried many things already, but it always breaks the strings on the first “illegal” character. Here is my current code:

JavaScript

It “UPDATE”s with the expected characters, except that the string gets truncated after the character “ă”. I mean, that character and following chars are not included on the string.

Also, testing with the “iconv()” (that is commented on the code) does the same, even with //IGNORE and //TRANSLIT

I also tested several charsets, between ISO-8859-1 and ISO-8859-15.

Advertisement

Answer

From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you’ll need a MySQL function like

JavaScript

It’s possible that you may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement