So, I keep having all sorts of problems when moving my mysql database from one machine to another. The issue is that in text or varchar columns containing data that have special characters like smart quotes (those crazy curly quotes), accented letters, etc. result in scrambled data. For example the following phrase
it’s
results in
itâs
in the database and then renders incorrectly on the website as
it’s
This is supposed to be a popular problem when moving from mysql 4.0 to 4.1 or later, but I seem to have this problem moving data from one database to another on the same mysql installation. Moving from one maching with 4.1 to another with 4.1 also results in this crazy character problem.
Most of the solutions I found online end up with complicated find and replace functions to remove these problem characters. Unfortunately, that just not a solution that I'm okay with. After fiddling with it for most of the day (and giving up several times in the past), here's what I did that worked for me.
As I understand it, the problem arises from the fact that mysqldump uses utf8 for character encoding while, more often than not, mysql tables default to latin1 character encoding. (If you were smart enough to manually set the character encoding to utf8, then you'll have no problems - everyone running mysql 4.0 or early will be using latin1 since it didn't support any other encodings.) So lets say we have a database named example_db with tables that have varchar and text columns. If you have special characters that are really UTF-8 encoded characters stored in the db, it works just fine until you try to move the db to another server.
As I understand it, the problem arises from the fact that mysqldump uses utf8 for character encoding while, more often than not, mysql tables default to latin1 character encoding. (If you were smart enough to manually set the character encoding to utf8, then you'll have no problems - everyone running mysql 4.0 or early will be using latin1 since it didn't support any other encodings.) So lets say we have a database named example_db with tables that have varchar and text columns. If you have special characters that are really UTF-8 encoded characters stored in the db, it works just fine until you try to move the db to another server.
What happens when you execute the command:
mysqldump -u mysql_username -p example_db > example_db.mysql
is that mysqldump takes the latin1 encoded tables (containing utf8 characters) and translates them into utf8 encoding (so the utf8 characters are no longer what they are supposed to be) before writing it to disk. In theory, importing the data into mysql should work since it's in utf8 and mysql 4.1 and higher understands utf8, but unfortunately, the double encoding on these characters messes it up and you get funny text (as shown above). As far as I can tell when you execute:
mysql -u mysql_username -p new_db < example_db.mysql
mysql reads the data in (which is in utf8) and converts it to your character encoding. The special character that was misinterpreted by mysqldump and encoded is now decoded to the misinterpreted character and you have a funky series of symbols.
There's two ways of dealing with this that works (for me). The first is dealing with it on an individual table column level and the second works on the dump (database) level.
1. Fixing a single column
With this solution, you just perform a mysqldump as you normally would (see above) and then import as you normally would (see above). The special characters are messed up. They aren't really messed up - they just looked messed up. In actuality, your data is probably perfectly formatted in utf8 but is in a column that is latin1. So, we simply need to switch that column from latin1 to utf8 without altering the data. Unfortunately, you can't just run the ALTER TABLE command that changes the character encoding because then mysql will convert the data from latin1 to utf8 (including the special characters) and you'll end up with a different set of gibberish characters. We just need to change the type WITHOUT running a conversion. To do this change the varchar to binary and the text to blob. This change does not result in any conversion or re-encoding. Then switch it back to varchar or text with the correct encoding.
For a varchar(255) column named "column_name" in a table named "example_table":
With this solution, you just perform a mysqldump as you normally would (see above) and then import as you normally would (see above). The special characters are messed up. They aren't really messed up - they just looked messed up. In actuality, your data is probably perfectly formatted in utf8 but is in a column that is latin1. So, we simply need to switch that column from latin1 to utf8 without altering the data. Unfortunately, you can't just run the ALTER TABLE command that changes the character encoding because then mysql will convert the data from latin1 to utf8 (including the special characters) and you'll end up with a different set of gibberish characters. We just need to change the type WITHOUT running a conversion. To do this change the varchar to binary and the text to blob. This change does not result in any conversion or re-encoding. Then switch it back to varchar or text with the correct encoding.
For a varchar(255) column named "column_name" in a table named "example_table":
ALTER TABLE example_table MODIFY column_name BINARY(255);
ALTER TABLE example_table MODIFY column_name VARCHAR(255) CHARACTER SET utf8;
For a text column named "text_column_name" in a table named "example_table":
ALTER TABLE example_table MODIFY text_column_name BLOB;
ALTER TABLE example_table MODIFY text_column_name TEXT CHARACTER SET utf8;
The data in that table should now be properly interpreted. If not, you've got a different problem than the one I experienced.
2. Fixing the problem on import
If you've got a lot of columns and would prefer to fix it while importing, a solution that works most of the time (repeat: most of the time) is to perform a mysqldump forcing the dump to write out data in latin1. Then on the import we "fool" mysql into thinking it's utf8 data.
For a database called example_db (one following is one line):
If you've got a lot of columns and would prefer to fix it while importing, a solution that works most of the time (repeat: most of the time) is to perform a mysqldump forcing the dump to write out data in latin1. Then on the import we "fool" mysql into thinking it's utf8 data.
For a database called example_db (one following is one line):
mysqldump -u mysql_username -p --default-character-set=latin1 example_db > example_db.mysql
Now open the dump file (example_db.mysql in this example) and edit it. There will be a line that reads like:
/*!40101 SET NAMES latin1 */;
Change latin1 to utf8:
/*!40101 SET NAMES utf8 */;
Now, import the file into mysql:
mysql -u mysql_username -p new_db < example_db.mysql
In most cases everything will import properly. In a few corner cases, the decoded data will resolve to the same text as another piece of decoded data and this could result in a conflict if this occurs on fields that must have unique entries. In this case, you'll need to re-encode manually using the first solution.
There is a popular solution online that claims you just perform a mysqldump with
--default-character-set=latin1
and import withmysql --default-character-set=utf8
but it didn't work for me. It is possible that it might work if you deleted theSET NAMES latin1
line in the dump, but if you're in there you might as well just change it to utf8.
I hope this helps some people out.
Update 2007-05-21: Even after testing the conversion so many times, sometimes it just doesn't work as expected. The last time I did this, the conversion dropped all the text after a special character. So, please test this before using.
0 komentar:
Post a Comment