roddy Posted July 31, 2005 at 04:52 AM
Great Brick Wall of Character Encoding>>> I've got a CSV file with Chinese characters in it on my laptop. I want to upload this to MySQL via phpMyAdmin. I've done this before. It worked. I just have no idea how I did it and I can't get it to work again.
I've tried saving the file as ANSI, Unicode, and UTF-8. I've set browser encoding to UTF-8, GB2312, ISO2046, and stun. The file uploads fine in all cases, and it inserts all rows. But when I try to view the info, it's mangled under all browser encodings.
I'm noticing new references to something called 'collation' in the properties of databases and tables. This seems to be language related, and there are lots of confusing options. This may be the problem, but I haven't yet figured out the solution.
If anyone has any hints, you will earn my undying gratitude, and possibly a VCD of a certain well-known Taiwanese popster.
Roddy
gato Posted July 31, 2005 at 05:25 AM
I would think that only saving as UTF-8 would work, as both of the other two are Latin-based character sets. ANSI only uses 7 bits. Non-UTF-8 Unicode is not supported by the browser.
Do you have these fields in phpMyAdmin set correctly? How are you viewing the data you've inserted? Through phpMyAdmin?
http://www.efficienthosting.co.uk/support/mysql.html#data
Set the Fields terminated by entry to whatever character is used in your text file to separate neighbouring field values. Common characters for separating fields are tabs (which should be specified as \t) and commas.
If your text file uses quotes (or some other character) around text values (or all fields) you will need to enter this character in the Fields enclosed by box. This can often be left blank (deleting any character suggested by phpMyAdmin).
In most cases, the Fields escaped by box should be left blank (deleting any character suggested by phpMyAdmin).
If you are connecting from a Windows based machine, the Lines terminated by entry should normally be set to \r\n (the default suggested by phpMyAdmin). Windows uses both a carriage return character (\r) and a newline character (\n) to terminate the lines of text files.
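The import settings quoted above can be matched on the file side before uploading. The following is a minimal sketch, not something anyone in the thread ran: it rewrites a CSV file as UTF-8 with comma-separated fields and \r\n line endings. The function name rewrite_csv_as_utf8 and the file paths are hypothetical, and the source encoding has to be supplied by whoever knows how the file was saved (for example "utf-16" on the assumption that Notepad's "Unicode" option was used).

```python
import csv


def rewrite_csv_as_utf8(src_path, dst_path, src_encoding):
    """Copy a CSV file, re-encoding it to UTF-8 with \\r\\n line endings.

    src_encoding is an assumption supplied by the caller, e.g. "utf-16"
    or "gb2312"; it is not detected automatically.
    """
    # newline="" leaves line-ending handling to the csv module.
    with open(src_path, newline="", encoding=src_encoding) as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(
            dst,
            delimiter=",",          # "Fields terminated by"
            quotechar='"',          # "Fields enclosed by"
            lineterminator="\r\n",  # "Lines terminated by"
        )
        for row in reader:
            writer.writerow(row)
```

Calling rewrite_csv_as_utf8("words.csv", "words_utf8.csv", "utf-16") would then produce a file whose separators and line endings correspond to the phpMyAdmin fields described in the quoted instructions.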
roddy Posted July 31, 2005 at 12:56 PM Author
The actual uploading process is fine - all the fields end up in the right place. However, anything that isn't plain English is scrambled - characters and tone-carrying letters in the pinyin are a mess. I suspect there's something in the uploading process which is mangling it, but can't be sure yet. I think I've tried all possible combinations with phpMyAdmin though. The bizarre thing is that it doesn't even work if I copy and paste.
Have since realised though that I still have access to a MySQL database with the information intact, so I can try downloading it as a MySQL file and see if that helps.
Roddy
agentzi Posted July 31, 2005 at 09:37 PM
Didn't see that you did, but have you tried changing the table fields' 'collation' to 'binary' and 'type' to VARCHAR(x), where x = length? (You needn't reload.)
roddy Posted August 1, 2005 at 01:38 AM Author
The field type shouldn't be an issue. However, I'll try changing the collation (I don't really understand what collation means here, but am quite happy to tamper with it).
Roddy
roddy Posted August 1, 2005 at 03:59 AM Author
Ok, changed type to VARCHAR and collation to binary for table fields which contain characters or pinyin, and this seems to have solved the problem. Many thanks, agentzi.
If anyone's got a simple explanation of what collation is, particularly as it relates to Chinese, I'd be interested to hear it.
Roddy
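The thread does not establish why the binary setting helped. One common cause of this kind of mangling, offered here only as a possibility, is that UTF-8 bytes get interpreted as a Latin character set somewhere along the way, whereas a binary field passes bytes through untouched. The sketch below illustrates that effect in plain Python; it does not talk to MySQL, and the sample string is made up.

```python
# Sample text with Chinese characters and tone-marked pinyin (made up).
text = "你好 nǐ hǎo"

# What a UTF-8 file actually contains.
utf8_bytes = text.encode("utf-8")

# Misreading those bytes as Latin-1 never fails, because Latin-1 maps
# every byte value to some character, but the result is scrambled.
mangled = utf8_bytes.decode("latin-1")
print(mangled == text)  # False

# As long as the bytes themselves were never altered, the damage is
# reversible: go back to bytes and decode them with the right charset.
restored = mangled.encode("latin-1").decode("utf-8")
print(restored == text)  # True
```

If the stored bytes are intact, as they would be in a binary column, the text can still be displayed correctly by anything that reads them as UTF-8.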
atitarev Posted August 1, 2005 at 05:08 AM
roddy wrote: "Ok, changed type to VARCHAR and collation to binary for table fields which contain characters or pinyin, and this seems to have solved the problem. Many thanks, agentzi. If anyone's got a simple explanation of what collation is, particularly as it relates to Chinese, I'd be interested to hear it. Roddy"
About collation and MySQL: http://www.sourcekeg.co.uk/www.mysql.com/tech-resources/articles/4.1/unicode.html
Collation can sometimes cause problems with loading data - I had similar problems with SQL Server and Oracle.
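On the question of what collation is: broadly, a collation is the set of rules a database uses to compare and sort text, while the character set decides how the text is stored as bytes. The sketch below only imitates the idea with Python sort keys; it is not MySQL's implementation, and the word lists are invented for illustration.

```python
# Illustrative only: Python sort keys stand in for database collations.
words = ["b", "A", "a", "B"]

# "Binary" comparison: order strictly by the encoded byte values,
# so all capitals sort before all lower-case letters.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Case-insensitive comparison, roughly what a "_ci" collation does
# for plain ASCII letters: "A" and "a" are treated as equal.
ci_order = sorted(words, key=str.casefold)

print(binary_order)  # ['A', 'B', 'a', 'b']
print(ci_order)      # ['A', 'a', 'b', 'B']

# For Chinese, a binary comparison follows the encoded values,
# not pinyin or stroke order.
hanzi = ["阿", "中"]  # pinyin: a, zhong
hanzi_binary = sorted(hanzi, key=lambda w: w.encode("utf-8"))
print(hanzi_binary)  # ['中', '阿']
```

A binary collation therefore applies no language-specific rules at all: text compares equal only when the bytes are identical.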
trevelyan Posted August 4, 2005 at 07:18 PM
You're giving away Zhou Jielun CDs?
roddy Posted September 1, 2005 at 07:26 AM Author
Ok, another related question. I'm now working with a database which is GB2312 encoded. Again, I'm just getting question marks. I've tried setting collation to binary as suggested above, and gb_chinese_ci and gb_chinese_bin, with no luck.
Any ideas?
Roddy
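Literal question marks are a different symptom from scrambled characters. They typically appear when text has been converted into a character set that cannot represent it, and each unrepresentable character is replaced with "?". Whether that is what happened with this database is not settled in the thread; the sketch below only shows the general effect in Python, with a made-up sample string.

```python
text = "你好"

# Converting to a charset with no Chinese characters: each character
# that cannot be represented is replaced by "?" and is lost for good.
lossy = text.encode("latin-1", errors="replace")
print(lossy)  # b'??'

# GB2312 can represent these characters, so the round trip is clean.
gb_bytes = text.encode("gb2312")
print(gb_bytes.decode("gb2312") == text)  # True

# Converting GB2312 data to UTF-8 means decoding first, then encoding.
utf8_bytes = gb_bytes.decode("gb2312").encode("utf-8")
print(utf8_bytes == text.encode("utf-8"))  # True
```

Once the characters have been replaced by "?", no choice of collation or browser encoding can bring them back, so the data would have to be re-imported from an intact copy.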
roddy Posted July 5, 2006 at 11:01 AM Author
Spent an age working on this again today. Actually come to the conclusion that if possible, and you aren't dealing with a huge number of records, it's actually easier to copy and paste the data into the CMS on the new server, rather than try to figure out how to move the database. Maybe I've just been unlucky . . .