Uploading a Chinese CSV file into MySQL


Posted

Great Brick Wall of Character Encoding>>> :wall

I've got a csv file with Chinese characters in on my laptop. I want to upload this to MySQL via phpMyAdmin.

I've done this before. It worked. I just have no idea how I did it and I can't get it to work again. I've tried saving the file as ANSI, Unicode, and UTF-8. I've set browser encoding to UTF-8, GB2312, ISO2046, and stun. The file uploads fine in all cases, and it inserts all rows. But when I try to view the info, it's mangled under all browser encodings.

I'm noticing new references to something called 'collation' in the properties of databases and tables. This seems to be language related, and there are lots of confusing options. This may be the problem, but I haven't yet figured out the solution.

If anyone has any hints, you will earn my undying gratitude, and possibly a VCD of a certain well-known Taiwanese popster.

Roddy

Posted

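I would think that only saving as UTF-8 would work. 'ANSI' in Notepad is a single-byte Windows code page (Latin-based on a Western setup; it's plain ASCII that only uses 7 bits), so it can't hold Chinese characters. 'Unicode' there means UTF-16, and as far as I know non-UTF-8 Unicode is not handled properly by the browser upload.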
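To make the point about encodings concrete, here is a minimal Python sketch (not part of the original exchange) that checks which encodings can represent a Chinese string at all. The helper name `can_encode` and the sample text are made up for illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "中文"  # hypothetical sample value

# UTF-8, UTF-16 and GB2312 can all hold these characters;
# the Western code page cp1252 and plain ASCII cannot.
for enc in ("utf-8", "utf-16", "gb2312", "cp1252", "ascii"):
    print(enc, can_encode(sample, enc))

# The same two characters take a different number of bytes per encoding.
utf8_len = len(sample.encode("utf-8"))
gb_len = len(sample.encode("gb2312"))
print(utf8_len, gb_len)
```

The byte lengths differ between UTF-8 and GB2312, which is why the file, the connection and the table all need to agree on one encoding.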

Do you have these fields in the phpMyAdmin set correctly? How are you viewing the data you've inserted? Through phpMyAdmin?

http://www.efficienthosting.co.uk/support/mysql.html#data

Set the Fields terminated by entry to whatever character is used in your text file to separate neighbouring field values. Common characters for separating fields are tabs (which should be specified as \t) and commas.

If your text file uses quotes (or some other character) around text values (or all fields) you will need to enter this character in the Fields enclosed by box. This can often be left blank (deleting any character suggested by phpMyAdmin).

In most cases, the Fields escaped by box should be left blank (deleting any character suggested by phpMyAdmin).

If you are connecting from a Windows based machine, the Lines terminated by entry should normally be set to \r\n (the default suggested by phpMyAdmin). Windows uses both a carriage return character (\r) and a newline character (\n) to terminate the lines of text files.
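The same four settings (field terminator, enclosing character, line terminator, plus the file encoding) can be checked locally before uploading. This is only a sketch using Python's standard `csv` module; the function name `parse_csv_bytes` and the sample rows are hypothetical, and it says nothing about how phpMyAdmin parses the file internally.

```python
import csv
import io


def parse_csv_bytes(data: bytes, encoding: str = "utf-8",
                    delimiter: str = ",", quotechar: str = '"') -> list:
    """Decode raw file bytes and split them into rows of fields."""
    # newline="" leaves \r\n untouched so the csv module handles line endings.
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.reader(stream, delimiter=delimiter, quotechar=quotechar))


# Hypothetical file content: comma-separated, quoted, Windows line endings.
raw = '"你好","nǐ hǎo","hello"\r\n"谢谢","xièxie","thanks"\r\n'.encode("utf-8")

rows = parse_csv_bytes(raw)
for row in rows:
    print(row)
```

If the rows print correctly here, the file itself is fine and the damage is happening somewhere between the upload and the table.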

Posted

The actual uploading process is fine - all the fields end up in the right place. However, anything that isn't plain English is scrambled - characters and tone-carrying letters in the pinyin are a mess.

I suspect there's something in the uploading process which is mangling it, but can't be sure yet. I think I've tried all possible combinations with phpMyAdmin though. The bizarre thing is that it doesn't even work if I copy and paste.
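For what it's worth, this kind of scrambling is what you would see if UTF-8 bytes were read back as a single-byte Latin encoding somewhere along the way. The sketch below only illustrates that one possibility in Python, with a made-up sample string; it is an assumption, not a diagnosis of this particular server.

```python
original = "拼音 pīnyīn"  # hypothetical sample value

# The file is saved as UTF-8 ...
stored_bytes = original.encode("utf-8")

# ... but some layer interprets those bytes as latin-1.
# latin-1 maps every byte to a character, so no error is raised.
mangled = stored_bytes.decode("latin-1")
print(repr(mangled))

# As long as the bytes were only misread (not altered), the damage is reversible.
recovered = mangled.encode("latin-1").decode("utf-8")
print(recovered == original)
```

Both the characters and the tone-marked pinyin letters get scrambled in this scenario, while plain English letters survive, because ASCII bytes mean the same thing in both encodings.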

Have since realised though that I still have access to a MySQL database with the information intact, so I can try downloading it as a MySQL file and see if that helps.

Roddy

Posted

Didn't see that you did, but have you tried changing the table fields' 'collation' to 'binary' and 'type' to VARCHAR(x), where x = length? (You needn't reload.)

Posted

The field type shouldn't be an issue. However, I'll try changing the collation (I don't really understand what collation means here, but am quite happy to tamper with it)

Roddy

Posted

Ok, changed type to VARCHAR and collation to binary for table fields which contain characters or pinyin, and this seems to have solved the problem. Many thanks, agentzi.

If anyone's got a simple explanation of what collation is, particularly as it relates to Chinese, I'd be interested to hear it.

Roddy

Posted

About collation and MySQL

http://www.sourcekeg.co.uk/www.mysql.com/tech-resources/articles/4.1/unicode.html

Collation can sometimes cause problems with loading data - I had similar problems with SQL Server and Oracle.
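As a rough illustration of what collation means: it is the set of rules used to compare and sort text, separate from how the text is stored as bytes. The Python sketch below contrasts a binary-style comparison with two rule-based ones. The word lists and the tiny `pinyin` lookup table are hypothetical, and real MySQL collations are far more elaborate than this.

```python
# Binary-style ordering compares raw code points, so capitals sort first
# and "a" is not equal to "A".
words = ["b", "A", "a", "B"]
binary_order = sorted(words)
case_insensitive_order = sorted(words, key=str.casefold)
print(binary_order)
print(case_insensitive_order)

# For Chinese, code point order has nothing to do with pronunciation.
hanzi = ["中", "文", "啊"]
pinyin = {"中": "zhong", "文": "wen", "啊": "a"}  # hypothetical lookup table

by_code_point = sorted(hanzi)
by_pinyin = sorted(hanzi, key=lambda ch: pinyin[ch])
print(by_code_point)
print(by_pinyin)
```

A binary collation behaves like the first ordering in each pair: the server compares the stored values as they are rather than applying language rules to them.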

  • 4 weeks later...
Posted

Ok, another related question. I'm now working with a database which is GB2312 encoded. Again, I'm just getting question marks. I've tried setting collation to binary as suggested above, and gb_chinese_ci and gb_chinese_bin with no luck.

Any ideas?

Roddy
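Question marks (as opposed to scrambled symbols) usually suggest that the text was converted at some point into a character set that has no slot for those characters, so each one was swapped for a literal "?". The Python sketch below shows that effect with a made-up sample string; whether this is what happened on this server is only a guess.

```python
text = "中文"  # hypothetical sample value

# Converting to an encoding that lacks these characters, with replacement
# instead of an error, yields one "?" per character.
lossy = text.encode("latin-1", errors="replace")
print(lossy)

# Unlike misread bytes, this cannot be undone: the original data is gone.
print(lossy.decode("latin-1"))

# A GB2312 round trip, by contrast, keeps the text intact.
round_trip = text.encode("gb2312").decode("gb2312")
print(round_trip == text)
```

If that is the cause, changing the collation afterwards will not bring the characters back, since the question marks are what is actually stored; the data would need to be reloaded with every step agreeing on the encoding.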

  • 10 months later...
Posted

Spent an age working on this again today. I've actually come to the conclusion that if possible, and you aren't dealing with a huge number of records, it's easier to copy and paste the data into the CMS on the new server, rather than try to figure out how to move the database. Maybe I've just been unlucky . . .
