renzhe Posted June 6, 2008 at 09:54 PM
A long shot, but we have some gurus here who might know the solution. Problem: UTF-8 locale in the console, but GB2312-encoded filenames. It's a complete mess and impossible to know what is what; all I get are question marks. Is there a way to rename these to their UTF-8 equivalents without too much effort? I'm not looking for a handout, a push in the right direction would be appreciated. Perhaps a Perl or Python script can do such a thing?
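
For the rename itself, here is a minimal Python sketch. It assumes the names really are stored on disk as GB2312 bytes (if they were already replaced by literal `?` characters, nothing can be recovered). The function names `gb_to_utf8_name` and `rename_directory` are made up for illustration. Listing the directory with a bytes path makes `os.listdir` return the raw bytes, so the locale never gets a chance to mangle them.

```python
import os
from typing import Optional


def gb_to_utf8_name(raw: bytes, source_encoding: str = "gb2312") -> Optional[bytes]:
    """Return the UTF-8 form of a legacy-encoded name, or None to leave it alone."""
    try:
        raw.decode("utf-8")
        return None  # already valid UTF-8 (this includes plain ASCII names)
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode(source_encoding).encode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid in the assumed source encoding either


def rename_directory(directory: bytes, dry_run: bool = True) -> None:
    """Rename every convertible entry in `directory`; only print when dry_run."""
    for raw in os.listdir(directory):  # bytes path in, bytes names out
        new = gb_to_utf8_name(raw)
        if new is None:
            continue
        src = os.path.join(directory, raw)
        dst = os.path.join(directory, new)
        if os.path.lexists(dst):
            print("skipping, target exists:", dst)
            continue
        print(raw, "->", new)
        if not dry_run:
            os.rename(src, dst)


# Example (hypothetical path): rename_directory(b"/path/to/incoming", dry_run=True)
```

The "already valid UTF-8" check is a heuristic: a few GB2312 byte pairs also happen to be valid UTF-8, so it is worth reading the dry-run output before renaming for real.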
imcgraw Posted June 6, 2008 at 11:00 PM
I wrote some Java code a while back to do this. There's probably a simpler way, but if you can compile Java, here's the source: http://people.csail.mit.edu/imcgraw/Converter.txt Oh... but I just realized you're talking about the file names, not the contents of the files... that I'm not sure about... maybe the Java code could be adapted, but again, there's probably a simpler way. I just use Java because the rest of my work requires it.
renzhe (Author) Posted June 6, 2008 at 11:59 PM
Thanks for the helping hand, but you're right, I'd like to convert filenames. I'm afraid that they have already lost all the information at the moment they were saved to the filesystem, though. I have a feeling that the ?????-01.rmvb are in fact just question marks, at least that's what my attempts so far seem to suggest..... EDIT: I found this, but it doesn't work on my files. I guess the information I need is lost already.
imron Posted June 7, 2008 at 01:00 AM
"I have a feeling that the ?????-01.rmvb are in fact just question marks"
The easy way to check would be to set the encoding of the shell to GB2312 and see if you can see the files.
renzhe (Author) Posted June 7, 2008 at 01:08 AM
No, the filenames are completely unimpressed by my encoding settings. ??? galore. I'm guessing that MLDonkey failed to save them using the original encoding, or that the filesystem (ext3) didn't like them.
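
A way to settle the question without touching any terminal settings is to look at the raw bytes of each name. The sketch below is only an illustration (`classify_name` and `report` are hypothetical names): it lists a directory as bytes and says whether each name is plain ASCII containing real `?` characters (byte 0x3F), valid UTF-8, or something else such as GB2312.

```python
import os


def classify_name(raw: bytes) -> str:
    """Describe what kind of bytes a filename is made of."""
    if all(b < 0x80 for b in raw):
        # Pure ASCII: any '?' here is a real 0x3F byte, not a display artifact.
        return "literal question marks" if b"?" in raw else "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "non-utf-8 bytes"  # possibly GB2312/GBK, still recoverable


def report(directory: bytes = b".") -> None:
    """Print the classification and a hex dump for every entry in `directory`."""
    for raw in sorted(os.listdir(directory)):  # bytes path gives raw bytes back
        print(classify_name(raw), raw.hex(" "), raw)


# Example: report(b".")
```

If a file shows up as "literal question marks", the original characters were discarded before the name reached the disk, and no re-encoding will bring them back.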
ipsi() Posted June 7, 2008 at 06:11 AM
Chances are good it's an MLDonkey problem. I've got some filenames in GB2312, and they display as random characters in the file browser (but as ?'s in the console, I think), but I am able to copy/paste them to an editor and convert them. Haven't bothered to yet though, as it's a fair amount of effort...
renzhe (Author) Posted June 7, 2008 at 01:10 PM
The mlconv utility I linked to will do that for you automatically. Unfortunately, that doesn't help me, so I'm stuck with a bunch of ?????-01.rmvb files, which is great for watching things from the command line....
imron Posted June 7, 2008 at 02:30 PM
Why not rename them all to something in pinyin? Surely you could do a bulk rename?
BrandeX Posted June 7, 2008 at 03:23 PM
I get the same kind of thing when using aMule, although I can input and read most anything else in Chinese just fine.
renzhe (Author) Posted June 24, 2008 at 02:39 PM
"Why not rename them all to something in pinyin? Surely you could do a bulk rename?"
I'll probably end up doing this, but since they all land in the same download directory, which is full of different files from our episode project, it will take some script-fu.
imron Posted June 24, 2008 at 02:48 PM
Speaking of which, have you made it through magic mobile phone yet?
renzhe (Author) Posted June 24, 2008 at 03:25 PM
Not yet. I've been really busy recently, and I'm still catching up with some of the shows we've started. Also, I'm trying to savour it while it lasts. Don't want the joy to end.
renzhe (Author) Posted February 9, 2009 at 10:30 PM
Just an update. It seems to be a KMLDonkey problem. If I download directly from MLDonkey's browser interface, I get the Chinese names just fine (in UTF-8). I'm hoping that this will be fixed in the new version of KMLDonkey; until then, I'll use the browser interface. Now I need an efficient way to use tab-completion together with Chinese input in Konsole.
renzhe (Author) Posted March 18, 2009 at 11:01 PM
I've fixed the problem by setting KMLDonkey's default encoding to UTF-8. Now everything is A-OK.