vellocet Posted November 15, 2022 at 05:18 PM
I have a plain text file that I know contains Chinese, but when I open it, it displays gibberish. I know it's some kind of encoding problem, but how can I fix it? I already tried opening it as UTF-8 and that didn't work. Here is a sample of the text. The numbers are unrelated.

3
00:00:34,761 --> 00:00:37,216
¡]ÄÁÁn¡^

4
00:01:06,281 --> 00:01:08,168
¡]­µ¼ÖÅT°_¡^

5
00:01:20,489 --> 00:01:22,725
§O®`²Û

6
00:01:22,792 --> 00:01:25,792
Åý·P±¡¦Û¥Ñ©b¬y§a

7
00:01:27,401 --> 00:01:29,605
§O®`©È

8
00:01:29,673 --> 00:01:32,804
¤£µM¨S¤Hª¾¹D§A¦b¨º¨à

9
00:01:34,089 --> 00:01:36,707
©ù°_§AªºÀY

10
00:01:36,777 --> 00:01:39,647
Åý·P±¡ºÉ±¡«Å¬ª§a

11
00:01:40,905 --> 00:01:43,458
¤£­n®`²Û

12
00:01:43,529 --> 00:01:46,464
Åý·P±¡¦Û¥Ñ©b¬y§a
roddy Posted November 15, 2022 at 08:20 PM
Notepad++ will (or at least did) let you cycle through a bunch of encodings to see if one makes sense.
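For anyone who would rather script the "cycle through encodings" idea than click through an editor, here is a minimal Python sketch. The candidate list and the function name `try_encodings` are assumptions chosen for illustration, not anything Notepad++ does internally; it only reports which encodings decode the raw bytes without an error.

```python
# Hypothetical helper: try several encodings on the raw bytes of a file.
# The candidate list is an assumption; extend it as needed.
CANDIDATES = ["utf-8", "big5", "gb18030", "shift_jis", "euc_kr"]


def try_encodings(raw: bytes, candidates=CANDIDATES) -> dict:
    """Return {encoding name: decoded text, or None if decoding failed}."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Stand-in for open("subtitles.srt", "rb").read(), using Big5 bytes.
raw = "別害羞".encode("big5")
for name, text in try_encodings(raw).items():
    print(name, "decodes" if text is not None else "fails")
```

A decode that succeeds is not necessarily the right one, since several encodings can accept the same bytes and produce different text, so the results still need a human look.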
vellocet Posted November 16, 2022 at 04:10 AM
I have EditPad, which lets you do that, but none of the encodings I could see worked. I downloaded this from a Chinese site, so it must work for someone. This has to be a common problem with a common solution. If I open it in Notepad, I get Chinese, but it's the wrong Chinese.

1
00:00:24,361 --> 00:00:28,420
牧羘

2
00:00:29,609 --> 00:00:33,351
牧羘

3
00:00:34,761 --> 00:00:37,216
牧羘

4
00:01:06,281 --> 00:01:08,168
贾臫癬

5
00:01:20,489 --> 00:01:22,725
甡槽

6
00:01:22,792 --> 00:01:25,792
琵稰薄パ゜瑈

7
00:01:27,401 --> 00:01:29,605
甡┤

8
00:01:29,673 --> 00:01:32,804
ぃ礛⊿笵êㄠ

9
00:01:34,089 --> 00:01:36,707
癬繷

10
00:01:36,777 --> 00:01:39,647
琵稰薄荷薄

11
00:01:40,905 --> 00:01:43,458
ぃ璶甡槽

12
00:01:43,529 --> 00:01:46,464
琵稰薄パ゜瑈
Demonic_Duck Posted November 16, 2022 at 04:45 AM (Popular Post)
A while back, I made this tool to diagnose mojibake, because (amazingly) I couldn't seem to find an existing one. It's quite basic: the heuristics to detect the most probable encodings aren't perfect, and diagnosis will crash with longer texts. Looks like your text is encoded with Big5 and was decoded as ISO-8859-1, which can be reversed by encoding with ISO-8859-1 and decoding with Big5. You can use the "Decode" tab of the tool to do that: https://ioiw9.csb.app/decode?from=iso88591&to=big5hkscs
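The same reversal can be sketched in a few lines of Python for anyone without access to the tool. This is not the tool's code; `fix_mojibake` is a hypothetical helper, and the encoding names are Python's standard codec names.

```python
def fix_mojibake(garbled: str, wrong: str = "iso-8859-1", right: str = "big5") -> str:
    """Undo a wrong decode: recover the original bytes, then decode them properly."""
    return garbled.encode(wrong).decode(right)


# Simulate the mistake described above: Big5 bytes read as ISO-8859-1.
original = "別害羞"
garbled = original.encode("big5").decode("iso-8859-1")

# Reverse it.
restored = fix_mojibake(garbled)
print(restored == original)  # prints True
```

This works because ISO-8859-1 maps every byte to exactly one character, so no information is lost on the way back. If the garbled text was copied through a lossy step, or was really decoded as Windows-1252, some bytes may not survive and the round trip can fail. With the original file on disk, it is simpler to skip the reversal and open it directly, for example `open(path, encoding="big5")`, or `"cp950"` or `"big5hkscs"` if plain Big5 rejects some characters.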
arrow Posted November 17, 2022 at 02:11 AM
It's encoded in Big5 or cp950.
vellocet Posted November 17, 2022 at 04:49 AM
Outstanding! I knew something like this had to exist somewhere! But gosh, I didn't expect you had to code it yourself. Thanks though, you've done the world a service. And me! Bookmarked! Having decoded the file (finally), I now find that it is in Traditional Chinese. And the time codes are off. If it ain't one thing, it's another. I'm off to find a converter, which has been done before by many others. And VLC has some kind of time shifting for subtitles, which I need to look up. Thanks to Demonic_Duck for saving the day.
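If the time codes are off by a constant amount, they can also be fixed in the file itself rather than in the player. The following is a minimal Python sketch, assuming SubRip-style `HH:MM:SS,mmm --> HH:MM:SS,mmm` lines like those in the sample; `shift_srt` is a hypothetical helper, and it will not help if the subtitles drift gradually (for example, because of a frame-rate mismatch).

```python
import re

# Matches one SubRip timestamp such as 00:01:20,489.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift(match: re.Match, offset_ms: int) -> str:
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(total, 0)  # clamp so a timestamp never goes negative
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text: str, offset_ms: int) -> str:
    """Shift every timing line by offset_ms (positive means later)."""
    out = []
    for line in text.splitlines():
        if "-->" in line:  # only touch timing lines, not subtitle text
            line = TIMESTAMP.sub(lambda mt: _shift(mt, offset_ms), line)
        out.append(line)
    return "\n".join(out)


sample = "5\n00:01:20,489 --> 00:01:22,725\n(subtitle text)"
print(shift_srt(sample, 1500))
```

A positive offset delays the subtitles and a negative one makes them appear earlier.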