How to decode/recode Chinese gibberish (乱码 or mojibake)

November 15, 2022 at 05:18 PM

I have a plain text file that I know contains Chinese, but it displays as gibberish when I open it. I know it's some kind of encoding problem, but how can I fix it? I already tried opening it as UTF-8, and that didn't work. Here is a sample of the text. The numbers are unrelated.

3
00:00:34,761 --> 00:00:37,216
¡]ÄÁÁn¡^

4
00:01:06,281 --> 00:01:08,168
¡]µ¼ÖÅT°_¡^

5
00:01:20,489 --> 00:01:22,725
§O®`²Û

6
00:01:22,792 --> 00:01:25,792
Åý·P±¡¦Û¥Ñ©b¬y§a

7
00:01:27,401 --> 00:01:29,605
§O®`©È

8
00:01:29,673 --> 00:01:32,804
¤£µM¨S¤Hª¾¹D§A¦b¨º¨à

9
00:01:34,089 --> 00:01:36,707
©ù°_§AªºÀY

10
00:01:36,777 --> 00:01:39,647
Åý·P±¡ºÉ±¡«Å¬ª§a

11
00:01:40,905 --> 00:01:43,458
¤£n®`²Û

12
00:01:43,529 --> 00:01:46,464
Åý·P±¡¦Û¥Ñ©b¬y§a

November 15, 2022 at 08:20 PM

Notepad++ will (or at least did) let you cycle through a bunch of encodings to see if one makes sense.
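
The same cycling can be scripted if you have the file's raw bytes. This is only a minimal sketch in Python: the candidate list and the name `try_encodings` are illustrative assumptions, not anything Notepad++ provides.

```python
# Candidate encodings often seen for Chinese text (an assumed, non-exhaustive list).
CANDIDATES = ["utf-8", "gb18030", "big5", "cp950", "big5hkscs"]

def try_encodings(raw: bytes, candidates=CANDIDATES) -> dict:
    """Decode raw bytes with each candidate; None marks a failed decode."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

if __name__ == "__main__":
    # Stand-in for open("subtitles.srt", "rb").read(); the file name is hypothetical.
    raw = "別害怕".encode("big5")
    for name, text in try_encodings(raw).items():
        print(name, "->", "failed" if text is None else text)
```

A decode that succeeds is not necessarily the right one. A wrong encoding can still produce valid (but wrong) Chinese, so the results have to be checked by eye.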

November 16, 2022 at 04:10 AM

I have Editpad, which lets you do that, but nothing I could see worked.

I downloaded this from a Chinese site so it must work for someone. This has to be a common problem with a common solution.

If I open it in Notepad, I get Chinese, but it's the wrong Chinese.

1
00:00:24,361 --> 00:00:28,420
牧羘

2
00:00:29,609 --> 00:00:33,351
牧羘

3
00:00:34,761 --> 00:00:37,216
牧羘

4
00:01:06,281 --> 00:01:08,168
贾臫癬

5
00:01:20,489 --> 00:01:22,725
甡槽

6
00:01:22,792 --> 00:01:25,792
琵稰薄パ゜瑈

7
00:01:27,401 --> 00:01:29,605
甡┤

8
00:01:29,673 --> 00:01:32,804
ぃ礛⊿笵êㄠ

9
00:01:34,089 --> 00:01:36,707
癬繷

10
00:01:36,777 --> 00:01:39,647
琵稰薄荷薄

11
00:01:40,905 --> 00:01:43,458
ぃ璶甡槽

12
00:01:43,529 --> 00:01:46,464
琵稰薄パ゜瑈

November 16, 2022 at 04:45 AM

A while back, I made this tool to diagnose mojibake, because (amazingly) I couldn't seem to find an existing one. It's quite basic: the heuristics for detecting the most probable encodings aren't perfect, and the diagnosis will crash on longer texts.

Looks like your text is encoded with big5 and decoded with iso-8859-1, which can be reversed by encoding with iso-8859-1 and decoding with big5. You can use the "Decode" tab of the tool to do that:

https://ioiw9.csb.app/decode?from=iso88591&to=big5hkscs
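
For anyone who would rather do this offline, the same reversal can be sketched in a few lines of Python. The function name `fix_mojibake` is made up for illustration, and the defaults simply assume the Big5 / ISO-8859-1 pairing described above.

```python
def fix_mojibake(garbled: str, wrong: str = "iso-8859-1", right: str = "big5") -> str:
    """Undo a wrong decode: recover the original bytes, then decode them correctly."""
    return garbled.encode(wrong).decode(right)

# Simulate the damage: Big5 bytes misread as ISO-8859-1 ...
original = "別害怕"
garbled = original.encode("big5").decode("iso-8859-1")

# ... and reverse it.
assert fix_mojibake(garbled) == original
```

Text pasted from a web page may have lost invisible characters (ISO-8859-1 maps byte 0xAD to a soft hyphen and 0x80-0x9F to control characters), so reversing a pasted sample can fail where the original file would not. Working from the file itself is likely to be more reliable.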

November 17, 2022 at 02:11 AM

It's encoded in Big5, or cp950 (Microsoft's variant of Big5).
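
If that is right, the file can be converted to UTF-8 directly, without going through the garbled text at all. A minimal sketch in Python, where `reencode_file` and the file names are hypothetical and cp950 is only the assumed default:

```python
from pathlib import Path

def reencode_file(src: str, dst: str, src_encoding: str = "cp950") -> None:
    """Read src as src_encoding and write the same text to dst as UTF-8."""
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Example call (hypothetical file names):
# reencode_file("subtitles.srt", "subtitles.utf8.srt")
```

If cp950 raises a UnicodeDecodeError on some character, passing `src_encoding="big5hkscs"` may be worth a try.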

November 17, 2022 at 04:49 AM

Outstanding! I knew something like this had to exist somewhere! But gosh, I didn't expect you had to code it yourself. Thanks though, you've done the world a service. And me! Bookmarked!

Having decoded the file (finally), I now find that it is in Traditional Chinese. And the time codes are off. If it ain't one thing, it's another. I'm off to find a converter, which many others have surely made before. And VLC has some kind of time shifting for subtitles, which I need to look up. Thanks to Demonic_Duck for saving the day.
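
As an alternative to shifting the subtitles at playback time, the time codes could be rewritten in the file itself. This is a rough sketch in Python, not how VLC does it: it assumes standard `HH:MM:SS,mmm` time codes like the ones in the sample, and `shift_srt` is a made-up name.

```python
import re

# Matches time codes such as 00:01:20,489
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def _shift(match: re.Match, offset_ms: int) -> str:
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(total, 0)  # clamp so cues never start before zero
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def shift_srt(text: str, offset_ms: int) -> str:
    """Shift every time code in the text by offset_ms (negative shifts earlier)."""
    return TIMESTAMP.sub(lambda mt: _shift(mt, offset_ms), text)

# Delay one cue by 1.5 seconds.
print(shift_srt("00:01:20,489 --> 00:01:22,725", 1500))
```

This only handles a constant offset. If the subtitles drift progressively out of sync, a fixed shift will not fix them.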


vellocet

roddy

vellocet

Demonic_Duck

arrow

vellocet
