xinoxanu Posted October 22, 2020 at 06:12 PM
Hello! So I have a rather long book that is currently written in Traditional Chinese. I thought that a simple character conversion to Simplified would do the trick, since I don't really mind the Taiwanese localization... but based on what I can read here, that might not actually be that easy, and random internet converters won't do a perfect Traditional <> Simplified transliteration. What tool do you recommend to carry this out? Perhaps this? PS: Ground control to @imron ?
Demonic_Duck Posted October 22, 2020 at 07:03 PM
51 minutes ago, xinoxanu said: "random internet converters won't do a perfect Traditional <> Simplified transliteration."
It's not just random Internet converters: no converter will do a perfect job (until something close to artificial general intelligence is developed). But the MediaWiki one does seem to do a damn good job, judging by its results. Not sure to what extent those are manually tweaked by Wikipedia editors afterwards, though.
NinKenDo Posted October 23, 2020 at 12:53 AM
There's no one perfect approach. For accuracy, Wenlin, because any time there's an ambiguity it will ask you to fix it. However, that's tedious. There are probably more complex parser-based converters that go beyond the character/word level and might give you significantly more efficiency for only a very slight reduction in accuracy. It really just depends on how much you weigh accuracy vs efficiency vs manual tedium.
Kenny同志 Posted October 23, 2020 at 03:59 AM
Just adding that if you are not looking for a perfect conversion, most converters will probably do a good enough job. Converting traditional Chinese text into simplified produces far fewer mistakes than the other way around because, except for a few rare cases, one or more traditional characters always correspond to a single simplified character, which leaves little room for error. Converting simplified text into traditional is a lot harder: for each simplified character, the converter has to decide which of its traditional counterparts fits the given context, and this often requires a bit of 'thinking', or intelligence.
Note: One of the rare cases I mentioned is 乾, which corresponds to 乾 in simplified when used as in 乾坤 but to 干 when used as in 乾燥. In this case, one traditional character corresponds to two simplified ones rather than only one.
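To make the asymmetry described above concrete, here is a minimal sketch in Python. The tiny tables and the function names (`T2S_CHARS`, `T2S_PHRASES`, `to_simplified`, `traditional_candidates`) are made up for illustration and are not taken from any real converter; an actual tool would ship thousands of entries. It assumes a simple strategy: try phrase-level exceptions first, then fall back to a per-character table.

```python
from collections import defaultdict
from typing import Dict, Set

# Toy character table (traditional -> simplified), illustration only.
# Several traditional characters can share one simplified form.
T2S_CHARS = {
    "頭": "头",
    "髮": "发",  # hair
    "發": "发",  # to send out, to emit
    "乾": "干",  # default: gan, "dry"
    "幹": "干",  # to do; trunk
}

# Phrase-level exceptions, checked before the character table.
# 乾 read as qian (as in 乾坤) stays 乾 in simplified.
T2S_PHRASES = {"乾坤": "乾坤"}


def to_simplified(text: str) -> str:
    """Convert using the longest phrase match first, then single characters."""
    max_len = max(len(phrase) for phrase in T2S_PHRASES)
    out = []
    i = 0
    while i < len(text):
        for n in range(max_len, 1, -1):
            chunk = text[i:i + n]
            if chunk in T2S_PHRASES:
                out.append(T2S_PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched here: map one character, or keep it unchanged.
            out.append(T2S_CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


def traditional_candidates() -> Dict[str, Set[str]]:
    """Invert the character table to show why the reverse direction is ambiguous."""
    inverse = defaultdict(set)
    for trad, simp in T2S_CHARS.items():
        inverse[simp].add(trad)
    return dict(inverse)


print(to_simplified("乾燥"))  # 干燥
print(to_simplified("乾坤"))  # 乾坤
print(sorted(traditional_candidates()["干"]))  # two candidates: 乾 and 幹
```

Going from traditional to simplified, the table lookup is almost always a plain function of the character, with a handful of phrase exceptions. Going the other way, a single simplified character such as 干 or 发 has several candidates, so a converter needs context to choose between them.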
xinoxanu Posted October 23, 2020 at 10:03 AM (Author)
Thank you all! I've settled on Google Translate's file functionality and it seems to be working just fine, so for now I'll be using that until I am able to get my hands on a proper Simplified Chinese copy of the book.
Kenny同志 Posted October 23, 2020 at 10:06 AM
I think Google Translate can be a very good option. It involves translation after all.
Takeshi Posted October 27, 2020 at 02:29 AM
Another one is 著. Well, at least in TW Traditional, its simplified counterparts are either 着 (zhe) or 著 (zhu). But in HK Traditional the 着/著 distinction is usually maintained (but not always, because a lot of people use the TW style).
feihong Posted November 5, 2020 at 07:18 PM
I had a good experience recently with this Python library: https://github.com/zachary822/chinese-converter
It was able to correctly figure out when to convert 著 (in traditional) to 着 (in simplified).
feihong Posted November 7, 2020 at 01:58 PM
After some more experience with real data, I now prefer this Python library: https://github.com/berniey/hanziconv
It hasn't been updated in 4 years, but it does a better job of converting alternate versions of characters like 裏.
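For a long book, a library like the ones above would typically be run over a whole file. The following is a rough sketch, not code from this thread: `convert_file` is a hypothetical helper, and the file names in the comment are placeholders. It takes the converter as a plain callable so that any library can be plugged in. The commented-out usage assumes hanziconv is installed and that its `HanziConv.toSimplified` is the call you want.

```python
from pathlib import Path
from typing import Callable


def convert_file(
    src: Path,
    dst: Path,
    convert: Callable[[str], str],
    encoding: str = "utf-8",
) -> int:
    """Convert src into dst line by line; return how many lines changed."""
    changed = 0
    with src.open(encoding=encoding) as fin, dst.open("w", encoding=encoding) as fout:
        for line in fin:
            converted = convert(line)
            if converted != line:
                changed += 1
            fout.write(converted)
    return changed


# Possible usage (assumption: third-party hanziconv is installed,
# and the file names are placeholders):
#
#   from hanziconv import HanziConv
#   n = convert_file(Path("book_trad.txt"), Path("book_simp.txt"),
#                    HanziConv.toSimplified)
#   print(n, "lines changed")
```

Returning the count of changed lines is just a convenience: it gives a quick sanity check that the converter actually touched the text, and the changed lines are the ones worth spot-checking for cases like 乾 or 著.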