ChristopherB Posted November 5, 2008 at 07:50 AM I'm aware it's difficult for software to always convert one character set to the other perfectly, because there isn't a one-to-one mapping between every traditional and simplified character. My question is: does one direction have a better chance of correct conversion than the other? That is, is traditional to simplified likely to be more accurate than the other way around?
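skylee Posted November 5, 2008 at 08:49 AM "is traditional to simplified likely to be more accurate than the other way around?" Yes, because one simplified character can stand for more than one traditional character. Examples include 只 (隻, 只), 发 (發, 髮), 于 (於, 于), 云 (雲, 云), etc. Going from traditional to simplified mostly just collapses these distinctions, while going from simplified to traditional means picking the right original character every time, which requires context.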
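To make the asymmetry concrete, here is a minimal Python sketch. The table `TRAD_TO_SIMP` and the helpers `invert`, `trad_to_simp` and `simp_candidates` are hypothetical and only cover the examples above; they are not a real conversion table. Inverting a many-to-one table is exactly what produces the ambiguity: each traditional key has one target, but a simplified character can come back with several candidates.

```python
# Tiny, hand-picked traditional -> simplified table (illustrative only,
# not a complete or authoritative conversion table).
TRAD_TO_SIMP = {
    "發": "发", "髮": "发",
    "雲": "云", "云": "云",
    "於": "于", "于": "于",
    "隻": "只", "只": "只",
}


def invert(table):
    """Build a simplified -> [traditional candidates] table from a trad -> simp one."""
    inverse = {}
    for trad, simp in table.items():
        inverse.setdefault(simp, []).append(trad)
    return inverse


SIMP_TO_TRAD = invert(TRAD_TO_SIMP)


def trad_to_simp(text):
    # Every traditional character in this toy table has exactly one target,
    # so this direction is a plain lookup (unknown characters pass through).
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def simp_candidates(text):
    # A simplified character may have several possible traditional sources;
    # return all of them and leave the choice to something with context.
    return [SIMP_TO_TRAD.get(ch, [ch]) for ch in text]
```

Here `trad_to_simp("發髮")` gives `"发发"`, while `simp_candidates("发")` returns `[["發", "髮"]]`: the information lost in one direction has to be recovered from context in the other.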
trevelyan Posted November 6, 2008 at 10:54 AM If you actually have to deal with Chinese data processing, you should probably just use existing tools to make the problem go away. Even if you store all of your own data in traditional characters, you'll have to deal with simplified input at some point. The best solution is to use software that converts at the word and phrase level instead of character by character, since the surrounding word usually determines which character is correct. Adso takes care of this and can be downloaded from http://adsotrans.com/downloads/. Google is getting better at it too.
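A rough sketch of what phrase-level conversion means, assuming simplified input and toy tables. `PHRASES`, `CHARS` and `convert` are hypothetical names, and this is a generic greedy longest-match approach, not necessarily what Adso or Google actually do. At each position it tries the longest known phrase first and only falls back to a single-character mapping when no phrase matches.

```python
# Hypothetical simplified -> traditional tables (illustrative only).
PHRASES = {
    "头发": "頭髮",  # "hair": 发 -> 髮
    "发展": "發展",  # "develop": 发 -> 發
    "白云": "白雲",
}
CHARS = {
    "头": "頭", "发": "發", "云": "雲", "展": "展", "白": "白",
}
MAX_PHRASE_LEN = max(len(p) for p in PHRASES)


def convert(text):
    """Greedy longest-match conversion: phrases first, single characters as fallback."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase starting at position i, down to length 2.
        for n in range(min(MAX_PHRASE_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched: fall back to the default single-character mapping.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

With these tables, `convert("头发")` returns `"頭髮"`, whereas a purely character-by-character lookup through `CHARS` would give the wrong `"頭發"`. The single-character fallback is where words missing from the phrase list can still go wrong.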
westmeadboy Posted September 3, 2009 at 06:47 AM Given one traditional character, is there exactly one corresponding simplified character?
imron Posted September 3, 2009 at 08:15 AM Not necessarily. One example I can think of off the top of my head is 乾, which is pronounced both gān and qián. For the pronunciation gān, the character is simplified as 干. For the pronunciation qián, the character keeps its original form. So when converting a given piece of Traditional Chinese text to Simplified, 乾 will sometimes become 干 and sometimes stay 乾.
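Because of characters like 乾, a traditional-to-simplified converter can't treat every character as a fixed lookup either. A minimal sketch, assuming a hypothetical table (`TRAD_TO_SIMP_CANDIDATES`) that lists every possible simplified form per character, could at least flag the positions that need word-level context:

```python
# Hypothetical traditional -> simplified candidate table (illustrative only).
TRAD_TO_SIMP_CANDIDATES = {
    "乾": ["干", "乾"],  # gān -> 干, qián -> 乾
    "淨": ["净"],
    "隆": ["隆"],
    "喜": ["喜"],
    "歡": ["欢"],
}


def ambiguous_positions(text):
    """Return (index, char, candidates) for each character with more than one simplified form."""
    result = []
    for i, ch in enumerate(text):
        candidates = TRAD_TO_SIMP_CANDIDATES.get(ch, [ch])
        if len(candidates) > 1:
            result.append((i, ch, candidates))
    return result
```

For `"乾隆喜歡乾淨"` this flags positions 0 and 4. The first 乾 belongs to 乾隆 (qián, stays 乾) and the second to 乾淨 (gān, becomes 干净); only the surrounding word tells them apart.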
westmeadboy Posted September 3, 2009 at 08:21 AM How about if you are given a traditional character AND its pinyin?
imron Posted September 3, 2009 at 12:17 PM Not sure; however, the first question that would spring to my mind is how that pinyin was created. If it was generated automatically, it might also suffer from inaccuracies. I just thought of another character too: 麽, which maps to both 么 and 麽 (again with different pinyin).
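If a reliable reading is available, the lookup can be keyed on the character plus its pinyin. The sketch below assumes a hypothetical table (`BY_READING`) holding only the two characters from this thread, with pinyin written with tone marks. It normalizes pinyin to Unicode NFC so precomposed and combining-accent spellings match, and returns `None` for an unknown pair rather than guessing. As noted above, it can only be as accurate as the pinyin fed into it.

```python
import unicodedata


def _nfc(s):
    # Normalize so "gān" typed as a + combining macron matches precomposed "ā".
    return unicodedata.normalize("NFC", s)


# Hypothetical (traditional char, toned pinyin) -> simplified table (illustrative only).
_RAW = {
    ("乾", "gān"): "干",
    ("乾", "qián"): "乾",
    ("麽", "me"): "么",
    ("麽", "mó"): "麽",
}
BY_READING = {(char, _nfc(pinyin)): simp for (char, pinyin), simp in _RAW.items()}


def simplify_with_reading(char, pinyin):
    """Return the simplified form for a character and its reading, or None if the pair is unknown."""
    return BY_READING.get((char, _nfc(pinyin.strip().lower())))
```

For example, `simplify_with_reading("乾", "gān")` gives `"干"` and `simplify_with_reading("麽", "me")` gives `"么"`, but toneless input such as `"gan"` returns `None`, so a bad or incomplete reading fails visibly instead of silently picking a form.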