westmeadboy, posted April 18, 2010 at 08:45 AM:
I've seen a few threads about Hanzi-to-Pinyin conversion, but they discuss various websites rather than APIs that can be accessed using something like JSON. I know you can access Google Translate using JSON, and that Google Translate provides pinyin romanization of Hanzi, but is it possible to access that through JSON as well? Alternatively, I could just build my own offline converter based on CC-CEDICT data. It wouldn't be perfect, but it would probably be good enough for my needs. In cases where a character maps to multiple pinyin readings, I would just use the most common one. How does that sound?
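For illustration, the offline character-by-character idea could be sketched roughly as follows. This is only a sketch under stated assumptions: CC-CEDICT carries no frequency data, so "most common" is approximated here by counting how often each reading of a character appears across dictionary entries. The names `build_char_table` and `char_by_char` and the tiny inline `SAMPLE` are hypothetical stand-ins, not part of any existing tool.

```python
import re
from collections import Counter, defaultdict

# CC-CEDICT line format: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]+)\]\s+/")

# Tiny inline sample standing in for the real CC-CEDICT file (illustrative only).
SAMPLE = """\
# comment lines start with '#'
銀行 银行 [yin2 hang2] /bank/
行 行 [hang2] /row/line/
行 行 [xing2] /to walk/to go/
旅行 旅行 [lu:3 xing2] /to travel/
行動 行动 [xing2 dong4] /action/
"""


def build_char_table(lines):
    """Map each simplified character to its most frequently listed reading.

    Assumption: frequency across dictionary entries is used as a rough
    proxy for how common a reading is; this is a heuristic, not real
    usage statistics.
    """
    counts = defaultdict(Counter)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        match = ENTRY_RE.match(line)
        if not match:
            continue
        _trad, simp, pinyin = match.groups()
        syllables = pinyin.lower().split()
        if len(simp) != len(syllables):
            continue  # skip entries that do not align one syllable per character
        for ch, syl in zip(simp, syllables):
            counts[ch][syl] += 1
    return {ch: c.most_common(1)[0][0] for ch, c in counts.items()}


def char_by_char(text, table):
    """Convert text one character at a time; unknown characters pass through."""
    return " ".join(table.get(ch, ch) for ch in text)


table = build_char_table(SAMPLE.splitlines())
print(char_by_char("旅行", table))
print(char_by_char("银行", table))
```

With this sample, 行 resolves to `xing2` everywhere, so 旅行 comes out right but 银行 is rendered with `xing2` instead of `hang2`. That is the kind of error a single fixed reading per character produces.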
lingjoin, posted April 19, 2010 at 01:17 AM:
Quoting westmeadboy: "In cases where a character maps to multiple pinyins I would just use the most common one."
The problem is that such cases are very common, so the accuracy would be disappointingly low. A better process, which I would recommend, is:
1. Run word segmentation on the input sentences. (Unlike English, Chinese text has no spaces between words, so a sentence has to be split into a token sequence.)
2. Map each word to its pinyin. At the word level the mapping is often unique.
This technique is state of the art, and we have collected such a mapping list before.
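The two steps above can be sketched as follows. The post does not say which segmentation method is meant, so this sketch swaps in greedy forward longest-match against a word list, which is a simple stand-in and not a state-of-the-art segmenter. `WORD_PINYIN`, `segment` and `to_pinyin` are hypothetical names, and the small dictionary is made up for the example; a real list could be derived from CC-CEDICT.

```python
# Hypothetical word -> pinyin list (illustrative only).
WORD_PINYIN = {
    "银行": "yin2 hang2",
    "旅行": "lu:3 xing2",
    "行": "xing2",
    "我": "wo3",
    "去": "qu4",
}


def segment(text, lexicon, max_len=4):
    """Step 1: split text into tokens by greedy forward longest-match.

    At each position, try the longest candidate first and shrink until a
    lexicon word is found; a single character is always accepted.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def to_pinyin(text, lexicon):
    """Step 2: map each token to pinyin; unknown tokens pass through."""
    return " ".join(lexicon.get(tok, tok) for tok in segment(text, lexicon))


print(segment("我去银行", WORD_PINYIN))
print(to_pinyin("我去银行", WORD_PINYIN))
```

Because 银行 is looked up as a whole word, 行 gets the reading `hang2` here even though its stand-alone entry is `xing2`.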
roddy, posted April 19, 2010 at 02:18 AM:
Have a look at Adsotrans - if they're not doing it, I doubt anyone else is.
renzhe, posted April 19, 2010 at 02:28 AM:
Yeah, I think that Adsotrans does exactly this, and it should be considerably more accurate than a CEDICT-based character-for-character conversion.