Kobo-Daishi Posted June 23, 2014 at 09:29 PM

At one time the chinalanguage.com web site had their CCDICT database available for free download. Their content is essentially the head character entries from the Far East Chinese-English Dictionary (seen on quite a few forumites' shelfies; I also have a copy). So the CCDICT is like the zidian portion of the Far East. All zi's and no ci's.

In Chinese, as in most if not all languages, each character has at least two definitions. I'm trying to bring my Chinese to another level by learning the different definitions for each character. I have a copy of the Far East, but my eyes are not what they used to be and I don't relish the thought of holding a magnifying glass to read through it.

This thread has a link to a file with the data: http://www.chinalanguage.com/forums/viewtopic.php?f=8&t=2005&sid=b3251d98e314007a1d833177d34ef030&start=30

The Perl repository where the file is located: http://search.cpan.org/~drolsky/Lingua-ZH-CCDICT-0.05/lib/Lingua/ZH/CCDICT.pm

Unfortunately, the characters are given as Unicode codepoints. I'm not a programmer, so I know next to nothing about coding. My question is: how do you turn the codepoints into characters so that the file is useful to a layman?

Kobo.
flow Posted June 24, 2014 at 11:17 AM

i gave the data a quick and cursory overhaul; you can download it as https://raw.githubusercontent.com/loveencounterflow/ccdict/master/Lingua-ZH-CCDICT-0.05-transformed.txt (repo at https://github.com/loveencounterflow/ccdict). i converted the U+XXXX notations to characters (encoded as UTF-8) and also replaced the character references in the glosses. have fun.
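For anyone who wants to redo that kind of conversion themselves, here is a minimal Python sketch of the general idea. It is not flow's actual script. It assumes only that codepoints appear in the text as `U+` followed by four to six hexadecimal digits; the function name `decode_codepoints` and the sample strings are made up for illustration.

```python
import re

# Matches notations such as U+597D or U+20000 (4 to 6 hex digits).
# Assumption: a codepoint is never immediately followed by stray hex-like
# letters, since the greedy match would otherwise swallow them.
CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def decode_codepoints(text: str) -> str:
    """Replace every U+XXXX notation in text with the character it names."""

    def replace(match: re.Match) -> str:
        value = int(match.group(1), 16)
        # Leave out-of-range values and surrogates untouched: they are not
        # valid standalone characters and cannot be written as UTF-8.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return match.group(0)
        return chr(value)

    return CODEPOINT.sub(replace, text)


print(decode_codepoints("U+597D: good"))
```

To convert a whole file, read it with `open(path, encoding="utf-8")`, pass the contents through `decode_codepoints`, and write the result back out with the same encoding.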
Kobo-Daishi Posted June 25, 2014 at 08:56 PM

@Flow Thanks. Much appreciated.

Kobo.
Kobo-Daishi Posted June 25, 2014 at 09:39 PM

Earlier I wrote: "At one time the chinalanguage.com web site had their CCDICT database available for free download. Their content is essentially the head character entries from the Far East Chinese-English Dictionary (seen on quite a few forumites' shelfies; I also have a copy). So the CCDICT is like the zidian portion of the Far East. All zi's and no ci's."

Okay, it's more than just the Far East head character entries. The definitions mostly come from there, but they've added extra character entries, probably derived from Unicode's Unihan, with radical and stroke count and Cantonese pronunciation, plus their own Hakka pronunciations from several sources.

Now to try to get the information into a format that can be used with StarDict and GoldenDict.

Kobo.
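One possible route is the plain tab-delimited text format that StarDict's tools can compile into a dictionary: one entry per line, headword, a tab, then the definition, with line breaks inside a definition written as a literal `\n`. The sketch below only builds such lines. How entries are read out of the CCDICT file is not shown, and the function name `to_tab_line` and the sample entry are hypothetical.

```python
def to_tab_line(headword: str, glosses: list[str]) -> str:
    """Build one line of a tab-delimited dictionary source file."""
    # Number the definitions so each sense shows up on its own line.
    body = "\n".join(f"{i}. {gloss}" for i, gloss in enumerate(glosses, 1))
    # Escape backslashes first, then turn real newlines into literal "\n".
    # Tabs inside a definition would break the format, so use spaces.
    escaped = (
        body.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace("\t", " ")
    )
    return f"{headword}\t{escaped}"


# Hypothetical sample entry, not taken from the CCDICT data.
print(to_tab_line("好", ["good", "to like"]))
```

Writing one such line per character to a UTF-8 text file gives a source file that can then be compiled; GoldenDict can read the resulting StarDict dictionary.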