mirgcire, posted July 4, 2013 at 10:46 PM
I am working on a program for studying Chinese and I want to integrate a dictionary for looking up Chinese words. CC-CEDICT seemed like the right choice because some very reputable tools are based on it. My first thought was to parse the file and build a table of all the English words used in any definition. Each table entry would list all the Chinese words that use that English head word in their definitions. As one might expect, this means that "the", "a" and "is" have a disproportionate number of associated Chinese words. To improve on that I tried using a character frequency chart to order the list of Chinese words by frequency, but this still puts a lot of irrelevant words at the top of the list. Can anyone suggest a better alternative? Thanks!
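For readers following along, here is a minimal sketch of the table described above, assuming the standard CC-CEDICT line layout (`Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`). The function names `parse_line` and `build_index` and the sample entries are made up for illustration and are not taken from any existing tool.

```python
import re
from collections import defaultdict

# Assumed CC-CEDICT layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_line(line):
    """Return (traditional, simplified, pinyin, glosses), or None for comments and bad lines."""
    if line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    trad, simp, pinyin, defs = match.groups()
    return trad, simp, pinyin, defs.split("/")


def build_index(lines):
    """Map each lowercase English word to the set of simplified headwords whose glosses use it."""
    index = defaultdict(set)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        _, simp, _, glosses = entry
        for gloss in glosses:
            for word in re.findall(r"[a-z']+", gloss.lower()):
                index[word].add(simp)
    return index


# Hypothetical excerpt with simplified glosses, for illustration only.
sample = [
    "# comment lines start with a hash",
    "天空 天空 [tian1 kong1] /sky/",
    "天 天 [tian1] /day/sky/heaven/",
    "藍天 蓝天 [lan2 tian1] /blue sky/",
]
index = build_index(sample)
print(sorted(index["sky"]))
```

An index like this has no notion of relevance: every headword whose gloss mentions a word is treated equally, which is exactly why very common English words collect so many entries.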
mikelove, posted July 4, 2013 at 10:51 PM
Pay attention to placements too - for example, if a word is the only word in a particular sub-definition, or the first word in it, that probably means it's important / has more to do with the original Chinese meaning. It's also worth making a list of super-common words like "the" and filtering those out when determining these placements (so a definition "the sky" would come up as a high-priority match for "sky").
Word frequency also helps, though - make sure it's words and not just characters, as there are plenty of uncommon words that contain very common characters.
Also, another open-source resource you can try is this one from the Linguistic Data Consortium at UPenn: http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm
We offer it in Pleco as an add-on download - no Pinyin, but I believe it was designed around English-to-Chinese so it might give you better results for that reason.
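A rough sketch of the placement idea, under some assumptions: the stop-word list, the 3/2/1 weights, and the names `gloss_score` and `rank` are all invented here for illustration, and the entries are a hypothetical hand-made sample rather than real dictionary data.

```python
import re

# Assumed stop-word list; a real one would be longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "in"}


def gloss_score(query, gloss):
    """Score one sub-definition: 3 if the query is its only content word,
    2 if it is the first content word, 1 if it appears elsewhere, 0 if absent."""
    words = [w for w in re.findall(r"[a-z']+", gloss.lower()) if w not in STOP_WORDS]
    if query not in words:
        return 0
    if words == [query]:
        return 3
    if words[0] == query:
        return 2
    return 1


def rank(query, entries):
    """entries is a list of (headword, [glosses]); return matching headwords, best first."""
    query = query.lower()
    scored = []
    for headword, glosses in entries:
        best = max((gloss_score(query, g) for g in glosses), default=0)
        if best > 0:
            scored.append((best, headword))
    # The sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: -pair[0])
    return [headword for _, headword in scored]


entries = [
    ("蓝天", ["blue sky"]),
    ("天空", ["the sky"]),
    ("天", ["day", "sky", "heaven"]),
    ("摩天大楼", ["skyscraper"]),
]
print(rank("sky", entries))
```

With this sample, "the sky" scores as a sole-word match once "the" is filtered out, so 天空 and 天 come ahead of 蓝天, and "skyscraper" does not match at all because whole words are compared. Ties could then be broken with a word-frequency list, as suggested above.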
Kobo-Daishi, posted July 4, 2013 at 11:34 PM
I wouldn't try to use CC-CEDICT as an English-to-Chinese dictionary. I use a newer edition of CC-CEDICT in the old, un-updated CQuickTrans freeware program, and when I enter a search for "Taiwan" I get 200 entries in simplified and traditional, but only two (four?) of the entries are the ones I want: the simplified and traditional ways of writing "wan" and the two ways of writing "tai". And this even when I set it to filter "match entire", "match beginning", "match ending", and "match anywhere".
All the other entries are like "Peitou (town in Taiwan)", "Nantun (area in Taiwan)", "Tainan (city in Taiwan)", "New Taiwan dollar", "night markets in Taiwan", "Taiwan Strait", "Taiwan Affairs Office", "Tapu (village in Taiwan)", "Lee Teng-hui (Taiwan leader)", "bicycle (Taiwan)", "Miniatures Museum of Taiwan", "videotape (Taiwan usage)", etc.
This is but one example. What of other search terms? It would be too confusing for the learner. That's why I don't use CQuickTrans much for its dictionary. It used to be my go-to dictionary, but as CC-CEDICT has expanded, it's no longer good for looking up English words in Chinese.
I mainly use GoldenDict with the dictionaries in the StarDict format. They've got several English-Chinese dictionaries. But there might be legality issues with some of those dictionaries.
Kobo.
tooironic, posted July 5, 2013 at 08:16 AM
Wiktionary has decent English-into-Chinese coverage, and it's available for free use. Not perfect, of course, but you'll notice that translations are always given for specific senses, rather than just for the words themselves. This is incredibly important in compiling any English-to-foreign-language dictionary. I imagine adapting CEDICT's CE dictionary to EC would be a huge project, as a human editor would have to go through almost every translation to make sure the modern and frequently used word is given rather than its myriad synonyms.
mirgcire (Author), posted July 8, 2013 at 09:10 PM
"Also, another open-source resource you can try is this one from the Linguistic Data Consortium at UPenn:"
@MikeLove, this is a great resource. Using this I don't need to worry about sorting for relevancy. I actually integrated it and it works great. If I decide to market it I will consult LDC about their usage restrictions.
"I mainly use GoldenDict with the dictionaries in the StarDict format."
@Kobo, I agree with you, grep-ing CC-CEDICT is not very helpful. However, exploring GoldenDict and StarDict was not productive either. I looked in many different places for an English-to-Chinese dictionary, but all I found was more links. I probably just gave up too soon.
"Wiktionary has a decent English into Chinese coverage"
@tooironic, Wiktionary sounds promising, but I couldn't find what I wanted. The explanation of the English section reads as follows: "aims to describe all words of all languages using definitions and descriptions in English." This is clearly not what I want. So I checked out the Chinese section ... which was all in Chinese and a bit over my head. If you have any tips for navigating this site, I would love to hear them.
Thanks all!
tooironic, posted July 8, 2013 at 10:43 PM
Wiktionary includes many translations of English words, terms and phrases into Chinese. Simply type in a term, scroll down to "Translations" and you'll see a number of translations into languages other than English.
Kobo-Daishi, posted July 18, 2013 at 12:31 AM
I tested out the Chinese portion of Wiktionary when this thread first came out. I tested using only 4 words. I didn't try really easy words such as dog, cat, boy, girl, etc., because I figured if they didn't include such basic definitions then it didn't deserve to be called a dictionary.
I used the words milquetoast (fairly old, but still common enough in old movies), buckshee (saw it on a British TV episode of Agatha Christie; it was new to me and I consider myself fairly well read), cock (for the slang definition meaning male member, to check the profane quotient of the dictionary), and buckskin (just because I saw it on another page and couldn't think of more words to look up).
As you can see, they didn't have definitions for milquetoast and buckshee. The StarDict/GoldenDict combination that I've installed did. They do have the male member def though. And I hadn't known that buckskin also referred to the wearers of said garments.
A few days later I re-checked the dictionary and lo and behold...the definitions had been added. So it seems that they keep track of viewer searches and update the entries. I wonder if they've got a team of people who sign up to work on whichever language and do the reviewing. Anyway, it is being updated. But if I'd downloaded a version before my search for milquetoast and buckshee, they wouldn't have been included.
Kobo.
tooironic, posted July 18, 2013 at 09:17 AM
Err, I was referring to the English version of Wiktionary, not the Chinese one. The Chinese one is full of errors; I wouldn't recommend it at all. But the English edition is very comprehensive as an English-English dictionary. And its coverage of Chinese, although not perfect, is pretty good actually. You may wish to try looking up more common and practical words in the English edition and see what you can find.