mikelove, posted September 29, 2015 at 03:38 PM

We just launched a beta of one of our in-house Cantonese projects: an open-source Cantonese-English dictionary we're calling CC-Canto. It has about 22,000 entries and is available right now as a free add-on in the "Add-ons" screen of our iOS and Android apps. We also whipped up a basic little website for it, with a search feature and links to download the source data files: cantonese.org

It's not quite finished yet (hence the 'beta' tag), but after 6 months we now feel it's in a state where other people might start to find it useful.

It's designed to be used in conjunction with CC-CEDICT. Basically, we added Cantonese readings to CC-CEDICT and then only created entries for Cantonese words that a) weren't in CC-CEDICT or b) had significantly different definitions from those in CC-CEDICT. Since Cantonese and Mandarin share much / most of their vocabulary, there's no sense in duplicating effort by writing two separate entries for 習近平 when we can simply point to the existing one in CC-CEDICT. (The CC-CEDICT Cantonese readings are also open-source, available on the same website, and already incorporated into CC-CEDICT in the latest versions of Pleco.)

We used paid editors for the initial database and still have them rechecking entries, but we're hoping the community might help with feedback (corrections, missing words, etc.), as you already do so generously with our dictionary apps. Of course, we'd also be delighted to get some volunteer editors, but we recognize that may take a while. Aside from correcting and adding entries, the biggest projects are doing a better job of tagging whether or not specific CC-CEDICT words are used in Cantonese, and imposing a more consistent / clear English editorial voice (entries were mostly written by native Cantonese speakers).
We're also still working on finishing some single-character entries, which are a real bear since Cantonese has a lot of 多音字 (characters with multiple readings) that don't line up with Mandarin 多音字.

This is not an official CC-CEDICT project and had no involvement from the CC-CEDICT team, but they've indicated they have no problem with our doing it, and we hope we might eventually manage some website integration with them for syncing updates to entries and such. It is distributed under the same CC-BY-SA 3.0 license as CC-CEDICT, and we've also stayed pretty close to their format in our data files, so it should be easy to use in your own projects if you're interested in doing so.

As I said, the website is basic, but it was actually kind of fun to write (I did it myself in PHP in about a day), and we might well expand on it in the future :-)
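For anyone planning to use the data files in their own projects, here is a minimal Python sketch of reading one CC-CEDICT-style line. It assumes the standard CC-CEDICT layout (`Traditional Simplified [pinyin] /definition/definition/`) plus an optional `{jyutping}` field after the Pinyin carrying the Cantonese reading; that brace field is an assumption, so check the downloaded files for the exact layout before relying on it. The names `parse_line` and `Entry` are hypothetical, and the sample line below is illustrative rather than copied from the data.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line shape: CC-CEDICT layout plus an OPTIONAL {jyutping} field (assumption):
#   Traditional Simplified [pinyin] {jyutping} /def 1/def 2/
LINE_RE = re.compile(
    r"^(?P<trad>\S+)\s+(?P<simp>\S+)\s+"
    r"\[(?P<pinyin>[^\]]*)\]\s*"
    r"(?:\{(?P<jyutping>[^}]*)\}\s*)?"
    r"/(?P<defs>.*)/\s*$"
)


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    jyutping: Optional[str]  # None when the line has no {...} field
    definitions: List[str]


def parse_line(line: str) -> Optional[Entry]:
    """Parse one dictionary line; return None for blanks, '#' comments, or malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Definitions are slash-separated; drop empty pieces.
    defs = [d for d in m.group("defs").split("/") if d]
    return Entry(m.group("trad"), m.group("simp"), m.group("pinyin"),
                 m.group("jyutping"), defs)
```

For example, `parse_line("一 一 [yi1] {jat1} /one/single/")` gives an `Entry` with `jyutping="jat1"` and `definitions=["one", "single"]`, while the same line without the braces gives `jyutping=None`, so plain CC-CEDICT lines and lines with Cantonese readings can go through the same function.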
EricJMa, posted September 29, 2015 at 11:43 PM

Wow! This is awesome! I study Mandarin full time, and Cantonese part time since I lived in Guangzhou for five years. Pleco just keeps getting better and better. Keep up the good work, and thank you so much for all your hard work.
querido, posted September 30, 2015 at 12:03 AM

I only study Cantonese now, and I'm very happy with Pleco's interest in this.
mikelove, posted September 30, 2015 at 04:02 AM

Thanks! We've been interested in it for quite a while, but the lack of good material is a challenge. We're hoping we can eventually get this dictionary to a place where it can enable lots of other interesting Cantonese apps / websites / etc. too (as CC-CEDICT has). Lots of other Cantonese stuff is in the works, including the Cantonese expansion of our (more thoroughly edited) PLC dictionary, which, in spite of starting long before we began working on CC-Canto, is still not quite finished :-)
Demonic_Duck, posted October 10, 2015 at 06:25 AM

Looks pretty sweet so far! It seems like your transcribers are following different transcription standards for the same meanings of the same characters, though:

呢鋪〔-铺〕 li1 pou1
呢次 nei1 ci3
呢便 ni1 bin6
呢陣時〔-阵时〕 nei4 zan6 si4
mikelove, posted October 31, 2015 at 12:35 PM

Thanks! Sorry I didn't see this post before now (had a baby a few weeks ago, so I've been somewhat out of commission lately :-). I believe nei1 and ni1 are used interchangeably, so with those at least we just need to pick a system and stick with it. li4 is out of left field, though, and nei4, IIRC, is only used in the very narrow sense of 'woolen cloth', so those are probably both wrong. It seems like they both came from the same editor (problematic in other areas as well), and as it happens we're already going through and rechecking most of her work, so they should hopefully be caught as part of that.
Wahed, posted June 4, 2016 at 03:07 PM

I already have Pleco, and I added the CC-Canto add-on after trying the site, but the add-on doesn't behave the same as the website. For instance, 'yatchaih' comes up on the website but not in the app.
mikelove, posted June 5, 2016 at 02:16 AM

Did you turn on Yale romanization support in Pleco? While both default to Jyutping, the website automatically falls back on Yale if it can't find a Jyutping (or Pinyin) match, while our app doesn't. You can change this under Settings / Languages / Cantonese / Phonetic System.
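The difference comes down to search order. As a rough sketch (toy in-memory indexes and hypothetical names, not Pleco's actual code), the website behaves like a lookup that tries Jyutping, then Pinyin, then Yale, while the app, unless Yale is selected in its settings, never reaches a Yale index. The 一齊 entry below is illustrative toy data.

```python
from typing import Dict, List

# Hypothetical index type: normalized romanization -> matching headwords.
Index = Dict[str, List[str]]


def normalize(query: str) -> str:
    # Lowercase and drop spaces; Yale tone diacritics are NOT handled in this sketch.
    return query.lower().replace(" ", "")


def lookup(query: str, indexes: List[Index]) -> List[str]:
    """Return the hits from the first index, in priority order, that matches the query."""
    key = normalize(query)
    for index in indexes:
        hits = index.get(key)
        if hits:
            return hits
    return []  # no romanization system matched


# Toy data with keys already normalized.
jyutping_index: Index = {"jat1cai4": ["一齊"]}
pinyin_index: Index = {}
yale_index: Index = {"yatchaih": ["一齊"]}

website_order = [jyutping_index, pinyin_index, yale_index]  # falls back on Yale
app_order = [jyutping_index, pinyin_index]                  # no Yale fallback
```

With these toy indexes, `lookup("yatchaih", website_order)` finds 一齊 through the Yale fallback, while `lookup("yatchaih", app_order)` returns an empty list, which mirrors the 'yatchaih' case above. Keeping the order as a plain list makes "turning on Yale" just a matter of appending the Yale index.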
carlo, posted June 5, 2016 at 02:26 AM

Hi Mike, have you seen this site? Might there be an opportunity to collaborate at some point? They have been adding lots of examples of idiomatic usage and HK slang, which could be a nice addition to what you already have.
mikelove, posted June 5, 2016 at 02:52 AM

I have seen it, but they don't appear to have adopted an open-source license like ours, so collaboration would be tricky.
Wahed, posted June 5, 2016 at 03:55 AM

Thank you, Mike, that worked. Another question, though: I see two or three other Cantonese dictionaries there. Are there any main selling points for buying them? I'm studying Cantonese for the summer here in HK and need an offline Cantonese dictionary. The Cantonese sheik dictionary and your dictionary seem to work well together, but from what I can see the sheik dictionary isn't available offline. I didn't know whether the other paid dictionaries would be similar to the sheik one, or even much better...?
mikelove, posted June 5, 2016 at 02:02 PM

Two of them are Cantonese-to-Mandarin, so they'd only be useful if your Mandarin is at a sufficiently high level. They're quite good as Cantonese-to-Mandarin dictionaries go, though, and include lots of example sentences. The third one, The Right Word in Cantonese, is probably the most useful, as it's English-to-Cantonese; it's small and short, but it gives you a good idiomatic translation for lots of common English words, and so goes beyond what you can get from a reverse search of a Cantonese-to-English dictionary like CC-Canto or CantoDict.

We're waiting on the completion of a commercial Cantonese-to-English dictionary that we hope to license when it's ready, but there's no release date on that yet. We also have a pretty big update to CC-Canto coming soon, which finally adds well-checked single-character entries, among other things.