JyutDict: Open Source Cantonese Dictionary in CEDict format


gong mo tai

Hi All,

I am very pleased to let you all know that there is now a free open-source Cantonese dictionary in CEDict format. It is called JyutDict and has just today been published by www.ZhongWenLearner.com. The dictionary can be searched online on this website by browsing to the Word Dictionary and selecting "Jyutping" as the phonetic system. The site's Input Method Editor is also powered by JyutDict, making it the best online IME for jyutping there is, period.

However, the best part about JyutDict is that you can download it and use it in almost every (if not every) program that can use a regular CEDict file.
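
For anyone wondering what "CEDict format" means in practice, here is a minimal sketch of reading one entry in Python. It assumes the usual CC-CEDICT line layout (Traditional, Simplified, a bracketed reading, then slash-separated glosses) and assumes JyutDict puts jyutping inside the brackets where pinyin normally goes. The sample line and the name `parse_cedict_line` are made up for illustration and are not taken from the JyutDict file itself.

```python
import re

# Assumed CC-CEDICT line layout: Traditional Simplified [reading] /gloss/gloss/
CEDICT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_cedict_line(line):
    """Return a dict for one entry, or None for comments, blanks, and bad lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # CC-CEDICT header and comment lines start with '#'
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    traditional, simplified, reading, glosses = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "reading": reading.split(),      # one syllable per list item
        "glosses": glosses.split("/"),   # one English gloss per list item
    }


# Hypothetical sample entry, written by hand for this example.
sample = "你好 你好 [nei5 hou2] /hello/hi/"
entry = parse_cedict_line(sample)
print(entry["traditional"], entry["reading"], entry["glosses"])
```

Because the layout is the same, a program that already reads CEDict files this way should not need changes beyond treating the bracketed field as jyutping rather than pinyin.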

Enjoy!


Hello Daan,

Thanks for the comment and very good point about the license. I somehow missed putting the license info into the dictionary file. The data (apart from the Jyutping) came from CC-CEDict and should therefore be released under the same license (Creative Commons Attribution-Share Alike 3.0). JyutDict will be re-published as soon as possible with this license info at the beginning of the file.

Regards


The term "Mandarin dictionary with Cantonese pronunciation" is not accurate. CEDict lists both the Simplified and Traditional versions of a word. The only thing that makes it "Mandarin" is the pinyin phonetics. JyutDict takes the CEDict dictionary and uses the Traditional values to produce jyutping Cantonese phonetics.

This project was put together over several months by myself and the guys from the eGuideDog/Ekho project (http://www.eguidedog.net/ekho.php), namely Cameron Wong and Silas Brown. They've done a very good job with this, and we are now working on a way of letting the community actually add and edit entries, much like CEDict. Because the dictionary was put together with code, we could convert all entries from CEDict (over 100,000!), but we know this also means there may be some errors. However, for the most part we are very pleased with it, and any errors found can later be fixed with the aforementioned editing tool, which will be in place soon enough. Hope you enjoy. :-)
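
To give a rough idea of what "uses the Traditional values to produce jyutping" could look like, here is a naive Python sketch that looks up each Traditional character in a table. This is only an assumption about the general approach, not the actual code used for JyutDict; the table `CHAR_TO_JYUTPING` and the function `to_jyutping` are hypothetical, and the table holds just a handful of hand-picked readings.

```python
# Tiny hypothetical character-to-jyutping table; a real one would need
# to cover thousands of Traditional characters.
CHAR_TO_JYUTPING = {
    "你": "nei5",
    "好": "hou2",
    "食": "sik6",
    "行": "haang4",  # this character has other readings too, e.g. hang4 and hong4
}


def to_jyutping(traditional_word, table=CHAR_TO_JYUTPING):
    """Return space-separated jyutping, or None if any character is not in the table."""
    syllables = []
    for char in traditional_word:
        reading = table.get(char)
        if reading is None:
            return None  # leave the entry for a human editor
        syllables.append(reading)
    return " ".join(syllables)


print(to_jyutping("你好"))
print(to_jyutping("你吃"))  # 吃 is not in the toy table, so this is None
```

A one-reading-per-character lookup like this picks the wrong reading whenever a character is pronounced differently in different words, which is the kind of error that community editing would be well placed to catch.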


I'm not really sure what you are implying by bringing up the traditional characters. Both Mandarin and Cantonese can be written in both simplified and traditional.

Are you aware that Cantonese has a huge bulk of its own vocabulary? Even starting with the basic vocabulary, e.g. the verb 'to eat' is 食, not 吃, 'to walk' is 行, not 走, just like in Classical Chinese.


OP, I appreciate your effort in developing dictionaries with Cantonese pronunciations. But your statement that "The only thing that makes it "Mandarin" is the pinyin phonetics" is not accurate. I actually agree with the two points raised by Iriya: 1) many Cantonese speakers in the Mainland use the simplified script because the script does not matter; and 2) there are more differences between Cantonese and Mandarin than just pronunciations.

Your works require downloading so I haven't seen them. As a Cantonese speaker I find that it is not easy to find reliable resources about Cantonese pronunciations. Personally I use these, which I consider to be reliable -

1) 朗文中文高級新辭典 (a good Chinese dictionary with Cantonese pronunciations);

2) 粵語審音配詞字庫 hosted by the Chinese University of Hong Kong -> http://humanum.arts....Lexis/lexi-can/ (there is also an Android app that makes use of the data of this database. I find that app quite handy.); and

3) 中英對照香港學校中文學習基礎字詞 of the Education Bureau of Hong Kong -> http://www.edbchinese.hk/lexlist_en/ .

PS - Could admin please check if the Link function still works? It does not seem so.


"Both Mandarin and Cantonese can be written in both simplified and traditional" - Iriya

- Exactly true, yes. I was simply informing you about how we generated the jyutping (by the Traditional chars). I suppose that is irrelevant though.

"Are you aware that Cantonese has a huge bulk of its own vocabulary" - Iriya

"there are more differences between Cantonese and Mandarin than just pronunciations." - Skylee

- I most certainly am aware of this. However, would you not agree that the current dictionary is a great start? Surely that can be seen, and we can ignore the little issues at first release... :-)

We are working on an editing system right now so that the learning community can add and edit entries, meaning that over time the dictionary will evolve on its own, apart from CEDict altogether... so that it will be truly 100% Cantonese.


  • 10 months later...

Hi All,

This post is long overdue... the JyutDict editing system is working (and has been for quite some time, actually) and we have an editor for it: Cameron Wong of eGuideDog/Ekho. In fact, the eGuideDog project's data is now generated from JyutDict. Much like the CEDict project, anyone can sign up and submit new entries or corrections to existing entries. So, for this to work, we need people who are interested in an open-source Cantonese dictionary to actually get involved and start adding new entries and editing existing ones. So come on guys! :wink:

Currently, we have over 113,000 entries and Zhongwen Learner provides a jyutping IME based on it. The annotation tool also uses it to produce jyutping for the given Chinese character input.

With your help, we can make this better... for all learners of Cantonese!

Regards

www.zhongwenlearner.com
