Chinese-Forums

Open Source Chinese Dictionaries



  • 2 weeks later...
Posted

ChouDoufu,

Let me wade into this fray.... from my perspective there isn't really that much incentive for people to contribute to open source projects if all they want is a definition. Commercial dictionaries are available in pirated form on Stardict. Open source projects also tend to get out-PageRanked by sites which use their data and focus on presentation: I'm not sure how many people are aware that Dict.cn and Kingsoft both make use of LDC data without attribution (hint: search for "Flathead").

The CEDICT license somewhat encourages this. It was written in the age of paper dictionaries and packaged desktop software, so it precluded the obvious commercial uses of its time, but it wasn't written to deal with AdSense. This problem is compounded by the preference open source developers have for writing closed interfaces that solve display problems rather than working on backend data entry. If every CEDICT port provided an easy way for users to ping content back to Paul or Denis, the project would be growing much faster. If Chinese PeraKun or PlecoDict made it easy for people to add content to Adso, we would be growing faster. But it is unreasonable to expect this from developers who are already working free-of-charge.

At some point, traditional printed dictionaries will be surpassed by their open source counterparts. But someone needs to do the data-entry and there is a catch-22 that makes this very difficult. I've tried to find a nice balance by making Adso free for non-commercial use, but requiring permission for commercial use. I think this has been a relatively successful strategy: at a minimum we've been able to release data that we simply wouldn't have had access to otherwise. Tagging user entries is also useful -- that way there's no problem changing licenses in the future.

If you're looking to start ANOTHER open source project, I'd be pleased to help out as I can. The big problem from my perspective (with Adso) is that the grammar entries need to be distinct from the definitional data and machine-parsable to be useful for the tools (annotators and translators) that I think will increasingly be the source of contributions for open linguistic data compilations. The social value here is in working to build the tools to ensure an open (bilingual) semantic web, not really in creating a data list. I'd encourage you to think about either following the format we've got with Adso so that data is interchangeable, or drafting a licence you'd like the project to support, so that we can add support for it on the backend.

  • 4 weeks later...
Posted

Good news on the Open Source Chinese Dictionaries front:

CEDict has officially been released from copyright by its creator, Paul Denisowski. He contacted Jim Breen and me. A copy of the message is available at the Chinese Dictionaries Google Groups. That was the biggest obstacle preventing me from contributing. I hope the community can come together now and support this project.

  • 11 months later...
Posted (edited)

I had a look at the most recent CEDICT dictionary and was very impressed by the revisions and accuracy of translations for simplified/traditional Chinese. I was so impressed that I decided to use this as my primary dictionary for my main pop-up translator software (Kingsoft Powerword 2007). So after 4 days of formatting, I got it to work with Powerword and now I'm very very happy.

I can now get simplified/traditional translations with pinyin (tone marks) with the greatest of ease and accuracy. Since I combine it with my other small word lists/dictionaries, I decided to disable the default "Concise Chinese-English Dictionary" because it just isn't needed anymore. I suggest that anyone who wants a great Chinese-English database look into the most recent version of CEDICT.
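
For anyone attempting a similar conversion, here is a rough sketch of the first step, reading CEDICT entries into fields. It assumes the one-entry-per-line layout `Traditional Simplified [pin1 yin1] /definition 1/definition 2/` used by recent CEDICT releases. The names `Entry` and `parse_line` are made up for illustration and are not part of any CEDICT tool, and the sketch does not cover converting tone numbers to tone marks or the Powerword import format itself.

```python
import re
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One dictionary entry (hypothetical structure, for illustration only)."""
    traditional: str
    simplified: str
    pinyin: str          # tone numbers as they appear in the file, e.g. "Zhong1 guo2"
    definitions: List[str]


# Assumed layout: Traditional Simplified [pin1 yin1] /def 1/def 2/
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry, or None for blank lines, '#' comments, and lines
    that do not match the assumed layout."""
    if not line.strip() or line.startswith("#"):
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, defs = match.groups()
    # Definitions are separated by '/'; drop any empty pieces.
    return Entry(traditional, simplified, pinyin,
                 [d for d in defs.split("/") if d])


if __name__ == "__main__":
    print(parse_line("中國 中国 [Zhong1 guo2] /China/Middle Kingdom/"))
```

From there, each `Entry` can be written out in whatever layout the target program expects.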

Edited by ABCinChina
