New database with frequency of words


mandarinboy

If anyone is interested, I am now parsing around 1,000,000 words a day from Chinese newspapers of different sorts to build a database of the most frequently used words in modern Chinese newspapers. I will put it on my site http://www.mandarinteacher.com as soon as I am back from India.
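For readers wondering how this kind of counting can work, here is a minimal sketch of one common approach: dictionary-based forward maximum matching, which segments unspaced Chinese text by greedily taking the longest word-list entry at each position, followed by a frequency count. The post does not say which segmentation method is actually used; the function names, the `max_len=4` default and the choice to count only tokens found in the word list (which skips punctuation) are all assumptions for illustration.

```python
from collections import Counter


def segment_fmm(text, words, max_len=4):
    """Forward maximum matching: at each position take the longest word
    from `words` that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single character
        # Try the longest candidates first, down to two characters.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in words:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


def count_words(articles, words, max_len=4):
    """Count word-list entries across many articles (assumption: tokens
    not in the word list, e.g. punctuation, are not counted)."""
    counts = Counter()
    for article in articles:
        tokens = segment_fmm(article, words, max_len)
        counts.update(t for t in tokens if t in words)
    return counts


if __name__ == "__main__":
    word_list = {"人民", "日报", "人民日报", "新闻", "中国"}
    print(segment_fmm("人民日报新闻", word_list))
    print(count_words(["人民日报新闻。", "中国新闻"], word_list).most_common())
```

Greedy matching is simple and fast enough for a million words a day, but it can mis-split ambiguous strings; statistical segmenters handle those cases better at the cost of more complexity.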

The intention of the database is to help students learn the words commonly used in today's newspapers. Later on I will also calculate the word density of texts, so that students can find texts that match the level of the words they already know.
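One way to read "word density" is as coverage: the share of word tokens in a text that a given student already knows. The sketch below assumes each text has already been segmented into a list of words and that the student's vocabulary is a set of words; the function names and the 0.95 default threshold are hypothetical choices, not the author's stated method.

```python
def coverage(tokens, known):
    """Fraction of word tokens in a text that appear in `known`."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def texts_for_level(texts, known, target=0.95):
    """Return (title, coverage) pairs for texts whose coverage is at
    least `target` (assumed threshold), best-covered first."""
    scored = [(title, coverage(tokens, known)) for title, tokens in texts.items()]
    suitable = [pair for pair in scored if pair[1] >= target]
    return sorted(suitable, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    texts = {
        "article A": ["我", "喜欢", "看", "新闻"],
        "article B": ["我", "喜欢", "新闻"],
    }
    student = {"我", "喜欢", "新闻"}
    print(texts_for_level(texts, student, target=0.7))
```

Counting tokens rather than distinct words means a frequently repeated unknown word lowers the score more, which roughly matches how often a reader would stumble over it.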

What I have found is that the speech patterns vary widely between, for example, People's Daily and 163.com. Perhaps not so surprising, but interesting.

Since the data I am using come from Cedict and other sources, the database will be free of charge but licensed under the same conditions as the Cedict data are today.

We are in fact using word lists without translations when parsing, so that we capture all words. What we use Cedict and Adso for is to later create lists for students with pinyin and translations. After all, the most frequently used words are in those lists, so it is a nice combination.
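The second step, attaching pinyin and translations to the frequency list, can be sketched as a join on the simplified form. This assumes dictionary lines in the CC-CEDICT layout `Traditional Simplified [pinyin] /gloss/gloss/`, with `#` marking comment lines; the function names and the `top_n` parameter are hypothetical, and Adso data would need its own parser.

```python
import re

# Assumed CC-CEDICT line layout: 中國 中国 [Zhong1 guo2] /China/Middle Kingdom/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict(lines):
    """Map each simplified headword to a list of (pinyin, glosses) pairs."""
    entries = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        m = CEDICT_LINE.match(line)
        if not m:
            continue  # skip malformed lines
        _trad, simp, pinyin, glosses = m.groups()
        entries.setdefault(simp, []).append((pinyin, glosses.split("/")))
    return entries


def study_list(counts, entries, top_n=100):
    """Build (word, frequency, pinyin, translation) rows for the `top_n`
    most frequent words; words missing from the dictionary get blanks."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    rows = []
    for word, freq in ranked:
        for pinyin, glosses in entries.get(word, [("", [])]):
            rows.append((word, freq, pinyin, "; ".join(glosses)))
    return rows


if __name__ == "__main__":
    sample = [
        "# CC-CEDICT sample",
        "中國 中国 [Zhong1 guo2] /China/Middle Kingdom/",
        "新聞 新闻 [xin1 wen2] /news/",
    ]
    for row in study_list({"新闻": 5, "中国": 3}, parse_cedict(sample)):
        print(row)
```

A word with several readings produces several rows, which keeps every pronunciation visible to the student instead of silently picking one.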
