Looking for Spoken Word Frequency List

January 29, 2008 at 07:14 AM

Oh wow, Chinese speech/voice recognition software didn't even come to mind.

What software do you have? Is it expensive?

February 9, 2008 at 09:36 AM

The standard for voice recognition is Dragon NaturallySpeaking. Yes, it is expensive.

Anyway, to answer your question, I don't recall ever seeing a frequency list of Chinese words that wasn't based on a written corpus - and usually a written news / formal corpus at that. I also note that all the sources you quote are from the 1970s or at best the 1980s - I'd bet 'comrade' and 'revolution' feature highly.

And that is the precise purpose behind the text message database. Text messages are hardly formal, and they are the only writing where you're likely to find people adding modal particles and writing on a strictly kou3yu3 basis.

The ultimate goal of the database is to provide a close approximation of a lexicon based on spoken word frequencies.

Follow it.
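For anyone curious how a frequency list could be pulled out of a pile of text messages, here is a minimal sketch. It is only an illustration, not how the database actually works: the function name `word_frequencies` and the sample messages are made up, and it assumes the messages have already been split into words with spaces, which real Chinese text would need a segmenter to do.

```python
from collections import Counter


def word_frequencies(messages):
    """Count how often each word appears across pre-segmented messages."""
    counts = Counter()
    for message in messages:
        # Assumption: words are already separated by spaces.
        # Raw Chinese text would need a word segmenter first.
        counts.update(message.split())
    return counts


# Hypothetical sample messages, segmented by hand for illustration only.
sample_messages = [
    "你 吃饭 了 吗",
    "我 吃 了",
    "好 啊",
    "你 在 哪儿 啊",
]

frequencies = word_frequencies(sample_messages)

# Print the three most frequent words with their counts.
for word, count in frequencies.most_common(3):
    print(word, count)
```

Sorting by count like this is what would let modal particles such as 啊 and 吗 surface near the top if they really are common in the messages.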

February 9, 2008 at 10:09 PM

That is a wonderful idea! How about adding in chat histories as well? I'd think the corpus would be built up exponentially faster.

February 10, 2008 at 03:55 AM

Chat histories are another project. Also, chat messages tend to be less coherent and not necessarily as "kouyu"-ish.

LaoZhang

self-taught-mba

LaoZhang

self-taught-mba
