LaoZhang (Author) Posted January 29, 2008 at 07:14 AM
Oh wow, Chinese speech/voice recognition software didn't even come to mind. What software do you have? Is it expensive?
self-taught-mba Posted February 9, 2008 at 09:36 AM
The standard for voice recognition is Dragon NaturallySpeaking. Yes, it is expensive. Anyway, to answer your question, I don't recall ever seeing a frequency list of Chinese words that wasn't based on a written corpus - and usually a written news / formal corpus at that. I also note that all sources you quote are from the 1970s or at best 1980s - bet you 'comrade' and 'revolution' feature highly. And that is the precise purpose behind the text message database. Text messages are hardly formal, and they are the only writing where you're likely to find people adding modal particles and writing on a strictly kou3yu3 (colloquial) basis. The ultimate goal of the database is to provide a close approximation of a lexicon based on spoken word frequencies. Follow it.
LaoZhang (Author) Posted February 9, 2008 at 10:09 PM
That is a wonderful idea! How about adding in chat histories as well? I'd think the corpus would build up much faster that way.
self-taught-mba Posted February 10, 2008 at 03:55 AM
Chat histories are another project. Also, chat messages tend to be less coherent and not necessarily as "kouyu"-ish.
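For anyone curious how a frequency list could be derived from a text message collection like the one discussed above, here is a minimal sketch. It is not the actual database's method. It assumes the messages have already been segmented into words separated by spaces (Chinese text has no spaces, so segmentation would be a separate step), and the function names and sample messages are made up for illustration.

```python
from collections import Counter


def word_frequencies(messages):
    """Count how often each word appears across pre-segmented messages.

    Assumption: each message is a string whose words are separated by
    whitespace; word segmentation is not handled here.
    """
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts


def frequency_list(messages, top_n=None):
    """Return (word, count) pairs, most frequent first.

    Ties are broken by the word itself so the ordering is deterministic.
    """
    counts = word_frequencies(messages)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical, hand-segmented sample messages (not real corpus data).
sample_messages = [
    "你 吃 了 吗",
    "吃 了 啊",
    "你 来 吗",
]

if __name__ == "__main__":
    for word, count in frequency_list(sample_messages):
        print(word, count)
```

Even in this tiny sample, modal particles such as 吗 and 啊 show up in the counts, which is the kind of spoken-style usage a news-based corpus would tend to miss.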