KiraKira Posted March 2, 2007 at 07:15 PM I've heard so many different claims on various blogs and in forum searches about the number of hanzi and their usage percentages that I'm not sure which is correct. For example, claims like: "100 characters cover about 50% of the used language, and 500 characters cover about 90%", or some variation of that statistic. Does anyone have a source with a solid argument or analysis to back up the claim? I know there is some correlation (and it would be nice if it were close to the one posted), but I just want to be sure and see what everyone thinks the closest correct claim is.
hanyu_xuesheng Posted March 2, 2007 at 08:50 PM See http://technology.chtsai.org/charfreq/93charfreq.html Knowing this many characters, you understand this share of a text: 100 characters, 45%; 300 characters, 67%; 500 characters, 77%; 1000 characters, 89%; 3000 characters, 99%.
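Cumulative figures like the ones above can be recomputed for any text of your own. What follows is only a minimal sketch, not the method behind the linked list: the function names `is_hanzi` and `coverage_at` are made up for illustration, and it assumes that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as hanzi.

```python
from collections import Counter


def is_hanzi(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block counts as hanzi.
    return "\u4e00" <= ch <= "\u9fff"


def coverage_at(text: str, cutoffs: list[int]) -> dict[int, float]:
    """Share of all hanzi occurrences covered by the N most frequent characters."""
    counts = Counter(ch for ch in text if is_hanzi(ch))
    total = sum(counts.values())
    if total == 0:
        return {n: 0.0 for n in cutoffs}
    # Occurrence counts sorted from most to least frequent character.
    ranked = [count for _, count in counts.most_common()]
    return {n: sum(ranked[:n]) / total for n in cutoffs}


# Tiny made-up sample; a real check needs a much longer text.
sample = "我的书是我的，你的书是你的。"
print(coverage_at(sample, [1, 2, 3, 5]))
```

On a short sample the percentages rise much faster than on a large corpus, so the numbers are only meaningful for texts of some length.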
flameproof Posted March 2, 2007 at 11:37 PM You can check it yourself: http://goulnik.com/chinese/gb/ "I just want to ... see what everyone thinks the closest correct claim is." There is none, because it all depends on the text you use. Some texts need more characters, some fewer. So there is no single definitive list.
imron Posted March 2, 2007 at 11:39 PM Just remember though that this is for characters only, so for non-native speakers who know 3000+ characters, there will still be many words that are unfamiliar. Most usage statistics like this are really only valid for native speakers and not for language learners.
gato Posted March 3, 2007 at 04:25 AM It has some similarity to the claim that just 26 letters covers 100% of English texts.
flameproof Posted March 3, 2007 at 05:50 AM "Just remember though that this is for characters only, so for non-native speakers who know 3000+ characters, there will still be many words that are unfamiliar." This is very true. However, to know a (multi-character) word you need to know the individual characters first, so knowing all the characters is certainly a good start. The next step is to know all the words. But that's not the end of it: even if you know all the words, you will still often have difficulty figuring out the meaning. "It has some similarity to the claim that just 26 letters covers 100% of English texts." I think it makes perfect sense to learn characters according to frequency. They are unavoidable anyway, and it's very motivating since you can very quickly recognize lots of passages. So frequency analysis is a very useful toy for spare-time fun, no more, no less. It's especially useful for analyzing online text that you plan to read, to figure out whether it's suited to your level. With 500 characters you can "see" 90% of the text, but the other 90% are in the remaining 10%...
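Checking whether a text suits your level, as described above, could be sketched roughly like this. The function name `known_coverage` and the idea of passing your known characters as a Python set are assumptions for illustration, and only the basic CJK block (U+4E00 to U+9FFF) is treated as hanzi. It measures character recognition only, not word or sentence comprehension.

```python
def known_coverage(text: str, known: set[str]) -> tuple[float, set[str]]:
    """Return the share of hanzi occurrences you already know, plus the unknown characters."""
    # Assumption: only the basic CJK Unified Ideographs block counts as hanzi.
    hanzi = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    if not hanzi:
        return 0.0, set()
    unknown = {ch for ch in hanzi if ch not in known}
    covered = sum(1 for ch in hanzi if ch in known)
    return covered / len(hanzi), unknown


# Made-up example: the reader knows only two characters.
share, missing = known_coverage("我是学生。", set("我是"))
print(f"{share:.0%} of the characters are known; still unknown: {sorted(missing)}")
```

The returned set of unknown characters doubles as a short study list for that particular text.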
imron Posted March 3, 2007 at 12:12 PM I don't think there's anything wrong with learning characters by frequency, but just don't be under the impression that once you get to 3000 you'll be able to read and understand 99% of all texts.
atitarev Posted March 3, 2007 at 12:49 PM I am in a conversational class now, and what I've noticed is that many common words used in speech (not so much in the written language) may not have the same frequency. In other words, all frequency ratings are based on newspapers and formal texts, not on what you hear most often in the street (I am talking about standard Mandarin vocabulary). Just my 2 cents, thought it was worth mentioning. Lists of characters by frequency are only useful for some reviewing, not for actual study, anyway. Just keep reading texts. Individual frequency lists may differ widely. I haven't seen a detailed analysis of Chinese frequency lists, but I've seen descriptions of how Japanese ones were made: which newspapers were used, over which period, etc. If you do a simple search by character in Google, it gives you a number of hits.
flameproof Posted March 3, 2007 at 01:08 PM "Individual frequency lists may differ largely." That is very true. But #1 is mostly 的. The top ten usually includes 一, 了, 们, 在, 是... I don't really use it for learning, more for Chinese-related fun. But I also use it to check how many different characters there are before I read something longish.
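Counting the different characters in a text and listing its most frequent ones, as mentioned above, takes only a few lines. This is a rough sketch: `summarize_characters` is a hypothetical helper name, and again only U+4E00 to U+9FFF is treated as hanzi.

```python
from collections import Counter


def summarize_characters(text: str, top: int = 10) -> tuple[int, list[tuple[str, int]]]:
    """Return the number of distinct hanzi and the `top` most frequent ones with counts."""
    # Assumption: only the basic CJK Unified Ideographs block counts as hanzi.
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    return len(counts), counts.most_common(top)


# Made-up sample text; paste in whatever you plan to read instead.
distinct, top_chars = summarize_characters("我的书在他的桌子上，你的书在我的包里。")
print(distinct, top_chars)
```

Running this on different kinds of text (news, novels, chat logs) is a quick way to see how much the top of the list shifts from one source to another.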
KiraKira (Author) Posted March 3, 2007 at 07:55 PM Thanks. I don't plan on using it as an official benchmark, but it's nice for perspective.