valikor (posted October 6, 2010 at 08:57 AM): Hi folks, recently I've been studying some characters from the Zein character frequency list (found here: http://www.zein.se/patrick/3000char.html). I have been surprised to find that some of the characters that are supposedly quite common actually seem to be somewhat rare or not very useful. Has anyone else noticed this? Some examples: 赫 凯 茫 乾 翠 鹏 桑. According to the list, all of these are among the 2,000 most common characters. But in some cases, after consulting a dictionary and asking Chinese people "What are some useful/common words in which you might use this character?", all I got was an obscure word or a chengyu. I know some characters are mainly used in surnames... Cross-referencing with another frequency list, I found similar results (i.e., a character that was number 1,800 on one list might be number 2,200 on the other, but the rankings were fairly consistent). Any thoughts? Is the analyzed material skewing the results towards words not used in daily life? Or...? Thanks, David
tooironic (posted October 6, 2010 at 11:10 AM): I would say all of those are relatively common, except perhaps 翠, which I had to look up. Apparently it appears most frequently in 翠鸟 cuìniǎo "kingfisher". The rest of these characters are either common in Chinese names or actually make common words by themselves, e.g. 茫茫 mángmáng "boundless and indistinct", 乾[干] gān "dry", etc. But then again, definitions of what counts as "common" differ.
renzhe (posted October 6, 2010 at 11:24 AM): 翠 is also used in names. Off the top of my head, one of the maids in Ba Jin's "Family" had it in her name, as did the female lead in 潜伏, a popular recent TV series. 乾 does not mean "dry" in simplified texts (as tooironic writes, in that sense it is simplified to 干), but it shows up in names as qian2. One of the most famous Chinese emperors, Qianlong (乾隆), used this character, which alone is enough to make it very common in written text (Qianlong is one of the few Qing emperors held in high esteem by the Chinese today; the other is Kangxi). In fact, most of the characters you listed are common in names. This is exactly the problem with frequency lists: you get things out of context. You will also run into archaic characters that are used only in chengyu and can't stand alone (though the chengyu themselves are very common), characters used only to transcribe unusual sounds, and so on. But I still find frequency lists useful. Just don't assume that something is really important just because it shows up in such a list. If a character seems odd and rare, you probably don't need to learn it yet. Actually, frequency lists are most useful as a tool for checking, from time to time, how far you've come and whether you are missing any important characters, not as a basis for your studies; the HSK vocabulary lists are much better for that.
aristotle1990 (posted October 6, 2010 at 01:14 PM): 翠 also appears in common expressions like 翠色欲流 and 翡翠鲜笋煲. And don't forget 翠绿, which is an HSK word.
gato (posted October 6, 2010 at 04:07 PM): 翡翠 (a type of jade) is a very common word.
gegehuhu (posted October 6, 2010 at 05:23 PM): 翠湖, or Green Lake, is a famous lake and park in the middle of Kunming.
jbradfor (posted October 6, 2010 at 05:29 PM): It's not quite what you're asking, but I currently feel that learning from character frequency lists is less than optimal after, say, 500-700 characters. First, for all the reasons mentioned above. Second, being able to read Chinese requires knowing words, not characters. While there is obviously some relationship between the two, they are not the same, and for actual studying I would recommend focusing on words, not characters.
万里长城 (posted October 7, 2010 at 01:01 AM): 桑 also appears in common words like 桑树 and 桑葚. And don't forget 桑麻, which is an HSK word.
c_redman (posted October 7, 2010 at 03:15 PM): There is a linguistic measure called dispersion, which is a scale from 0 to 1 of how evenly a word is spread across a corpus. It sounds like you are looking for a list that adjusts the frequencies for dispersion, or at least flags these overrepresented characters. Depending on your learning goal, a character that shows up once in each of 20 different texts may be more useful to know than one that shows up 20 times in one text and nowhere else. 翠绿 (already mentioned above) is a word used frequently in 哈利波特与魔法石 (Harry Potter and the Philosopher's Stone). It seems nothing is ever just plain green in that book.
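c_redman doesn't say which dispersion measure he has in mind. One widely used measure on a 0-to-1 scale is Juilland's D, computed from an item's counts in n equal-sized corpus parts as D = 1 - V / sqrt(n - 1), where V is the coefficient of variation (standard deviation divided by mean) of those counts. The Python sketch below is a minimal illustration under that assumption; the function name `juilland_d` and the toy counts are hypothetical, chosen to mirror the "once in each of 20 texts" versus "20 times in one text" example.

```python
import math


def juilland_d(counts):
    """Juilland's D for one item, given its counts in n equal-sized corpus parts.

    Returns a value in [0, 1]: 1.0 means perfectly even spread,
    0.0 means every occurrence falls in a single part.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    total = sum(counts)
    if total == 0:
        raise ValueError("item never occurs in the corpus")
    mean = total / n
    # Population standard deviation of the per-part counts.
    sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / n)
    cv = sd / mean  # coefficient of variation
    return 1 - cv / math.sqrt(n - 1)


spread_out = [1] * 20       # once in each of 20 texts
clumped = [20] + [0] * 19   # 20 times in one text, nowhere else

print(juilland_d(spread_out))  # 1.0
print(juilland_d(clumped))     # 0.0
```

Both characters have the same raw frequency (20), yet their D values sit at opposite ends of the scale. Juilland's D assumes the parts are equal in size; for corpora made of texts of very different lengths, a measure such as Gries' DP is often recommended instead.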
valikor (original poster; posted October 13, 2010 at 02:16 PM): Thanks, everyone, for your helpful replies. To add one more note: within a week of learning all of these characters (which I initially thought of as obscure and not useful, since I couldn't find many good words in my dictionary that include them), I have come across all of them except 桑. I watched a movie where 凯文 was "Kevin", saw 赫 at least once (I forget where), saw 鹏 in the name of some kind of soap in a TV commercial, saw 茫茫 used, and saw 乾坤 used in some bad Chinese TV show. As some of you said, maybe they're not all that uncommon after all!
c_redman (posted October 16, 2010 at 03:23 PM): I find that happens a lot in language. It's possibly a kind of Baader-Meinhof phenomenon, or an inflated sense of the importance of coincidences. But I think it's more likely perceptual vigilance or selective attention. I can ignore an unknown word for years until I finally have the spare brain cells to learn it. Then, once I learn it, I am suddenly aware of seeing it in other places, especially since it was recently learned and thus fresh in my memory (the recency effect). I didn't learn the word 生活 for two and a half years. How could I not have noticed it? Obviously, once I learned it, I saw it everywhere.
LongShiKong (posted March 23, 2012 at 03:04 AM): I'm not the first to point out that character frequency lists are unreliable. I've come across 3 or 4 that omit many commonly used characters. If the sources such lists are derived from are not carefully chosen to reflect variation, they won't be representative. An example is Jun Da's 9,933-character list. Dozens of the characters I've studied are not included in his rather exhaustive list.
WestTexas (posted March 23, 2012 at 04:45 AM): I feel like I never see 茫 other than in my Anki deck. I do wonder if it gets counted twice in a frequency list when it shows up in the word 茫茫. Perhaps that's why it's rated as more common than, in my experience, it really is? Also, 赫 is used for Hertz (赫兹); that might be one of those words that shows up many times in a small number of texts, giving an inflated impression of its importance.
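WestTexas's guess is easy to check for a simple character count. Assuming a list is built by counting every occurrence of each character (character tokens), 茫茫 does contribute two hits for 茫; whether the Zein list or Jun Da's list actually counts this way isn't stated in the thread. The sketch below, with hypothetical function names and a made-up three-sentence corpus, contrasts that raw count with the number of texts containing the character at least once, which also dampens the "many hits in a few texts" effect described for 赫.

```python
from collections import Counter


def char_counts(texts):
    """Raw token counts: every occurrence of a character is counted."""
    counts = Counter()
    for text in texts:
        counts.update(text)  # iterating a str yields its characters
    return counts


def text_counts(texts):
    """Number of texts in which each character appears at least once."""
    counts = Counter()
    for text in texts:
        counts.update(set(text))  # set() drops repeats within one text
    return counts


# Made-up toy corpus for illustration only.
texts = ["白茫茫一片", "前途茫茫", "他很迷茫"]

print(char_counts(texts)["茫"])  # 5
print(text_counts(texts)["茫"])  # 3
```

Under raw token counting, each reduplicated 茫茫 adds two to the total, so a character that mostly appears doubled gets a boost that a "texts containing it" count does not give it.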
heifeng (posted March 23, 2012 at 06:35 AM): I'm guessing you guys don't often feel 迷茫 or 茫然不解 if you don't think 茫 is very commonly used. Hehe, j/k.
imron (posted March 23, 2012 at 07:01 AM): Both of these (茫 and 赫) show up quite regularly in novels, 赫 not in the Hertz sense but in words like 赫然. Replying to LongShiKong's "Dozens of the characters I've studied are not included in his rather exhaustive list": dozens out of almost 10,000 is not a bad ratio. It's not meant to be an exhaustive list, though, and it will only be as accurate as the corpus it was based on.
Raphanid (posted August 5, 2012 at 09:32 AM): 翠华 (Tsui Wah) is a popular restaurant chain in Hong Kong. 桑树 (mulberry trees) are important in silk production, which has cultural resonance in China. 凯里 (Kaili) is a fairly large city in 贵州 (Guizhou).
li3wei1 (posted August 5, 2012 at 03:10 PM): The question "how accurate is ..." only makes sense if there is something to compare it to that you know is accurate. As the total corpus of written (let alone spoken) Chinese is immeasurably vast and expanding all the time, all we can do is look at slivers of it taken at certain times. The bigger the sliver, the more up-to-date it is, and the more closely the selection of text matches what you are likely to encounter, the more accurate the resulting list will seem. No two lists will be the same, and the only way you can criticise the accuracy of a list is to compile your own from a bigger, more recent, and more wide-ranging corpus. Then someone else will come along and claim that yours is not accurate.