ilprincipe Posted September 4, 2009 at 10:17 AM
Dear fellow Chinese students, there is a lot of discussion about what the top 500, 1000, or 2000 most used characters are. However, it is important to note that simply knowing those top characters will not get us anywhere near being able to read newspapers or understand the language. The really useful thing is to know the meanings of compounds. I believe we all agree on this. I ran some simple statistics and I see that the top 500 characters combined with one another can yield 7,000 meanings, the top 1000 about 17,000, the top 1500 about 25,000 and the top 2200 about 35,000. Many of them are rarely used, of course. So the question for us is: instead of focusing only on the so-called most frequently used characters, how can we find out the most frequently used 'compounds', so that we can tailor our character learning to those compounds and actually be able to read newspapers etc., and not stumble on 明 and 白 without understanding that together they mean 'to understand' instead of 'clear' and 'white'? Any comment/feedback is most appreciated to make progress in this field. Regards to all.
chalimac Posted September 4, 2009 at 10:29 AM
Here you go: http://lingua.mtsu.edu/chinese-computing/statistics/bigram/form.php
Taken from: http://lingua.mtsu.edu/chinese-computing
Edited September 4, 2009 at 10:45 AM by chalimac
gato Posted September 4, 2009 at 10:40 AM
Try the HSK vocab list, which contains the words you are expected to know for the various levels of the HSK. There are 8000 words in the four levels combined, I think. http://hskflashcards.com/download.php
Some comments on the HSK vocab list: http://laowaichinese.net/hsk-vocabulary-levels-added-to-mdbg.htm
Just because a word has a lower rating doesn’t mean it’s more commonly used. Here are a few ways the HSK info isn’t useful (to those of us not preparing to take the test):
1. HSK rating has nothing to do with spoken/written or formal/informal frequencies. For example, in my experience, computer is spoken much more frequently as “diànnǎo” 电脑 but is formally referred to (like if your major is computers in college) as “jìsuànjī” 计算机. Both of these words appear on HSK list 3.
2. The difference between 1 and 2 is negligible. Vocabulary lists 1 and 2 are both covered by the Basic (lowest) test, so a word may appear on list 2 simply because they ran out of room on list 1. For example, “yǎnjing” 眼睛 gets an HSK rating of 1, but yǎn 眼 by itself is 2. Surely you’d know the single character before learning the two of them together.
The bottom line is: if a word has an HSK rating in the dictionary, it’s more likely to be a common word than one without a rating. Also, if I’ve got to choose between two synonyms (that really can be used interchangeably) I’m going to choose the one with the lower HSK number.
imron Posted September 4, 2009 at 01:19 PM
"Here you go: http://lingua.mtsu.edu/chinese-compu...igram/form.php"
Careful. Those are not words, they are bigrams. And as it says on that page: "Note: A bigram may be a nonsense combination of characters." The bigram data only tells you the frequency of two characters appearing next to each other, not whether or not they are actually words (e.g. it could be the last character of one word followed by the first character of a different word).
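To make the bigram caveat concrete, here is a minimal Python sketch. It is not how the linked site computes its data; the sentence and its hand-made word split are assumptions chosen for illustration. It counts every adjacent character pair and then lists the pairs that straddle a word boundary.

```python
from collections import Counter


def char_bigrams(text):
    """Return every pair of adjacent characters in text, in order."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Hypothetical toy data: a sentence already split into words by hand.
words = ["我", "明白", "你", "的", "意思"]
sentence = "".join(words)  # "我明白你的意思"

bigram_counts = Counter(char_bigrams(sentence))

# Two-character words that really occur in the hand segmentation.
real_words = {w for w in words if len(w) == 2}

# Bigrams that are counted but are not words: they span a word boundary.
cross_boundary = [b for b in bigram_counts if b not in real_words]

print(cross_boundary)
```

Only 明白 and 意思 are words here; the remaining bigrams are exactly the "last character of one word plus first character of the next" case described above.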
c_redman Posted September 4, 2009 at 08:32 PM
http://www.katica.org/cer28/projects/zhtoolkit/word-freqs/LCMC_3000_Word_Frequency.txt
This is from the Lancaster corpus, per their word segmentation.
student Posted September 4, 2009 at 10:03 PM
You might also want to look at "A Frequency Dictionary of Mandarin Chinese: Core Vocabulary for Learners", http://www.amazon.com/Frequency-Dictionary-Mandarin-Chinese-Vocabulary/dp/0415455863/ref=sr_1_1?ie=UTF8&s=books&qid=1252101710&sr=8-1
anonymoose Posted September 5, 2009 at 02:44 AM
I think learning from wordlists is unnecessary. Of course if you enjoy doing so, then fine. I'm not trying to talk anyone out of it, but I've never seen the need for wordlists myself, mainly for the following reason: different people have different interests and activities. A wordlist just contains general vocabulary that could be applicable to everyone, but is not specific to anyone's personal habits. For example, if you want to be able to read a newspaper, sure, some of the wordlist vocabulary may be useful, but there's probably going to be a lot of vocabulary from outside the wordlist. If you are interested in technology, and tend to read more of this kind of article, then you will need to learn more technology-related vocabulary. Likewise, if you are interested in sports, then you will need to learn more sports-related vocabulary. The best way to do this is just to try to read articles you are interested in, and learn new vocabulary as and when you meet it. Needless to say, if you read enough, the so-called "frequently used words" will appear frequently, so you will end up learning them first anyway.
imron Posted September 5, 2009 at 02:51 AM
I agree with this. In a worst-case scenario where you don't understand anything, an article/text in Chinese is essentially just a wordlist arranged left to right, top to bottom, without any spaces.
ilprincipe Posted September 5, 2009 at 04:37 AM
Thanks for your replies, interesting stuff. I agree of course that learning word lists is a topic-specific issue, and I am not suggesting someone should learn Chinese that way. But maybe the same can be said for the character/flashcard approach: both are lists, and you may or may not encounter the items in your daily situation. My question was aimed at trying to maximise the use of having learnt the top x characters. For example: given that I know the top n (be it 500, 1000, or whatever) characters, what are the most used (IN ORDER OF FREQUENCY) words/compounds that make use of those n characters and only those? And I know we go back to the issue of topic (sport, news, etc.), but ...
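The question above ("which words use only the n characters I already know, in order of frequency?") can be sketched as a simple filter over any word frequency list. This is a minimal Python sketch, not an existing tool: the function name `words_from_known_chars` and the counts below are made up for illustration, and a real run would load a list such as the one linked earlier in the thread.

```python
def words_from_known_chars(word_freqs, known_chars):
    """Keep the words written only with known characters, most frequent first.

    word_freqs  -- iterable of (word, count) pairs
    known_chars -- string or iterable of the characters already learnt
    """
    known = set(known_chars)
    usable = [(word, count) for word, count in word_freqs if set(word) <= known]
    return sorted(usable, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts, for illustration only.
word_freqs = [("明白", 120), ("白天", 80), ("明天", 200), ("电脑", 150), ("天", 300)]
known_chars = "明白天"

result = words_from_known_chars(word_freqs, known_chars)
for word, count in result:
    print(word, count)
```

电脑 is dropped because neither character is in the known set; everything else comes back sorted by its count. Whether single-character words should be kept is a design choice; filtering on `len(word) > 1` would restrict the output to compounds.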
imron Posted September 5, 2009 at 05:20 AM
The link in post #5 does almost what you want. It shows the top 3000 words (including compound words), ordered by frequency (I'm not sure how that correlates to character frequency, but you can be sure that all the characters used also have a relatively high frequency).
As for your other point, flashcards can easily be created using the new vocab you pick up from reading. Likewise, you can create your own revision word lists from those words. The basic principles behind character drilling and flashcarding are still applicable, as the only thing that is different is the source of new words/characters.
The thing is, there is generally very little word frequency information publicly available, compared to, say, character frequency information. This is why getting new vocabulary from articles works so well: by the nature of the process, these will be the most frequently appearing words in material that is of interest to you. After all, articles are essentially just word lists. Take for example the lead paragraph in a recent news article about the head of Google China resigning.
谷歌全球副总裁、大中华区总裁李开复将于今日正式辞职,在四年任期结束后最终选择离开。据可靠消息称李开复今后可能自主创业。至此,自2005年谷歌正式入华以来组建的创始团队已经悉数离开。
You could essentially just treat this as the word list:
谷歌 全球 副总裁 大中华区 总裁 李开复 将 于 今日 正式 辞职 在 四 年 任期 结束 后 最终 选择 离开 据 可靠 消息 称 李开复 今后 可能 自主 创业 至此 自 2005 年 谷歌 正式 入华 以来 组建 的 创始 团队 已经 悉数 离开
This contains a whole bunch of highly relevant (and frequently occurring) words - assuming your interest is in, say, the Chinese business/technology sector - and is much more readily available than word frequency lists.
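Treating an article as a word list also yields frequency information for free. The sketch below is a minimal Python illustration that assumes the text has already been segmented with spaces, as in the word list in the post above; it uses only the standard library and is not any particular tool's method.

```python
from collections import Counter

# The hand-segmented lead paragraph from the post above, one space per word boundary.
segmented = (
    "谷歌 全球 副总裁 大中华区 总裁 李开复 将 于 今日 正式 辞职 在 四 年 任期 结束 后 "
    "最终 选择 离开 据 可靠 消息 称 李开复 今后 可能 自主 创业 至此 自 2005 年 谷歌 "
    "正式 入华 以来 组建 的 创始 团队 已经 悉数 离开"
)

counts = Counter(segmented.split())

# Words that occur more than once in this one paragraph.
repeated = {word for word, n in counts.items() if n > 1}

# Most frequent words first; ties keep their order of first appearance.
for word, n in counts.most_common(5):
    print(word, n)
```

Even in a single paragraph a few words already repeat; run over many articles on the same topic, the same counting would give a frequency list tailored to that reading material.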
renzhe Posted September 5, 2009 at 12:02 PM
"Different people have different interests and activities. A wordlist just contains general vocabulary that could be applicable to everyone, but not specific to anyone's personal habits."
I agree, but I think that learning vocabulary lists like that is mainly useful in the early stages, to get a basic vocabulary that appears everywhere, all the time. Memorizing the 5000 core words or so can make many written materials accessible to you. And this, in turn, can give you a lot of context, a lot of new words, etc. Above that basic level, of course you need to learn through exposure.
anonymoose Posted September 5, 2009 at 05:09 PM
"I think that learning vocabulary lists like that are mainly useful in the early stages, to get a basic vocabulary that appears everywhere, all the time. Memorizing the 5000 core words or so can make many written materials accessible to you."
As I mentioned previously, I'm not trying to dissuade anyone from memorising vocabulary lists if that's what they wish to do, but personally, if I had a list of 5000 words to memorise during the early stages of learning a language, I think my enthusiasm for it would be killed off pretty quickly.
ilprincipe Posted September 6, 2009 at 03:58 AM
Points taken: you cannot just memorise characters in sequence, but maybe it can be helpful to memorise compounds formed by characters you already know, so no new character learning, just new combinations. I also think that most combinations have a logic, so rather than memorising them, one needs to take a quick look and see the reason behind each one, and it will then be hard to forget. The links you all posted are very helpful, I was not aware of them, so thanks very much to all!
Chris8080 Posted December 19, 2009 at 07:23 AM
If I have found some resources I'm really interested in and would like to start understanding them, how do I know what's a word and what's not? I mean, in post #10 (http://www.chinese-forums.com/showpost.php?p=200092&postcount=10), given 全球副总裁大中华区, how do I know the boundaries are 全球 副总裁 大中华区? My Lingoes doesn't recognize the words all the time. My current status: knowing around 200 characters (150 of the most used) and some words. Speaking is way better than reading. Thank you. Bye, Chris
imron Posted December 21, 2009 at 02:21 AM
Practice :-) Or use software that is capable of splitting a sentence into words instead of characters, e.g. Wenlin, Adsotrans etc. (Edit: or my own Chinese Text Analyser). It's worth noting, however, that software like this has its limitations, and nothing is a substitute for having a good feel for the language and knowing where a sentence should be broken up. This feel usually comes after lots of reading/listening to native-level materials.
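For a rough idea of how software can split a sentence into words, here is a minimal Python sketch of greedy forward maximum matching, one of the simplest dictionary-based approaches. It is an illustration only and not a description of how Wenlin, Adsotrans or Chinese Text Analyser actually work; the function name and the tiny dictionary are assumptions made for this example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedily split text into the longest dictionary words, left to right.

    Characters that start no dictionary word are emitted on their own.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical mini-dictionary, for illustration only.
dictionary = {"全球", "副总裁", "总裁", "大中华区", "中华"}

segmented = forward_max_match("全球副总裁大中华区", dictionary)
print(" ".join(segmented))
```

With this dictionary the greedy match recovers 全球 副总裁 大中华区, but the result is only as good as the dictionary, and a longest-match rule can pick the wrong split in ambiguous text, which is the limitation mentioned above.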
roddy Posted December 21, 2009 at 02:41 AM
"how do I know the boundaries?"
That's where the 'learn' bit of learning Chinese comes in. There may well be a process of trial and error as you try to figure out if that is 全 球副总裁 大中 华区; or 全球 副总 裁大 中华区; or whatever, but you'll get there in the end. But at 200 characters, you're probably better off with material designed for learners.
Flickserve Posted August 6, 2015 at 04:18 AM
"The thing is, there is generally very little word frequency information publicly available, compared to say character frequency information."
The question of a word frequency list popped into my head this morning. A quick search on Google brought this thread up practically at the top. I can imagine a list of the most common 2, 3 or 4 character words being quite useful.
imron Posted August 6, 2015 at 05:18 AM
I think my point still stands though, which is that as you increase in level, frequency lists become less and less useful because the material they are gathered from may be vastly different from what you might be reading, and so what is marked as a 'frequent' word may actually be quite infrequent and vice versa. That was part of the motivation I had for creating Chinese Text Analyser: to make it easier to find frequency information for content you are reading.
Flickserve Posted August 6, 2015 at 09:39 AM
"I think my point still stands though, which is that as you increase in level frequency lists become less and less useful because the material they are gathered from may be vastly different from what you might be reading, and so what is marked as a 'frequent' word may actually be quite infrequent and vice versa. That was part of the motivation I had for creating Chinese Text Analyser, to make it easier to find frequency information of content you are reading."
I would agree with that.
tysond Posted August 6, 2015 at 09:50 AM
@imron I agree in general, but I do find having frequency lists based on popular media to be very useful... for consuming popular media. It's just that I don't have a very complete collection of texts of popular Chinese media... otherwise I could make a great big text file and CTA it.