kdavid Posted February 4, 2007 at 06:05 AM
Seeing how a number of characters seem to be used a number of times in different, and not necessarily related, words, how many actual characters do you think would be needed to write, say, the 4,000 most common words used in daily life? And beyond that, what about 10,000 words?

roddy Posted February 4, 2007 at 06:30 AM
If you're willing to take the HSK vocab lists as an acceptable take on 'most common words', you can come up with this fairly easily:
Level 1: 1033 words, 798 characters
Level 2: 2018 words, 808 characters
Level 3: 2202 words, 598 characters
Level 4: 3569 words, 670 characters
So, rounding off to friendly numbers:
Your first 1000 words need 800 characters
Your first 3000 words need 1500 characters
Your first 5000 words need 2100 characters
Your first 8800 words need 2800 characters
The main character learning 'push' would therefore be at the start, assuming of course you are learning to write. At Level 1 you have to learn 800 characters and you get only 1000 words in return. But by Level 4 you get over three times as many words for fewer additional characters - although tragically you still have to actually learn the words, you can't just learn the characters and wait for them to pop into your head.

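The running totals behind those rounded figures can be checked with a few lines of Python. This is only a sketch: the per-level counts are the ones quoted in the post, while the names `hsk_levels` and `cumulative_totals` are made up for illustration. The exact character sums come out at 1606, 2204 and 2874, a little above the friendly numbers.

```python
from itertools import accumulate

# Per-level counts as quoted in the post: (new words, new characters).
hsk_levels = [
    (1033, 798),   # Level 1
    (2018, 808),   # Level 2
    (2202, 598),   # Level 3
    (3569, 670),   # Level 4
]


def cumulative_totals(levels):
    """Return the running (words, characters) totals after each level."""
    words = accumulate(w for w, _ in levels)
    chars = accumulate(c for _, c in levels)
    return list(zip(words, chars))


totals = cumulative_totals(hsk_levels)
for level, (words, chars) in enumerate(totals, start=1):
    print(f"Through Level {level}: {words} words, {chars} characters")
```
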
kdavid Posted February 4, 2007 at 10:52 AM (Author)
That's exactly what I was looking for. Those numbers look like Christmas morning. Now I don't feel so hopeless. Thanks, Roddy!

Shadowdh Posted February 4, 2007 at 11:52 AM
Yep, me too... thanks Roddy, this is quite useful info. I know what you mean about the words though... why is it that when two characters are combined they can come to mean something completely different to what you thought... sigh...

HashiriKata Posted February 4, 2007 at 12:41 PM
Shadowdh wrote: "why is it that when two characters are combined they can come to mean something completely different to what you thought... sigh..."
There are 2 possible answers to this:
1. To keep the number of characters down, out of pity for us learners.
2. To frustrate foreigners who try to master the language.

Guest mamba9 Posted February 4, 2007 at 12:52 PM
Sounds exactly like chemistry to me. Remove an oxygen and all of a sudden it turns to poison lol.

roddy Posted February 4, 2007 at 01:53 PM
This is a related tool - you can plug in the characters you already know / are learning / dream vaguely of one day being somewhat familiar with, and it will output the words that those characters will allow you to write. It can be kind of encouraging to see sometimes how very simple characters you might learn early on in a writing course - say 中, 立, 天, 文 - can combine to produce less common bits of vocab like 中立 and 天文.
I'd imagine there's something out there that works in reverse - plug in the vocab you already know and get a list of the characters you'll need to learn how to write - but I don't know specifically where.

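Both directions are easy to sketch in Python. This is not the tool mentioned in the post, just a minimal illustration of the idea: `writable_words`, `chars_needed` and the tiny `sample_vocab` list are all hypothetical, and a real version would load a full word list such as the HSK vocabulary.

```python
def writable_words(known_chars, vocab):
    """Return the words in vocab that use only characters from known_chars."""
    known = set(known_chars)
    return [word for word in vocab if set(word) <= known]


def chars_needed(vocab, known_chars=""):
    """Return the characters in vocab not yet known, in order of first appearance."""
    known = set(known_chars)
    needed = []
    for word in vocab:
        for ch in word:
            if ch not in known and ch not in needed:
                needed.append(ch)
    return needed


# Hypothetical mini word list, for illustration only.
sample_vocab = ["中立", "天文", "中文", "笔记本", "文学"]

# Forward: which words can be written with these four characters?
print(writable_words("中立天文", sample_vocab))

# Reverse: which characters are still missing to write the whole list?
print(chars_needed(sample_vocab, known_chars="中立天文"))
```

With the four characters from the post, the first call returns 中立, 天文 and 中文, and the second reports 笔, 记, 本 and 学 as still to be learned.
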
Hero Doug Posted February 4, 2007 at 02:43 PM
Yeah, I'll second that, it is a very promising list. 3000 characters for 10,000 words is a really nice ratio.

roddy Posted February 4, 2007 at 02:56 PM
Probably also worth noting that as the words are restricted to those on the HSK lists, you'd also find that those characters give you 'extra' words. 笔记本, for example, isn't on the lists, but you'd be able to write it with characters you'd learn at the first level.

Koneko Posted February 5, 2007 at 03:54 PM
Shadowdh wrote: "why is it that when two characters are combined they can come to mean something completely different to what you thought"
I cannot really answer your question here, but I think you can discover more through 语素 (morphemes). Studying 语素 is like a "proper" intermediate Chinese grammar for once you have mastered most of the basic grammar. A good knowledge of 语素 will enable you to tell which characters can be combined and used as a pair, a three-character word, etc. You will also gain a deeper understanding of the uniqueness of Chinese characters.
K.

Learner Posted February 5, 2007 at 04:32 PM
The following excerpt from the Clavis Sinica FAQ (http://www.clavisinica.com/fs-info.html) also contains relevant information that allows us to continue Roddy's list (approximate word-to-character ratio of 1-1 for the first 1000 words, 2-1 for the first 3000 words, 2.5-1 for the first 5000 words, 3-1 for the first 9000 words) by deriving an approximate word-to-character ratio of 6-1 for the first 25000 words.
Learner

# How large is the program's dictionary?
The dictionary contains over 25,000 separate entries, including approximately 4,000 characters and over 21,000 multi-character compound words, phrases, and idioms, or chengyu. All of the entries are fully searchable in both English and Chinese.

# I've heard that written Chinese has tens of thousands of characters. How can a dictionary of 4,000 characters be of much use?
It is true that the great Kang Xi dictionary of 1716 listed nearly 50,000 characters, but this number included many variant and obsolete forms. The number of characters to be found in modern Chinese texts is probably much closer to 10,000, and of these, more than half are used only rarely. The 4,000 characters included in the Clavis Sinica dictionary account for approximately 98% of the characters to be found in a typical modern newspaper, and 100% of the characters found in any of the most commonly used college-level Chinese textbooks.

# On what basis were these 4,000 characters selected?
The Clavis Sinica dictionary is based on the first level of the Guo Biao Chinese character set, which is the accepted standard in the PRC. The 3,754 characters in this set represent the most commonly used characters in the modern written language. Clavis Sinica supplements these with an additional 250 of the more frequently seen characters from the second level of the Guo Biao character set.

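The ratios quoted in this post can be reproduced from the rounded figures earlier in the thread plus the Clavis Sinica entry count. A rough sketch, with `milestones` as an illustrative name: note that the 25,000 figure counts dictionary entries, including roughly 4,000 single characters as well as phrases and idioms, so treating it as a word count is an assumption. Counting only the 21,000 multi-character entries would give 5.25 rather than 6.25.

```python
# (words, characters) pairs. The first four are the rounded HSK figures from
# earlier in the thread; the last treats the 25,000 Clavis Sinica dictionary
# entries as "words" over 4,000 characters, which is an approximation.
milestones = [
    (1000, 800),
    (3000, 1500),
    (5000, 2100),
    (8800, 2800),
    (25000, 4000),
]

ratios = [round(words / chars, 2) for words, chars in milestones]

for (words, chars), ratio in zip(milestones, ratios):
    print(f"{words} words / {chars} characters = {ratio} words per character")
```
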
Sgt_Strider Posted March 23, 2008 at 11:28 PM
Roddy and/or anyone else here on this forum,
Can you guys tell me where I can obtain the HSK word lists up to level 4? I spent three years loosely learning Mandarin, but I have studied Cantonese since I was a little kid. I'd say I know roughly 800-1000 characters (not sure how many words) and I want to intensify my learning. I'm definitely a lot better at reading and recognizing a character than writing it. If there is a list, then I can focus more on what I ought to know and just follow up on it.

roddy Posted March 23, 2008 at 11:37 PM
chinese-forums.com/vocabulary is one place, but it's not really maintained any more. Still usable though. There's also HSKFlashcards.com.

muyongshi Posted March 23, 2008 at 11:39 PM
Try searching please...
An online one: http://www.chinese-forums.com/index.php?/topic/15-cctv-learn-chinese39
A few downloadable ones: http://www.chinese-forums.com/index.php?/topic/2-favourite-chinese-musician0343&page=2

tooironic Posted March 27, 2008 at 03:06 PM
Back to the topic, I've never understood why some learners place such a high importance on character frequency lists. As if they plan to learn the first 4,000 or however many characters in isolation and hope that will cover them? Surely they would be better off learning vocabulary in context?