yedafu Posted October 2, 2017 at 01:40 PM
I recently read I. Nation's research on the vocabulary size a learner of English needs in order to reach 98% comprehension of native texts (books, radio, newspapers), and I was wondering if anybody knows of research done in the same manner on the Chinese language. That is to say, what vocabulary size does a learner of Chinese need in order to reach 98% comprehension of a native Chinese text? I know the subject is complex and that there have been some debates about it here before, but I am looking for empirical academic research (written in Chinese or English) on the subject. I would be glad if anyone who knows of such a paper would let me know. Thanks!
WenLei-William Posted October 5, 2017 at 09:29 PM
Hello yedafu,
Here is a research paper that discusses how many vocabulary words an L2 learner of Chinese should know to have a basic understanding of any given news article. The most interesting part is the results section, which shows the frequency distribution of characters throughout the author's samples. Surprisingly enough, about 98% of the texts are made up of a set of just 2,000 characters (this also holds in other research papers and popular estimates). The author argues that for basic comprehension an L2 learner of Chinese would have to know at least 20,000 frequently occurring words.
We know that a large portion of Chinese words are disyllabic, and having to learn 20,000 words seems like a daunting task. However, the neat thing about Chinese is that if we understand the individual characters, we can often (or just sometimes) deduce the meaning of a disyllabic Chinese word, or at least make a very accurate guess. This reduces how much we need to actively study, because the meaning of a good portion of the new words we come across can be deduced from context and from our knowledge of the word's individual characters.
There are also some good references listed at the bottom of the author's paper, such as Da Jun's paper.
http://www.fb06.uni-mainz.de/chinesisch/Dateien/hanzirenzhi_papers_da.pdf
大块头 Posted October 6, 2017 at 12:06 AM
2 hours ago, WenLei-William said: "Here is a research paper that talks about a recommended amount of vocabulary words that an L2 learner of Chinese should know to have a basic understanding of any given news article."
The link isn't working for me...
yedafu Posted October 6, 2017 at 06:15 AM
Thanks so much! It's exactly what I was looking for! The link is working; I just downloaded the paper. Thanks again.
WenLei-William Posted October 6, 2017 at 10:45 AM
No problem, yedafu.
For anyone who can't access the link, the paper is titled "Reading news for Information: How much vocabulary a CFL learner should know" by Jun Da (or Da Jun).
mtokudome Posted November 18, 2017 at 04:21 AM
Although not an academic article, this blog post also shows statistics about character and word frequency based on a large Google Ngram corpus: https://puroh.it/how-many-chinese-characters-and-words-are-in-use/
He shows that 99% coverage takes 1,572 characters and 12,054 words.
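For readers wondering how a figure like "99% coverage at 1,572 characters" is typically derived, here is a minimal sketch, assuming a simple frequency table: sort items by frequency and count how many are needed before their cumulative occurrences reach the target share of the corpus. The function name `items_needed_for_coverage` and the toy sample string are hypothetical, chosen for illustration only; this is not the blog author's or Jun Da's actual code or data.

```python
from collections import Counter


def items_needed_for_coverage(counts, target):
    """Return how many of the most frequent items are needed so that their
    combined occurrences reach `target` (a fraction in 0..1) of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty frequency table")
    running = 0
    # most_common() yields (item, count) pairs from most to least frequent
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running >= target * total:
            return rank
    return len(counts)


# Toy text, purely illustrative (not data from the blog post or the paper).
sample = "的的的的的的是是是是不不不我我人在"
char_counts = Counter(sample)
print(items_needed_for_coverage(char_counts, 0.80))
```

Note that this measures coverage of running text (tokens), not comprehension; knowing the characters that cover 99% of a corpus does not by itself mean understanding 99% of what is read.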
Tomsima Posted November 18, 2017 at 04:01 PM
Interesting to look at the statistics. I was amazed to see 洗腦 and 南轅北轍 in the list of least frequently occurring words, as I encounter the former pretty much every other day, and 南轅北轍 is used by CCTV news reporters on a fairly regular basis (it seems like they have a set list of chengyu that get repeated on the evening news, perhaps for stylistic or rhetorical purposes). That being said, the dataset is pulled from books, not from speech, so it was always going to be skewed in some way.
As is often attested on the forums, even if 1,572 characters cover 99% of commonly used characters, that is not going to get you close to 99% comprehension. I think the same is true for the estimate of 12,054 words. I would guess I know more words than that, but I am far from being able to say I am capable of '99% comprehension'. I am a native English speaker and even I only know 99% of English words. Picking up an English book in an unfamiliar topic area will further reduce my comprehension by a few percent.
In light of this, I think it's safe to say that statistics along the lines of 'if you learn x you will reach y% comprehension', while fun, are ultimately pretty meaningless. There are just way too many variables.
mtokudome Posted November 18, 2017 at 09:43 PM
@Tomsima, I also thought it was interesting that 洗脑 was one of the least common words. I agree with you that it would be interesting to see stats based on a spoken dataset (maybe from TV shows and news).
With a vocabulary around 12,000, I'm curious to know how often you come across new words you don't know. Or maybe they are words you studied before but you just need more exposure to them. Do you still work on increasing your vocabulary?
Tomsima Posted November 19, 2017 at 02:48 PM
16 hours ago, mtokudome said: "With a vocabulary around 12,000, I'm curious to know how often you come across new words you don't know. Or maybe they are words you studied before but you just need more exposure to them. Do you still work on increasing your vocabulary?"
Working on increasing vocabulary and coming across words you don't know are linked. I come across words I don't know every day, but perhaps they can't properly be called 'new words' because of the way compound words and context help out. If I read a newspaper and I don't 'work' on my vocabulary, I can still understand pretty much everything, even if a character I've never seen before comes up in a compound with a character I am familiar with, thanks to the supporting context. But then I ask myself: what exactly does this mean? What are the deeper connotations? Is this rhetorical? Sarcastic? Playful? What is a good translation if I work back to English? Can I use it? If I pay attention to these things and can't answer them, the word goes in the list for studying.
A good example is my list of 'today's words I don't know':
拖欠
應驗
繩之以法
光陰
The first three are completely guessable from context on the fly. But at the time of reading I couldn't immediately and confidently say that 拖欠 means 'be in arrears', so it went into my 'new' words list. The fourth word is totally new for me, as it is a new literary usage of 光 and 陰, both of which represent time in this combination, something I was pretty much oblivious to. Take that as you will: is this a list of four new words, or just three words I kind of get and one 'real' new word?
Long story short: new words in every conversation, new words in every reading session. And that's why learning Chinese is fun; it never gets boring!
Publius Posted November 19, 2017 at 04:36 PM
I'd say 洗脑 is quite common in netspeak. For example, we are periodically being attacked by 洗腦神曲s like Numa Numa or Gangnam Style.
光陰 is also a common word: 話說光陰似箭、日月如梭,XXX 學習漢語已有 N 個年頭。這一日 TA 來到…… (If you read Wuxia novels, you're guaranteed to meet sentences like this at least once.)
There's an old saying, 一寸光陰一寸金,寸金難買寸光陰, which has about the same meaning as 尺璧非寶,寸陰是競 from the Qian Zi Wen that you're reading.
mtokudome Posted November 20, 2017 at 06:26 AM
15 hours ago, Tomsima said: "拖欠 means 'be in arrears'"
Good examples. Ironically, even though I'm a native speaker of English, I didn't know what arrears was and had to look it up. Maybe it's British English...
imron Posted November 22, 2017 at 02:06 AM
On 11/20/2017 at 2:26 PM, mtokudome said: "Maybe it's British English"
Nope. It is, however, a legal term and might not be something you'd regularly come across unless you didn't pay your debts.