research on chinese vocabulary size?


Posted

I recently read I. Nation's research on the vocabulary size a learner of English needs in order to reach 98% comprehension of native texts (books, radio, newspapers), and I was wondering if anybody knows of research done in the same manner on the Chinese language. That is to say, what vocabulary size does a learner of Chinese need in order to reach 98% comprehension of a native Chinese text? I know how complex the subject is, and that there have been some debates about it here before, but I'm looking for empirical academic research (written in Chinese or English) on the subject. I'd be glad if anyone who knows of such a paper would let me know.

thanks!

Posted

Hello yedafu,

 

Here is a research paper that talks about a recommended amount of vocabulary words that an L2 learner of Chinese should know to have a basic understanding of any given news article.

 

The most interesting part is the results section, which shows the frequency distribution of characters throughout the author's samples. Surprisingly enough, about 98% of the texts are made up of a set of just 2000 characters (this holds true in other research papers and popular speculation as well). The author argues that for basic comprehension an L2 learner of Chinese would have to know at least 20,000 frequently occurring words.
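To make the idea of "coverage" concrete, here is a minimal Python sketch of how a figure like "N characters cover 98% of the text" can be computed from raw frequency counts. It is not the method from the paper, just an illustration: the function name `types_needed_for_coverage` and the toy sample are my own assumptions, and a real study would use a large corpus and, for words rather than characters, a segmenter.

```python
from collections import Counter


def types_needed_for_coverage(tokens, target):
    """Return how many of the most frequent distinct tokens are needed
    so that they account for at least `target` (0..1) of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("tokens must not be empty")
    running = 0
    # Walk the distinct tokens from most to least frequent,
    # accumulating their share of the running text.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        running += freq
        if running / total >= target:
            return rank
    return len(counts)


# Toy sample (hypothetical): 10 character tokens, 3 distinct characters.
sample = list("的的的的的的是是是不")
print(types_needed_for_coverage(sample, 0.70))  # the top 2 characters cover 90%
```

Note that this only measures how much of the running text is made up of known characters or words. As discussed further down the thread, that is not the same thing as comprehension.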

 

We know that a large portion of Chinese words are disyllabic, and having to learn 20,000 words seems like a daunting task. However, the neat thing about Chinese is that if we understand the individual characters, we can often (or just sometimes) deduce the meaning (or make a very accurate guess) of a disyllabic Chinese word. This takes the load off of how much we need to actively study, because the meaning of a good portion of the new words we come across can be deduced through context and our knowledge of the word's individual characters.

 

There are also some good references you could read listed at the bottom of the author's paper, such as Da Jun's paper.

 

http://www.fb06.uni-mainz.de/chinesisch/Dateien/hanzirenzhi_papers_da.pdf

Posted
2 hours ago, WenLei-William said:

Here is a research paper that talks about a recommended amount of vocabulary words that an L2 learner of Chinese should know to have a basic understanding of any given news article.

 

The link isn't working for me...

Posted

Thanks so much! It's exactly what I was looking for!

The link is working; I just downloaded the paper.

Thanks again.

Posted

No problem yedafu,

 

For anyone who can't access the link, the paper is titled: "Reading news for Information: How much vocabulary a CFL learner should know" by Jun Da (or Da Jun)

  • 1 month later...
Posted

Interesting to look at the statistics. Amazed to see 洗腦 and 南轅北轍 in the list of least frequently occurring words, as I encounter the former pretty much every other day, and 南轅北轍 is used by CCTV news reporters on a fairly regular basis (it seems like they have a set list of chengyu that get repeated on the evening news, perhaps for stylistic or rhetorical purposes). That being said, the dataset is pulled from books, not from speech, so it was always going to be skewed in some way.

 

As is often attested to on the forums, even if 1572 characters cover 99% of commonly used characters, that is not going to get you close to 99% comprehension. I think the same is true for the estimate of 12,054 words. I would guess I know more than this number of words, but I am far from being able to say I am capable of '99% comprehension'.

 

I am a native English speaker, and even I only know maybe 99% of English words. Picking up an English book in an unfamiliar topic area will further reduce my comprehension by a few percent. In light of this, I think it's safe to say that statistics along the lines of 'if you learn x you will reach y% comprehension', while fun, are ultimately pretty meaningless. There are just way too many variables.

Posted

@Tomsima, I also thought it was interesting that 洗脑 was one of the least common words.  I agree with you that it would be interesting to see stats based on a spoken dataset (maybe from TV shows and news).

 

With a vocabulary around 12,000, I'm curious to know how often you come across new words you don't know.  Or maybe they are words you studied before but you just need more exposure to them.  Do you still work on increasing your vocabulary?

 

Posted
16 hours ago, mtokudome said:

With a vocabulary around 12,000, I'm curious to know how often you come across new words you don't know.  Or maybe they are words you studied before but you just need more exposure to them.  Do you still work on increasing your vocabulary?

 

Working on increasing vocabulary and coming across words you don't know are linked. I come across words I don't know every day, but perhaps they can't properly be called 'new words' because of the way compound words and context help out. If I read a newspaper and I don't 'work' on my vocabulary, I can still understand pretty much everything, even if a character I've never seen before comes up in a compound with a character I am familiar with, thanks to the supporting context. But then I ask myself: what exactly does this mean? What are the deeper connotations? Is this rhetorical? Sarcastic? Playful? What is a good translation if I work back to English? Can I use it? If I pay attention to these things and can't answer them, it goes in the list for studying.

 

A good example is my 'today's words I don't know':

拖欠

應驗

繩之以法

光陰

 

The first three are completely guessable from context on the fly. But at the time of reading I couldn't immediately and confidently say that 拖欠 means 'be in arrears', so it went into my 'new' words list. The fourth word is totally new for me, as it is a literary usage of 光 and 陰, both of which represent time in this combination, something I was pretty much oblivious to. Take that as you will: is this a list of four new words, or just three words I kind of get and one 'real' new word?

 

 

Long story short: new words in every conversation, new words in every reading session. And that's why learning Chinese is fun, it never gets boring!

 

Posted

I'd say 洗脑 is quite common in netspeak. For example we are periodically being attacked by 洗腦神曲s like Numa Numa or Gangnam Style.

光陰 is also a common word: 話說光陰似箭、日月如梭,XXX 學習漢語已有 N 個年頭。這一日 TA 來到…… (If you read Wuxia novels, you're guaranteed to meet sentences like this at least once.)

There's an old saying 一寸光陰一寸金,寸金難買寸光陰, about the same meaning as 尺璧非寶,寸陰是競 from the Qian Zi Wen that you're reading.

Posted
15 hours ago, Tomsima said:

拖欠 means 'be in arrears'

 

Good examples.  Ironically, even though I'm a native speaker of English, I didn't know what 'arrears' meant and had to look it up :D.  Maybe it's British English...

Posted
On 11/20/2017 at 2:26 PM, mtokudome said:

Maybe it's British English

Nope.  It is, however, a legal term and might not be something you'd regularly come across unless you didn't pay your debts.
