Chinese-Forums

Frequency lists in a diagram



Posted

Frequency lists show all the words (or, more commonly, characters) ordered by their frequency in a certain corpus. If you look at the actual frequencies, though, you will find that, in line with Zipf's law, the difference between the frequency of one character and the next one in the list becomes very small very quickly. In the diagram, this is the yellow line, with its y-scale on the right side. The line actually starts above the top of the diagram at 4.4% (for rank 0, 我) and drops below 0.5% at around rank 30. So following the frequency list makes sense in the beginning, but once you know 500-1000 characters, I would put less emphasis on the frequency list and more on other things like context, etymology, mnemonics, words, etc., depending on your overall approach.
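To see why the gaps shrink so fast, here is a minimal Python sketch of an idealised Zipf distribution, where the frequency of rank r is proportional to 1/r^s. The exponent s = 1.0 and the vocabulary size of 5000 are assumptions for illustration, not values fitted to the subtlex data, and ranks here start at 1 rather than 0 as in the diagram. With s = 1, the gap between rank r and rank r + 1 is proportional to 1/(r(r + 1)), so it shrinks roughly with the square of the rank.

```python
# Minimal sketch: an idealised Zipf distribution, f(r) proportional to 1 / r**s.
# ASSUMPTIONS: s = 1.0 and n = 5000 characters are illustrative values only,
# not fitted to the subtlex data; ranks are 1-based here (the diagram uses 0-based).

def zipf_frequencies(n: int, s: float = 1.0) -> list[float]:
    """Return the relative frequencies of ranks 1..n, normalised to sum to 1.0."""
    weights = [1.0 / r**s for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gap_to_next(freqs: list[float], rank: int) -> float:
    """Difference in relative frequency between a 1-based rank and the next rank."""
    return freqs[rank - 1] - freqs[rank]


if __name__ == "__main__":
    freqs = zipf_frequencies(5000)
    for rank in (1, 10, 100, 1000):
        print(f"rank {rank:>4}: freq {freqs[rank - 1]:.4%}, "
              f"gap to next {gap_to_next(freqs, rank):.5%}")
```

Real frequency lists only roughly follow this curve, but the shape of the yellow line is the same idea.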

The blue line shows the total "text comprehension" (ignoring everything else, like words, grammar, etc.) once you have learned all the characters up to that frequency rank. For example, if you learn the first 1000 characters, you would know roughly 95% of all the characters in the original corpus used for the frequency list. However, in a different corpus, shown by the orange line, you might only understand 87% of the text. In this case, the primary list for the blue and yellow lines is the subtlex frequency list, which is based on movie subtitles, and the secondary (orange) one is probably based on a corpus of written text.
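A rough sketch of how the blue and orange lines could be computed: take the k most frequent characters from one corpus and measure what fraction of character tokens they cover, either in that same corpus or in a different one. The function names and toy strings below are hypothetical; a real calculation would use the full corpora and repeat this for every k to draw the curve.

```python
from collections import Counter

# Minimal sketch of "text coverage": the share of character tokens in a corpus
# that belong to the k most frequent characters of a (possibly different) corpus.
# ASSUMPTION: corpora are plain strings and every non-whitespace character is a token;
# the toy strings are hypothetical stand-ins for subtlex and a written-text corpus.

def char_counts(text: str) -> Counter:
    """Count character tokens, ignoring whitespace."""
    return Counter(ch for ch in text if not ch.isspace())


def top_k(counts: Counter, k: int) -> set[str]:
    """The k most frequent characters, i.e. what you know after learning the first k ranks."""
    return {ch for ch, _ in counts.most_common(k)}


def coverage(known: set[str], counts: Counter) -> float:
    """Fraction of all character tokens in `counts` whose character is in `known`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for ch, n in counts.items() if ch in known) / total


if __name__ == "__main__":
    primary = char_counts("我是你的朋友 你是我的朋友 我不知道")    # hypothetical "subtitle" text
    secondary = char_counts("中国的经济发展很快 我的朋友在中国")   # hypothetical "written" text
    known = top_k(primary, 5)
    print(f"coverage of primary corpus:   {coverage(known, primary):.0%}")    # like the blue line
    print(f"coverage of secondary corpus: {coverage(known, secondary):.0%}")  # like the orange line
```

The gap between the two numbers comes purely from the two corpora using characters with different frequencies, which is the effect the orange line shows.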

 

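[Diagram: per-character frequency (yellow, right y-axis) and cumulative text coverage for the subtlex list (blue) and a secondary corpus (orange), plotted against frequency rank]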

Posted

This has definitely been my experience too.

 

Following frequency lists very quickly becomes less efficient than learning words from native content, because the type of content you like to read has a dramatic effect on the relative frequencies of words, and you'll get more bang for your buck learning words that you will use right now than words you will use at some indeterminate point in the future.
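If you want to turn this into a routine, a minimal sketch is to count the unknown words in your own reading and learn the most frequent ones first. This assumes the text has already been split into words (Chinese needs a segmenter for that step), and the example tokens and known-word set are made up.

```python
from collections import Counter

# Minimal sketch: prioritise unknown words by how often they occur in the content
# you actually read, rather than by a general frequency list.
# ASSUMPTION: the text is already segmented into words; `my_reading` and
# `known_words` are hypothetical examples.

def words_to_learn(tokens: list[str], known: set[str], n: int = 10) -> list[tuple[str, int]]:
    """Return up to n unknown words from `tokens`, most frequent first, with their counts."""
    unknown = Counter(t for t in tokens if t not in known)
    return unknown.most_common(n)


if __name__ == "__main__":
    my_reading = ["这个", "角色", "很", "强", "这个", "技能", "角色", "技能", "角色"]
    known_words = {"这个", "很", "强"}
    for word, count in words_to_learn(my_reading, known_words, n=2):
        print(word, count)
```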

  • 4 months later...
Posted

I agree too; I think it's OK to use lists up to a point. I found writing out the first 1000 characters from the dajun list helpful, and those characters came up frequently in my exposure to the language. The same goes for word lists up to HSK 4 or 5. After that, however, it's really dependent on what you do. Most of my additions to my word lists come from daily routines, such as my weekly shopping on 京东 (JD.com), words taken from WeChat conversations, TV shows, menus, place names and so on.

 

One problem has always been that frequency lists exist for characters, but for words it's a bit of a mixed bag. Many people use the HSK 5 list. I keep every word I have ever learnt, or intended to learn at some point, in a spreadsheet. This allows me to import it into Anki. I have plenty of cross-referencing to the original source, whether it be books, HSK, etc.
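For the spreadsheet-to-Anki step, one option is to export the list as a tab-separated text file, which Anki's File > Import can read as one note per line, letting you map each column to a note field in the import dialog. The column names and example rows below are hypothetical, not the layout of the spreadsheet described above.

```python
import csv

# Minimal sketch: write a word list to a tab-separated text file for Anki's text import.
# ASSUMPTION: the column layout (word, pinyin, meaning, source) is hypothetical;
# missing columns are written as empty fields.

COLUMNS = ["word", "pinyin", "meaning", "source"]


def export_for_anki(rows: list[dict[str, str]], path: str) -> int:
    """Write one tab-separated line per row; return the number of notes written."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow([row.get(col, "") for col in COLUMNS])
    return len(rows)


if __name__ == "__main__":
    vocab = [
        {"word": "频率", "pinyin": "pínlǜ", "meaning": "frequency", "source": "book"},
        {"word": "菜单", "pinyin": "càidān", "meaning": "menu", "source": "restaurant"},
    ]
    print(export_for_anki(vocab, "vocab_for_anki.txt"), "notes written")
```

Keeping the source as its own column means the cross-reference survives the import and can be mapped to a field or used for filtering later.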

 

Interestingly, HSK 6 accounts for only about 60% of all the words on my list, which has around 8000 words.

This is because my studies took a turn around the HSK 5 mark. 
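The percentage above is simply the overlap between two word sets divided by the size of your own list. A tiny sketch, with made-up example sets standing in for the real HSK list and a full personal list:

```python
# Minimal sketch: what share of a personal word list also appears in a reference list
# such as HSK. ASSUMPTION: both lists are plain sets of words; the example sets are
# hypothetical, not real HSK data or the poster's list.

def share_covered(my_words: set[str], reference: set[str]) -> float:
    """Fraction of my_words that also appear in reference (0.0 for an empty list)."""
    if not my_words:
        return 0.0
    return len(my_words & reference) / len(my_words)


if __name__ == "__main__":
    hsk = {"朋友", "学习", "经济", "发展"}
    mine = {"朋友", "学习", "经济", "菜单", "京东"}
    print(f"{share_covered(mine, hsk):.0%} of my list is in the HSK list")  # prints: 60% of my list is in the HSK list
```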
