New HSK Vocab Lists


Posted

I have two questions:

1) Where can I download the original source material that was used to make this list?

2) How are HSK words picked? Is it by frequency? Which would be more beneficial for someone who does not plan to take the HSK and wants to learn as efficiently as possible: learning HSK words, or learning words from, let's say, this frequency list?

Posted

Xaze, the list you link to is a character list, not a word list, so it would be of very limited use for learning Chinese. A character list could be used for practising writing and character recognition, but you could know thousands of characters and still be unable to speak or understand a sentence of Chinese.

Posted
2) How are HSK words picked? Is it by frequency? Which would be more beneficial for someone who does not plan to take the HSK and wants to learn as efficiently as possible: learning HSK words, or learning words from, let's say, this frequency list?

Nobody knows the exact procedure, but it involved a statistical analysis over a large corpus of written and spoken (transcribed) Chinese text. The HSK words are supposed to represent important and common words, and they are the closest thing we have to an actually useful word frequency list.

Posted

I'd add that when you're done memorizing the list, you'll be able to understand completely, or at least get the basic gist of, a lot of modern material. When reading, I very often come across words that I would not otherwise know and that I had previously encountered only on the HSK word lists.

Posted

Which of the files is the most accurate?

I wrote a program to check the files, and there are inconsistencies between them. For example, the level 6 file lists 反映 as a level 4 word, but that word does not appear in the level 4 file.

Each of the following entries is inconsistent between the level 6 file and the file for the level it is assigned to:

5,番,fan1,to take turns; order in series; time; a kind of; barbarians,

4,反映,fan3 ying4,"to mirror; to reflect; mirror image; reflection; fig. to report; to make known; to render; used erroneously for 反應|反应, response or reaction",

4,副,fu4,secondary; auxiliary; deputy; assistant; vice-; abbr. for 副詞|副词 adverb; classifier for pairs,

So, for example, 番 appears in the level 6 file as a level 5 word, but does not appear in the level 5 file. I am sure there are even more inconsistencies between the various files, so what is the best way to get the most accurate list? Should I take level X (such as 3) from the level X file (such as 3.csv), or is the level 6 file the most accurate (so that, if I want level 3, I take all the level 3 words from the level 6 file)?

Btw, I only compared words against the level 6 file. I am sure that the level 1-4 words in the level 5 file also differ from the level 1-4 words in their corresponding individual files.
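
For anyone who wants to repeat this kind of check, here is a rough sketch of how it could be done. It is not the program I actually ran: it assumes every row has the form level,word,pinyin,definition (as in the sample rows above), and the function names and the tiny inline sample data are made up for illustration. The level of the second sample word is a placeholder, not its real HSK level.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into (level, word) pairs.

    Assumes rows look like: level,word,pinyin,definition
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank or malformed lines (e.g. a header) rather than failing.
        if len(row) < 2 or not row[0].strip().isdigit():
            continue
        rows.append((int(row[0]), row[1].strip()))
    return rows


def find_mismatches(cumulative_rows, level_rows):
    """Return (level, word) pairs that the cumulative file assigns to a level
    whose own file does not list that word at that level."""
    own_words = {
        level: {word for row_level, word in rows if row_level == level}
        for level, rows in level_rows.items()
    }
    return [
        (level, word)
        for level, word in cumulative_rows
        if level in own_words and word not in own_words[level]
    ]


# Invented sample data standing in for 6.csv and 5.csv (not the real files).
LEVEL_6_SAMPLE = "5,番,fan1,to take turns; order in series,\n5,安排,an1 pai2,to arrange,\n"
LEVEL_5_SAMPLE = "5,安排,an1 pai2,to arrange,\n"

cumulative = read_rows(LEVEL_6_SAMPLE)
per_level = {5: read_rows(LEVEL_5_SAMPLE)}
print(find_mismatches(cumulative, per_level))
```

With real data you would read each N.csv from disk (with UTF-8 encoding) instead of using inline strings, and pass all the lower-level files in the dictionary.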

To add to that, there are more inconsistencies. For example:

1) there are 4995 unique character sequences in the level 6 file

2) When looking at only the corresponding level in each file (only level 2 in the level 2 file, only level 3 in the level 3 file, etc.), there are 4993 unique character sequences

3) When adding all files together, there are 4996 unique character sequences

Posted

Thanks. Actually, the input was pretty easy. I did most of it while watching US TV shows on Youku!

Unfortunately, I added the HSK level after creating the files, and this introduced a bunch of mistakes. I'll be going over each list more thoroughly in the near future. I can't make any recommendations (except to say that level 6 is more likely to have errors than level 3, because level 6 has more words).

I've fixed the issues you mentioned (except for 反映, which appears as #229 in the HSK Level 4 list). If you mention specific problems that you find, I'll fix them as soon as I can.

Could you go into more detail on the inconsistencies with unique character sequences? I don't quite follow what you're talking about.

Posted

Oh man, I love you guys.

Thanks for your hard work and for taking the time to share and link.

Peiruo ;)

Posted
Could you go into more detail on the inconsistencies with unique character sequences? I don't quite follow what you're talking about.

Okay, let's say I count the number of unique Chinese words (the HSK word in each row) in the files. Depending on how the data is selected, the results are different, when they should be the same.

1) there are 4995 unique character sequences in the level 6 file

In this method, all the unique HSK words in the level 6 file are counted.

2) When looking at only the corresponding level in each file (only level 2 in the level 2 file, only level 3 in the level 3 file, etc.), there are 4993 unique character sequences

In this method, all the level 1 words from the level 1 file are added to all the level 2 words from the level 2 file, etc., and the unique words are counted.

3) When adding all files together, there are 4996 unique character sequences

In this method, all words from the level 1 file are added to all words from the level 2 file, etc., regardless of the level each word is tagged with.

This means that the level 6 file is not simply the level 1-5 files combined plus the level 6 vocabulary.
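
In case it helps to see the three methods side by side, here is a small sketch of the counting. It assumes each file has already been parsed into (level, word) pairs; the function name and the toy data are invented for illustration (the levels in the toy data are not real HSK assignments), and only three files are used to keep it short.

```python
def unique_word_sets(files):
    """files maps a file's level to the (level, word) rows found in that file.

    Returns three sets, one per counting method described above.
    """
    highest = max(files)
    # 1) every word in the highest-level (cumulative) file
    in_top_file = {word for _, word in files[highest]}
    # 2) from each file, only the words tagged with that file's own level
    own_level_only = {
        word
        for file_level, rows in files.items()
        for row_level, word in rows
        if row_level == file_level
    }
    # 3) every word from every file, whatever level it is tagged with
    in_any_file = {word for rows in files.values() for _, word in rows}
    return in_top_file, own_level_only, in_any_file


# Toy data: the levels are placeholders chosen to make the three counts differ.
toy_files = {
    1: [(1, "我"), (1, "你")],
    2: [(1, "我"), (1, "你"), (1, "他"), (2, "但是")],
    3: [(1, "我"), (1, "你"), (2, "但是"), (2, "番"), (3, "反映")],
}

top_set, own_set, any_set = unique_word_sets(toy_files)
print(len(top_set), len(own_set), len(any_set))
# Words that some lower file has but the top file lacks:
print(sorted(any_set - top_set))
# Words the top file has that no file lists at its own level:
print(sorted(top_set - own_set))
```

If the files were consistent, all three sets would be identical, so the two set differences at the end point directly at the entries that need fixing.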

Posted

Also, where did you get definitions for the following:

踢足球

打篮球

…分之…

使劲儿

摊儿

纽扣儿

招投标

烟花爆竹

I have looked and cannot find their meanings in CC-CEDICT or nciku.

Posted

…分之… isn't in CC-CEDICT, but 分之 is, so I used that definition.

The others are my own definitions.
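
The fallback for …分之… was done by hand, but the idea can be sketched roughly like this. The function name is hypothetical, and the dictionary is a plain Python dict standing in for CC-CEDICT with a placeholder definition, not real dictionary text.

```python
def lookup_with_fallback(word, dictionary):
    """Look up a word; if it is missing, retry with any '…' placeholders removed.

    Returns None when neither form is found, so the caller knows a
    hand-written definition is needed.
    """
    if word in dictionary:
        return dictionary[word]
    stripped = word.replace("…", "")
    if stripped and stripped != word and stripped in dictionary:
        return dictionary[stripped]
    return None


# Stand-in for a loaded dictionary; the definition text is a placeholder.
toy_dictionary = {"分之": "<definition of 分之>"}

print(lookup_with_fallback("…分之…", toy_dictionary))
print(lookup_with_fallback("踢足球", toy_dictionary))
```

Anything that comes back as None (like 踢足球 here) still needs a definition written by hand.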

Posted
Ah, I see, I'm after an exact list identical, in the same order with numbers.

It's ok, I've got a small army of typists (students) helping me build an exact replica for my needs.

I've made some modifications to the list. I added the original order numbers to levels 1-3 and fixed a few more errors I found. Hopefully the addition of the order numbers will make the lists easier to check and correct.

Posted

Steven,

Would it be possible for you to post a list of the corrections for the errors that you found? I'd like to update the Anki deck.

  • 3 weeks later...
Posted

Hello everyone,

I know that I'm new, so I might write something stupid; if so, sorry in advance. First, thanks to everyone for contributing to these lists! I also have a question:

Are these lists (lingomi) really complete? Or do they just list the characters that appeared in these versions of the new HSK (rather than the complete list we're supposed to know)?

Posted

Not sure I follow your question.... are you asking if these lists are just the characters in each list, or the full word?

If so, just take a look at the list.... no wait, I can do that too..... yes, they are the words, not the characters.

If I misunderstood you, could you rephrase the question?

Posted

I think sakat's asymmetrical hair-do made it too hard to understand the question :-) My understanding of his question is:

Are these lists known to be the complete list of all possible words that can appear on an HSK test, or are they just representative examples based on the sample tests that have been made public? Like the SAT tests, you can get a feel for it from the sample tests, but it's no guarantee that they won't change it up on the actual tests...

I don't think he was drawing a distinction between characters vs. words at all, BTW.

Posted

Oops, sorry for not being clear enough.

As 马盖云 explained, I was wondering if the list of hanzi at the end of the PDF is "the complete list of hanzi which can appear in the examination" or "the list of hanzi which appeared in this version".

Posted

My understanding is that it is neither. It is "the list of words the HSK committee thinks are most important to learn and most likely to appear on the HSK". But words can appear on the test that are not on the lists, and I don't think they claim that a given word ever appeared on a test.
