xaze Posted September 14, 2010 at 06:46 AM
I have two questions:
1) Where can I download the original source material that was used to make this list?
2) How are HSK words picked? Is it by frequency? Would learning HSK words, or learning words from, let's say, this frequency list, be more beneficial if someone does not plan to take the HSK and wants to learn as efficiently as possible?
James Johnston Posted September 14, 2010 at 07:26 AM
Xaze, the list you link to is a character list, not a word list, so it would be of very limited use as far as learning Chinese is concerned. A character list, such as the one you link to, could be used for practising writing and character recognition, but you could know thousands of characters and still be unable to speak or understand a sentence of Chinese.
renzhe Posted September 14, 2010 at 01:11 PM
> 2) How are HSK words picked? Is it by frequency? Would learning HSK words, or learning words from, let's say, this frequency list, be more beneficial if someone does not plan to take the HSK and wants to learn as efficiently as possible?
Nobody knows the exact procedure, but it involved a statistical analysis over a large corpus of written and spoken (transcribed) Chinese text. The HSK words are supposed to represent important and common words, and they are the closest thing we have to an actually useful word frequency list.
aristotle1990 (Author) Posted September 14, 2010 at 01:47 PM
I'd add that when you're done memorizing the list, you'll be able to understand completely, or at least get the basic gist of, a lot of modern material. During reading, it's very common for me to come across words whose meanings I would otherwise not know that I've previously only encountered on the HSK word lists.
BertR Posted September 14, 2010 at 02:03 PM
> 1) Where can I download the original source material that was used to make this list?
http://www.confuciusinstitute.qut.edu.au/docs/hks_2010_level_1.pdf
http://www.confuciusinstitute.qut.edu.au/docs/hks_2010_level_2.pdf
http://www.confuciusinstitute.qut.edu.au/docs/hks_2010_level_3.pdf
http://www.confuciusinstitute.qut.edu.au/docs/hks_2010_level_4.pdf
http://www.confuciusinstitute.qut.edu.au/docs/hks_2010_level_5.pdf
http://www.confuciusinstitute.qut.edu.au/docs/hks_2010_level_6.pdf
xaze Posted September 15, 2010 at 07:55 PM
Wow, it is amazing that you guys copied the PDF files by hand! Good work and thanks for the effort.
xaze Posted September 15, 2010 at 08:27 PM
Which of the files is the most accurate? I wrote a program to check the files, and there are inconsistencies between them. For example, the level 6 file lists 反映 as a level 4 word, but that word does not exist in the level 4 file.
Each of the following is inconsistent between the level 6 file and the file for the level it is assigned to:
5,番,fan1,to take turns; order in series; time; a kind of; barbarians,
4,反映,fan3 ying4,"to mirror; to reflect; mirror image; reflection; fig. to report; to make known; to render; used erroneously for 反應|反应, response or reaction",
4,副,fu4,secondary; auxiliary; deputy; assistant; vice-; abbr. for 副詞|副词 adverb; classifier for pairs,
So, for example, 番 appears in the level 6 file as a level 5 word, but does not appear in the level 5 file.
I am sure there are even more inconsistencies between the various files, so what is the most accurate method to get the best file? Is it most accurate to take level X (such as 3) from the level X file (such as 3.csv), or is the level 6 file the most accurate (so that, if I want level 3, I take all level 3 words from the level 6 file)?
Btw, I only compared words against the level 6 file. I am sure that the level 1-4 words in the level 5 file are different from the level 1-4 words in their corresponding individual files.
To add to that, there are more inconsistencies. For example:
1) there are 4995 unique character sequences in the level 6 file
2) When looking at only the corresponding level in each file (only look at lvl 2 in the 2nd file, only lvl3 in the 3rd file, etc), there are 4993 unique characters
3) When adding all files together, there are 4996 unique character sequences
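The program xaze mentions was not posted. A check of this kind could be sketched roughly as follows, assuming each CSV row has the form level,word,pinyin,definition with no header row (as in the rows quoted above). The function names read_levels and find_inconsistencies and the tiny sample data are made up for illustration and are not the real lists.

```python
import csv
import io


def read_levels(csv_text):
    """Map each word to the level recorded for it in one CSV file.

    Assumes rows look like: level,word,pinyin,definition (no header row).
    """
    levels = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        levels[row[1]] = int(row[0])
    return levels


def find_inconsistencies(cumulative, per_level):
    """Return (word, level) pairs from the cumulative file that are missing
    from the file for that level, or listed there under a different level."""
    problems = []
    for word, level in sorted(cumulative.items()):
        if per_level.get(level, {}).get(word) != level:
            problems.append((word, level))
    return problems


# Made-up sample text standing in for the real level 6, 4 and 5 files.
level6_text = "5,番,fan1,to take turns,\n4,副,fu4,secondary,\n4,爱情,ai4 qing2,romance,\n"
level4_text = "4,爱情,ai4 qing2,romance,\n"
level5_text = "5,安装,an1 zhuang1,to install,\n"

cumulative = read_levels(level6_text)
per_level = {4: read_levels(level4_text), 5: read_levels(level5_text)}
problems = find_inconsistencies(cumulative, per_level)
print(problems)
```

With the real files, the strings would be replaced by the contents of each CSV read as UTF-8 text.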
ChouDoufu Posted September 16, 2010 at 02:22 AM
Thanks. Actually, the input was pretty easy. I did most of it while watching US TV shows on Youku!
Unfortunately, I added the HSK level after creating the files, and this introduced a bunch of mistakes. I'll be going over each list more thoroughly in the near future. I can't make any recommendations (except to say that level 6 is more likely to have errors than level 3, because level 6 has more words).
I've fixed the issues you mentioned (except for 反映, which appears as #229 in the HSK Level 4 list). If you mention specific problems that you find, I'll fix them as soon as I can.
Could you go into more detail on the inconsistencies with unique character sequences? I don't quite follow what you're talking about.
Peiruo Posted September 16, 2010 at 02:29 AM
Oh man, I love you guys. Thanks for your hard work and for taking the time to share and link.
Peiruo ;)
xaze Posted September 16, 2010 at 03:35 AM
> Could you go into more detail on the inconsistencies with unique character sequences? I don't quite follow what you're talking about.
Okay, let's say I count the number of unique Chinese words in the file (the HSK word of interest). Depending on how the data is picked, the results are different, when they should be the same.
> 1) there are 4995 unique character sequences in the level 6 file
In this method, all the unique HSK words are counted from file 6.
> 2) When looking at only the corresponding level in each file (only look at lvl 2 in the 2nd file, only lvl3 in the 3rd file, etc), there are 4993 unique characters
In this method, all the level 1 words from the level 1 file are added to all the level 2 words from the level 2 file, etc., and counted.
> 3) When adding all files together, there are 4996 unique character sequences
In this method, all words from the level 1 file are added to all words from the level 2 file, etc.
What this means is that the level 6 file is not actually the level 1-5 files combined with the addition of the level 6 vocabulary.
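The three counting methods described above can be sketched as set operations. This is only an illustration under assumptions: each file is represented as a list of (level, word) pairs, the function name count_three_ways is hypothetical, and the sample data is invented to show how the three counts can diverge.

```python
def count_three_ways(files):
    """Count unique words using the three methods described in the post.

    `files` maps a file's level number to a list of (level, word) rows.
    """
    top = max(files)
    # 1) Every word in the highest-level (cumulative) file.
    in_top_file = {word for _, word in files[top]}
    # 2) From each file, only the rows tagged with that file's own level.
    own_level_only = {
        word
        for file_level, rows in files.items()
        for level, word in rows
        if level == file_level
    }
    # 3) Every word from every file.
    all_files = {word for rows in files.values() for _, word in rows}
    return len(in_top_file), len(own_level_only), len(all_files)


# Invented sample: file 3 mislabels 八, omits 白, and lacks 茶 from file 2.
files = {
    1: [(1, "爱"), (1, "八")],
    2: [(1, "爱"), (1, "八"), (1, "茶"), (2, "吧"), (2, "白")],
    3: [(1, "爱"), (2, "八"), (2, "吧"), (3, "阿姨")],
}
counts = count_three_ways(files)
print(counts)
```

If every cumulative file were exactly the lower-level files plus its own new words, all three numbers would be equal; any difference points to a missing, extra, or mislabeled entry. Taking set differences between the three sets would show which words are responsible.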
xaze Posted September 16, 2010 at 06:43 AM
Also, where did you get definitions for the following?
踢足球
打篮球
…分之…
使劲儿
摊儿
纽扣儿
招投标
烟花爆竹
I have looked and cannot find their meanings in CC-CEDICT or NCIKU.
ChouDoufu Posted September 16, 2010 at 07:18 AM
…分之… isn't in CC-CEDICT, but 分之 is, so I used that definition. The others are my own definitions.
ChouDoufu Posted September 16, 2010 at 04:40 PM
> Ah, I see, I'm after an exact list identical, in the same order with numbers. It's ok, I've got a small army of typists (students) helping me build an exact replica for my needs.
I've made some modifications to the lists. I added the original order numbers to levels 1-3 and fixed a few more errors I found. Hopefully the addition of the order numbers will make the lists easier to check and correct.
rcloud19 Posted September 16, 2010 at 11:41 PM
Steven, would it be possible for you to post a list of the corrections for the errors that you found? I'd like to update the Anki deck.
Sakat Posted October 5, 2010 at 01:51 AM
Hello everyone, I know that I'm new, so I might write something stupid; if so, sorry in advance.
First, thanks to everyone for contributing to these lists! Then I have a question: are these lists (lingomi) really complete? Or do they just list the characters which appeared in these versions of the new HSK (not the complete list we're supposed to know)?
jbradfor Posted October 7, 2010 at 09:57 PM
Not sure I follow your question... Are you asking if these lists are just the characters in each list, or the full words? If so, just take a look at the list... no wait, I can do that too... yes, they are the words, not the characters.
If I misunderstood you, could you rephrase the question?
马盖云 Posted October 8, 2010 at 04:45 AM
I think sakat's asymmetrical hair-do made it too hard to understand the question :-)
My understanding of his question is: are these lists known to be the complete list of all possible words that can appear on an HSK test, or are they just representative examples based on the sample tests that have been made public? As with the SAT, you can get a feel for it from the sample tests, but there's no guarantee that they won't change it up on the actual tests...
I don't think he was drawing a distinction between characters and words at all, BTW.
Sakat Posted October 9, 2010 at 06:13 AM
Oops, sorry for not being clear enough. As 马盖云 explained, I was wondering if the list of hanzi at the end of the PDF is "the complete list of hanzi which can appear in the examination" or "the list of hanzi which appeared in this version".
jbradfor Posted October 11, 2010 at 12:22 AM
My understanding is that it is neither. It is "the list of words the HSK committee thinks are most important to learn and most likely to appear on the HSK". But words can appear on the test that are not on the lists, and I don't think they claim that a given word ever appeared on a test.
attarian Posted November 18, 2010 at 04:49 AM
Is there any shared HSK 4 deck? I cannot find one.