
Overlap between old and new HSK character lists (and UK A-level)



Posted

Nice, not sad at all. So what are the 119 characters on the A-level list but not on either the old or new HSK list? I am surprised that there are that many, considering that the old HSK list already has 2800 or so characters whereas the A-level list has only 1900.

By the way, the old HSK vocab list had 8840 words. See here.

http://en.wiktionary.org/wiki/Appendix:HSK_list_of_Mandarin_words

The total number of words on the HSK committee list is 8840.

Posted

That's very interesting!

It seems very good to me that the largest bucket is the 3-way intersection; it would be rather disconcerting if the three lists had significantly different characters. It also makes sense that the second biggest bucket is the 2-way intersection of the old and the new HSK lists, since those have more words than does the UK-A.

It does seem a bit strange that the two 2-way intersections between the HSK lists and the UK-A list are so small, compared to the number of characters that are in only one list. Could you drill down a bit deeper and see why? Are they somewhat obscure characters that are part of a phrase or something?

Posted

Wow, this is impressive. How did you compile this? Please tell me it was a script and not done manually.

Posted

Many thanks for your kind comments.

East Asia Student - I generated the vocabulary lists using Clavis Sinica with a lot of spreadsheet manipulation and then logical mathematics to get the split between the various categories.

Gato & jbradfor - unfortunately the method I used did not generate lists of characters in the various categories, sorry.

Posted

Looking at the UK A-Level character list again (taken from the authorised course text books), it contains a large number of proper names, for example 孟子 (the philosopher Mencius), which do not appear in the old or new HSK character lists. This is probably where the bulk of the 119 characters that only appear in the UK A-level list come from.

Interestingly, the character 孟 is the 1740th most common character (according to Wenlin), so the compilers of the old and new HSK word lists must have consciously chosen not to include most of these common proper-name characters.

Posted

For completeness, the Venn diagram below gives the same information for the new HSK level 5 and old HSK intermediate levels...

[Image: Venn diagram of the new HSK level 5 and old HSK intermediate character lists]

Of the 1711 characters in the new HSK level 5, 1669 characters were in the old HSK intermediate, but 42 are entirely new. 525 characters from the old HSK intermediate level are not used in the new HSK level 5.
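
The figures above can be cross-checked by adding up the regions of the Venn diagram. This is only a small sketch: the function name and the dictionary keys are made up for illustration, and the three region counts are the ones quoted in this post.

```python
def totals_from_regions(only_a, only_b, both):
    """Recover list sizes from the three regions of a two-set Venn diagram."""
    return {
        "a_total": only_a + both,          # size of list A
        "b_total": only_b + both,          # size of list B
        "union": only_a + only_b + both,   # distinct items across both lists
    }


# A = new HSK level 5, B = old HSK intermediate (counts quoted above)
t = totals_from_regions(only_a=42, only_b=525, both=1669)
print(t)
```

With these inputs, the total for list A comes out at 1711, which matches the size given for the new HSK level 5 list.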

Posted

I don't know what the UK A-level list you are all talking about is, or where people get it, but for the other two lists the word counts are as follows:

new - old = 522 (only in new)

old - new = 4164 (only in old)

new & old = 4473 (in both)

I probably used somewhat different versions of the lists than elliott50, as the character counts are different. The old list was taken from http://hskflashcards.com/ with some manual fixes, and the new one from http://lingomi.com/blog/hsk-lists-2010/.

Counts for characters:

new - old = 74

old - new = 301

new & old = 2560

Here are the actual differences:

Only in new

丐 伦 侠 剔 吝 咀 哺 唉 啬 啰 嗨 嘈 嚏 墅 婪 宠 尬 尴 岳 峭 庇 徙 恍 恕 惮 愣 抒 擎 斟 昔 暧 曝 杖 椎 橙 欧 沐 沧 沮 浏 涮 涵 澈 濒 熨 瘾 硕 磅 篷 紊 纬 纽 绎 肴 胎 腻 舔 苟 荧 虐 裔 觅 讳 诺 谍 账 赁 迄 迸 遏 锲 飙 馅 魅

Only in old:

犁 邪 撇 笆 罗 揽 伊 侍 砌 咏 芬 泣 跺 梗 倘 钮 涝 碟 贞 槽 柠 瞥 梧 瑰 箩 刨 岭 杨 芳 玲 讹 涤 瑞 喽 稼 窿 叁 脊 蹄 拇 浆 秉 寇 腊 炊 晌 盏 珑 呐 燕 乔 哗 毙 橘 蛛 缎 呜 楞 寡 屠 瓣 榆 潦 肠 抡 苯 絮 篱 鹰 茧 韵 驴 磷 酶 灸 沏 秽 驼 竿 蕾 刃 掂 茅 袄 雀 凳 捶 剃 垒 栗 玖 桅 君 驮 蒜 瘟 俄 锡 棱 垦 巫 肝 秆 檬 疮 贱 莲 窟 冶 锹 鲸 掺 尿 禾 镁 铀 乃 菊 捅 绷 匠 蝉 珊 镑 狐 柒 阀 穗 汛 筝 糠 艾 浩 函 硫 揪 潭 屯 曰 勒 卵 拴 顷 狸 闺 硅 鹿 翠 蚁 阁 砂 梅 茄 逆 颊 锌 窑 炕 囱 钙 蜘 倚 芝 垮 龟 侄 骡 粱 枣 僚 沥 唤 斧 鸦 亩 玫 爪 芭 钳 薯 氮 碱 贰 痰 撵 榷 芹 缸 蚕 羿 雁 抠 桂 啄 菇 穆 鹊 捌 柏 屎 荔 蛙 焊 瑚 铝 卜 汞 噢 菠 秧 徽 淫 汪 蜓 凯 尼 拱 兰 柳 丹 绵 靴 婶 壹 奸 藤 妖 歼 凿 虾 刁 翁 蚂 谗 骆 冈 笋 蚊 萍 蘑 徐 弓 熔 蔗 笛 镰 萝 挟 锣 亢 淇 纱 邦 粪 锯 暮 樱 舵 榴 爹 蜻 猿 肾 豁 坊 鹅 柄 蝇 仆 囊 杏 豌 剑 槐 黒 俏 蝗 棚 姜 绞 埠 绢 凤 轧 桐 桩 寨 蹭 坯 葱 凰 铲 葵 痴 荷 闸 捻 棺 籽 轿 蛾
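
Counts and difference lists like the ones above can be produced with plain set operations. The following is a minimal sketch, not the script actually used for this post: it assumes each word list has already been loaded as a list of strings, and `char_set`, `compare` and the toy word lists are hypothetical names and data chosen for illustration.

```python
def char_set(words):
    """Collect every distinct character used in an iterable of words."""
    return {ch for word in words for ch in word}


def compare(old_words, new_words):
    """Split words and characters into only-new, only-old and shared groups."""
    old_w, new_w = set(old_words), set(new_words)
    old_c, new_c = char_set(old_w), char_set(new_w)
    return {
        "words": {
            "only_new": new_w - old_w,
            "only_old": old_w - new_w,
            "both": new_w & old_w,
        },
        "chars": {
            "only_new": new_c - old_c,
            "only_old": old_c - new_c,
            "both": new_c & old_c,
        },
    }


# Toy data only; the real lists have thousands of entries.
old_words = ["俄语", "犁", "学习"]
new_words = ["欧洲", "硕士", "学习"]
result = compare(old_words, new_words)

for kind, groups in result.items():
    for name, items in groups.items():
        print(kind, name, len(items), " ".join(sorted(items)))
```

Note that a character only counts as "only in new" if it appears in no old word at all, so the character-level differences are usually much smaller than the word-level ones.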

Posted

Frequency indexes for the top 10 characters from the "Only in new" list:

873 欧

1130 伦

1190 诺

1605 纽

1788 硕

1902 唉

1910 胎

1982 账

2026 岳

2035 侠

Same for the "Only in old" list:

564 罗

670 兰

841 尼

860 杨

1004 伊

1033 俄

1095 梅

1179 君

1216 徐

1275 丹

Character frequency data from

http://corpus.leeds.ac.uk/frqc/i-zh-char.num.html
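
A ranking like the two above can be sketched as a simple sort by frequency index. This assumes the frequency data has already been loaded into a dictionary mapping each character to its rank; the loading step is left out because it depends on the file format. `top_by_frequency` is a hypothetical helper, and the ranks in the toy table are the ones listed in this post.

```python
def top_by_frequency(chars, rank, n=10):
    """Return the n most frequent characters as (rank, char) pairs.

    Characters missing from the rank table are skipped.
    """
    ranked = [(rank[ch], ch) for ch in chars if ch in rank]
    return sorted(ranked)[:n]


# Ranks copied from the "Only in new" figures above; 丐 is deliberately
# left out of the toy table to show how unranked characters are handled.
rank = {"欧": 873, "伦": 1130, "诺": 1190, "纽": 1605, "硕": 1788}
only_new = ["硕", "纽", "欧", "丐", "诺", "伦"]

top = top_by_frequency(only_new, rank, n=3)
for r, ch in top:
    print(r, ch)
```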

Posted

Excellent work cababunga, thanks!

Many of the most common characters that are "only in the old" list (e.g. 罗, 兰, 尼 & 伊 ) are often used to transliterate foreign words, so I suspect that the new HSK has many fewer foreign proper names in it.

Other changes perhaps reflect altered geo-political realities, for example 欧洲 (Europe) is only in the new HSK, while 俄语 (the Russian language) is only in the old HSK. Or changing technology, 纽扣儿 (button, e.g. on a computer) has come in, but 犁 (to plough a field) has gone out. Or altered educational aspirations, 硕士 (master's degree) has come in, but 少先队 (young pioneer) has gone out.

I'm sure that a full analysis of the differences would yield a fascinating picture of the changes in China's self-image between the publication of the two lists. Surely a PhD thesis in the making for someone...

Posted

I don't know about the A-level exam, but considering quite a lot of the vocabulary and characters in the HSK exams come from outside the lists, I think the lists are to be taken with a pinch of salt anyway. What is and what's not included seems rather arbitrary.

Posted (edited)

I did something similar for words:

Old HSK 1 => New HSK 1 : 132

Old HSK 2 => New HSK 1 : 8

Old HSK 3 => New HSK 1 : 4

Old HSK 4 => New HSK 1 : 0

Old HSK 1 => New HSK 2 : 123

Old HSK 2 => New HSK 2 : 20

Old HSK 3 => New HSK 2 : 9

Old HSK 4 => New HSK 2 : 2

Old HSK 1 => New HSK 3 : 204

Old HSK 2 => New HSK 3 : 75

Old HSK 3 => New HSK 3 : 17

Old HSK 4 => New HSK 3 : 6

Old HSK 1 => New HSK 4 : 185

Old HSK 2 => New HSK 4 : 318

Old HSK 3 => New HSK 4 : 49

Old HSK 4 => New HSK 4 : 20

Old HSK 1 => New HSK 5 : 96

Old HSK 2 => New HSK 5 : 684

Old HSK 3 => New HSK 5 : 323

Old HSK 4 => New HSK 5 : 122

Old HSK 1 => New HSK 6 : 13

Old HSK 2 => New HSK 6 : 177

Old HSK 3 => New HSK 6 : 694

Old HSK 4 => New HSK 6 : 1289

Words in the New HSK that weren't in the old one.

... => New HSK 1 : 11

... => New HSK 2 : 8

... => New HSK 3 : 14

... => New HSK 4 : 30

... => New HSK 5 : 111

... => New HSK 6 : 347

Words in the Old HSK that aren't in the new one.

Old HSK 1 => ... : 250

Old HSK 2 => ... : 704

Old HSK 3 => ... : 1086

Old HSK 4 => ... : 2128

Edited by BertR
Posted

Many thanks BertR, your research certainly confirms that there is no direct mapping between the old and new HSK levels. Which means that studying the old HSK early level material in order to prepare for the new HSK lower level tests may not be as helpful as one might hope.

Posted
Old HSK 1 => New HSK 6 : 13

Old HSK 2 => New HSK 6 : 177

Old HSK 3 => New HSK 6 : 694

Old HSK 4 => New HSK 6 : 1289

Words in the Old HSK that aren't in the new one.

Old HSK 4 => ... : 2128

BertR, can you explain what these numbers mean? Why is "Old HSK 4 => New HSK 6" 1289, but the number of words in Old HSK 4 but not in new HSK is 2128?

Posted

New HSK 6 are the words in the word list of New HSK 6 except those that are already in New HSK 5. So I count those that are extra for that level.

I did the same for the Old HSK.

Old HSK 1 is the Basic level (基础HSK, 1级-3级)

Old HSK 2 is the Elementary level (3级-5级)

Old HSK 3 is the Intermediate level (6级-8级)

Old HSK 4 is the Advanced level (高等HSK, 9级-11级)

Old HSK 4 => New HSK 6 : 1289

Means: 1289 words of the Old HSK 4 level (these are the new words for the advanced level, not in Old HSK 3) are also in the new HSK 6 level (but not in new HSK 5).

Old HSK 4 => ... : 2128

Means: 2128 words of the Old HSK 4 level (these are the new words for the advanced level, not in Old HSK 3) are not in the new HSK.

With "=>" I meant " moved to ". Mathematically writing Count(Intersection(new words for Old HSK 4, new words for New HSK 6)) = 1289 would be a correct way to write Old HSK 4 => New HSK 6.

For Old HSK 4 => ... : 2128 this would become Count(new words for Old HSK 4 \ all words for New HSK) = 2128 with \ the difference operator.
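
The two formulas above can be sketched in code as follows. This is an illustration under assumptions, not the tool used to produce the numbers: it assumes each level is given as a set of words in level order, cumulative or not, and `incremental` and `cross_tab` are hypothetical names.

```python
def incremental(levels):
    """Reduce per-level word sets to the words that are new at each level."""
    seen, out = set(), []
    for words in levels:
        words = set(words)
        out.append(words - seen)  # drop words already counted at a lower level
        seen |= words
    return out


def cross_tab(old_levels, new_levels):
    """Count words 'moved' between levels, dropped from old, and added in new."""
    old_inc, new_inc = incremental(old_levels), incremental(new_levels)
    all_old = set().union(*old_inc)
    all_new = set().union(*new_inc)
    # Old HSK i => New HSK j : Count(Intersection(new words i, new words j))
    moved = {
        (i, j): len(o & n)
        for i, o in enumerate(old_inc, 1)
        for j, n in enumerate(new_inc, 1)
    }
    # Old HSK i => ... : Count(new words i \ all words for New HSK)
    dropped = {i: len(o - all_new) for i, o in enumerate(old_inc, 1)}
    # ... => New HSK j : Count(new words j \ all words for Old HSK)
    added = {j: len(n - all_old) for j, n in enumerate(new_inc, 1)}
    return moved, dropped, added


# Toy cumulative lists with placeholder words.
old_levels = [{"a", "b"}, {"a", "b", "c", "d"}]
new_levels = [{"a", "c"}, {"a", "c", "e"}]
moved, dropped, added = cross_tab(old_levels, new_levels)
print(moved, dropped, added)
```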

Is this more clear?

Posted

I see. Thanks for the clarification. I hadn't seen "=>" used as intersection before. ;)

I think the latter sets of statistics comparing to the cumulative list are the most useful since there is no claim that individual levels of the old vocab list would map to any particular level of the new vocab list. On second thought, your intersection data might be helpful if you can show them in a bar diagram form like this. Maybe you can rig one up. :P

Words in the New HSK that weren't in the old one.

... => New HSK 1 : 11

... => New HSK 2 : 8

... => New HSK 3 : 14

... => New HSK 4 : 30

... => New HSK 5 : 111

... => New HSK 6 : 347

Words in the Old HSK that aren't in the new one.

Old HSK 1 => ... : 250

Old HSK 2 => ... : 704

Old HSK 3 => ... : 1086

Old HSK 4 => ... : 2128

Posted

So the intended meaning for "=>" was "moved to"

You want lists such as these?

... => New HSK 1 : 11

Word       dwhyyjzx rank      internetzh rank   lcmc rank      New HSK     Old HSK 
北京        253               189               319            1
不客气                        10340                            1
出租车      32982             1968                             1
打电话                        1091              4464           1
火车站      4333              3895              3014           1
哪儿        556               1490              2226           1
那儿        787               1420              3532           1
说话        1039              540               763            1
下雨        11414             3746                             1
这儿        357               1114              1575           1
中国        40                81                82             1

That might take some time for all lists. I have all the building blocks ready, but currently it takes some manual work to generate these lists...

When I have more time, I'll make a website that allows people to retrieve these kinds of statistics (the web pages actually already exist, but they are not publicly reachable. They also still need some work so that others can actually use them).

Posted

Hmm, I actually don't need any more statistics....

A super list combining the new and old HSK list might actually be useful. I'm not sure about the levels, though.
