Luosi (New Members) Posted September 17, 2024 at 09:20 AM
I really don't agree with the idea that HSK6 gets you halfway, so I wrote an article about it. https://seven-learning.com/hsk6-gets-you-halfway-nonsense/
If anyone doesn't know what I mean by this, see the link in my article. Here's the TL;DR.
Why HSK doesn't get you halfway:
- there are several elementary maths errors involved in the calculation (97.7% doesn't round down; 2600 is not half of 4400)
- the author is comparing apples with oranges
- 99.8% coverage isn't necessary for reading comprehension
- Hanzi alone isn't a reliable measure of reading ability, let alone speaking and listening
- past 99.5% coverage, characters are so infrequent that they'll appear on average once every 347 pages of a standard book. Why bother counting them?
abcdefg Posted September 17, 2024 at 09:18 PM
On 9/17/2024 at 4:20 AM, Luosi said: I really don't agree with the idea that HSK6 gets you halfway, so I wrote an article about it.
How strange. Why would anyone even wonder about that?
imron Posted September 18, 2024 at 01:57 AM
I did! And I wrote an article on it several years back. This is refuting that article.
Will take a more detailed look later, but to clear a few things up:
- There obviously isn’t an exact halfway point of learning Chinese.
- The figures are only ballpark ranges with variations at the edges.
- The point of my original article isn’t about making exact numbers for what you need to know, rather it was to point out that getting to HSK6 isn’t the endpoint of learning Chinese and you still have a significant amount of work to do if you want to be able to comfortably read native Chinese content without any reading aids or dictionaries.
TheWayfarer Posted September 18, 2024 at 03:17 AM
EDIT: I accidentally hid the contents of my reply, so I'm reposting.
Quote: There obviously isn’t an exact halfway point of learning Chinese.
I absolutely agree! "How long's a piece of string? I don't know, but I only need half of it anyway..."
The words in HSK1 tend to be very high-frequency and useful, and shouldn't be weighed equally with those of, say, HSK9. Knowing and being able to use 是, 他, 被, 知道, 东西 is more useful than knowing 咧嘴, 飙升, 涵盖. Moreover, the HSK 3.0 total vocabulary contains over 11,000 items, but these consist of "only" 3,000 characters. Getting a sense of what the characters mean while learning vocabulary at the lower levels makes it much easier to learn the words at the higher levels.
I'm not saying the advanced stuff isn't worth knowing, on the contrary; but I am saying that building on a solid foundation is the best way to make real progress at the advanced level. When I first started learning Chinese, I tried to move ahead before I had a good foundation, which ended up setting me back, in terms of knowledge, time, and motivation. It's better to have 2,000 words you can use comfortably and with great variation, than 10,000 you "sort-of-kind-of-maybe" recognize. Now having learnt from my past mistakes, even as I try to 更上一层楼, if I come across anything that I'm shaky on, no matter how "elementary" it might be, I look it up, try it out, practise using it, etc.
Charts, graphs, statistical analysis, and the like can be useful for surveying the field one has to conquer as a learner, but one cannot learn a language with mathematical precision. One may study that way, but the mind absorbs and remembers in its own way, whatever our best intentions. Learning vocabulary according to the HSK levels breaks a big task down into smaller, measurable, achievable ones. It also provides a sense of success and satisfaction as one progresses from one level to another.
I think both the articles cited above have merit. The article by @imron sobers the inexperienced who think they'll become fluent in a month or two, while the article by @Luosi heartens the knocked-about novices (and veterans, even) who start thinking they won't become fluent even after many, many years.
becky82 Posted September 18, 2024 at 04:31 AM
Quote (HSK6 Gets You Halfway): Imagine that you’ve been studying hard for a couple of years and have finally passed HSK 6. Going by the HSK wordlists, you’d know about 2,600 characters and around 5,000 words.
This is the biggest problem with the original article: simply "knowing" 2600 characters and 5000 words is probably not enough to pass the HSK6. (Maybe it used to be the case years ago.) If the "HSK 6 gets you halfway" article is merely claiming that "2,600 characters and around 5,000 words" is not enough for ease of reading, then that seems fairly reasonable to me. Your reading will definitely benefit from knowing more characters and words beyond that. (Also, I don't think you're meant to interpret "halfway" mathematically here; it's a figure of speech.)
There are many 超纲词 = extra-curricular words on the HSK6 exam, and judging from this it seems like there are more 超纲词 than HSK6 words (that don't belong to the HSK1-5). You're also expected to know the HSK words and characters to considerable depth (the exam might test you on rare and figurative usages), to be able to read at 160+ characters per minute, and be familiar with Chinese geography, history, and culture.
By the time a student gets to the HSK6, they've probably already read multiple Chinese novels (or an equivalent amount of reading material). I expect 《活着》 would be considered a bit easy for a student who has passed the HSK6. I started reading adult, non-translated Chinese novels at around late-HSK5.
Quote (HSK6 Gets You Halfway): 《哈利波特与魔法石》
Words like 霍格沃茨 = Hogwarts, 邓布利多 = Dumbledore, 格兰芬多 = Gryffindor all contain non-HSK characters, and would inflate the statistics. (Also, there are non-HSK characters like 刘 which students definitely know.) Maybe we need a qualitative analysis, not just a quantitative analysis: precisely which characters in these novels are a problem, and how do they affect the student?
Quote (The “HSK6 Gets You Halfway” Nonsense): repeat the author's analysis for several character frequency lists
Did you do this analysis? The choice of character frequency list really isn't the key problem in the experimental design in my opinion. It'd be best to see where actual human students are actually struggling.
Quote (The “HSK6 Gets You Halfway” Nonsense): past 99.5% coverage, characters are so infrequent that they'll appear on average once every 347 pages of a standard book. Why bother counting them?
Admittedly, "counting" is not especially relevant (is anyone claiming it is?). But in any case, the Jun Da characters at this level begin 驭, 惘, 吠, 驮, 瑙, 炬, 痉, 曝, 恺, 胺. These characters could appear on the HSK6 exam, such as in 多巴胺 = "dopamine" and 火炬 = "torch" (especially when talking about the Olympics). In fact, 曝光 is an HSK6 word.
Quote (The “HSK6 Gets You Halfway” Nonsense): Knowing such infrequent characters isn't going to make a noticeable difference to your level.
Advanced students have already studied the non-rare characters to exhaustion or near-exhaustion; there are only rare characters left to study. As you improve, your reading speed increases. This means you get a lot more input, so characters that once felt rare no longer feel so. It also means when you read, you zip through the easy content and get bogged down on the hard content which contains these rare characters. These rare characters become the bottleneck to improving your reading fluency. I'd also point out that many students have personal interests, e.g., I like science. In order to read about science in Chinese, I need to know quite a lot of rare characters.
Quote (The “HSK6 Gets You Halfway” Nonsense): past 99.5% coverage, characters are so infrequent that they'll appear on average once every 347 pages of a standard book.
Hypothetically, suppose we study 347 characters, each of which occurs once every 347 pages on average.
Then we get an extra character per page on average, which is a worthwhile improvement. Once you reach a high enough level, you turn your focus towards breadth, including learning large numbers of characters.
Besides, learning characters at this level is usually far easier, since they mostly only have one meaning, and you're quite accustomed to learning characters, and you've likely encountered them many times prior in your reading. For example, if you want to learn 表 you'll need to learn its 5+ meanings in a bunch of different words. But if you want to learn the rare character 桉 = eucalyptus (Jun Da: 5278), you just add it to your mental list of other tree characters you've learned (杨, 橡, 松, etc.), and remember it's written 木 + 安, and pronounced the same as 安 ān. And that's basically all you have to know about this character.
And I'd guess that people who have already learned lots of characters actually enjoy learning lots of characters.
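As a rough check on the arithmetic in the post above: if each of 347 characters turns up once every 347 pages on average, the expected number of occurrences per page adds up to one (expected counts simply add). A minimal Python sketch, assuming every character in the set occurs at the same average rate; the function name `expected_hits_per_page` is purely illustrative and not from either article:

```python
from fractions import Fraction


def expected_hits_per_page(num_chars: int, pages_between: int) -> Fraction:
    """Expected occurrences per page of a set of characters.

    Assumes (hypothetically) that each of the `num_chars` characters
    appears once every `pages_between` pages on average. Expected counts
    add up, so the total is num_chars * (1 / pages_between).
    """
    return num_chars * Fraction(1, pages_between)


# The case from the post: 347 characters, each once every 347 pages.
print(expected_hits_per_page(347, 347))  # 1
```

Exact fractions are used so the result is not blurred by floating-point rounding. The same function shows that learning only 100 such characters would cover about 100/347 of a page-level occurrence, i.e. roughly one extra known character every three and a half pages.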
realmayo (Guest) Posted September 18, 2024 at 01:28 PM
Seems to me the main point of Imron's piece was that you'll be uncomfortable reading Chinese novels if you only know 2600 characters, and much more comfortable if you know 4400. Sounds like sense, not nonsense, to me. I do disagree with Imron's assertion that 2000 equals 2600 but it wouldn't be the first time we've parted company over statistics. However he's not wrong to point out to anyone hoping to go from passing HSK6 to happily reading novels that, well, constantly bumping across unknown characters is an exercise in masochism (that's why god invented Pleco Reader).
imron Posted September 19, 2024 at 12:49 AM (Popular Post)
On 9/18/2024 at 1:17 PM, TheWayfarer said: The words in HSK1 tend to be very high-frequency and useful, and shouldn't be weighed equally with those of, say, HSK9
When this article was written, there was no HSK 9. The highest level was HSK 6 and it was roundly criticised as not being a sufficient indicator of advanced Chinese proficiency. At the time, there were also many posts on the internet, both here and other places, where people who had passed old HSK 6 were noting that they still found it difficult to read native content (even taking into account the 超纲词 they knew). There were also people solely studying old HSK 6 word lists with the aim of being able to consume native content and likewise finding it frustrating to still not be able to consume native content comfortably (though there are also other factors why this is the case for people who have only been cramming word lists).
My original article was written within that context and aimed to quantify why people who had reached old HSK 6 were still having difficulty with native content and quantify how much effort remained in order to be able to read native content with some degree of comfort. The conclusion being that however much effort it took you to reach old HSK 6 was about the same amount of effort you’d need to put in again in order to be able to read native content comfortably without aid from other tools or dictionaries.
I feel this conclusion stands up well, especially if you consider that the new HSK 9 doubles the amount of words required compared to the old HSK 6. I know, I know, words vs characters, apples vs oranges, not the same thing etc etc. People focusing on the character count statistics from the article seem to be missing the part at the beginning that says characters are NOT the most important factor and you need to be looking at words instead.
So why does the article focus on characters then? Because character statistics can be used to establish a minimum baseline of unknown words - because by definition if you don’t know one of the characters in a word then you don’t fully know that word. And yes you might still be able to understand those words in context but that’s not the same thing as fully knowing the word.
So, if the number of unknown characters in a piece of text is X, then the number of unknown words will be at least X, plus some other number Y (made up of words that have characters in X and words where you know all the characters but still don’t know the word).
We can then argue about levels of known words required for reading comprehension and reasonable people can reasonably disagree on this but roughly speaking:
- With the aid of tools you can work your way through and reasonably comprehend a text at 90-95% known words.
- At 98% known words you’ll be able to passively understand most unknown words without looking them up - and you can probably read at this level without too much issue.
Chinese the Hard Way isn’t aiming for mere reading comprehension however. The site and the articles on it are aiming for a much higher standard, specifically:
Quote: to be able to do most of the things you can do in your native language without your Chinese language skills (or lack of them) getting in the way
Another article on the site defines this for reading as being able to
Quote: Read and enjoy general Chinese language materials such as websites, newspapers and novels without needing to resort to a dictionary except occasionally (and when [you] do, a Chinese-Chinese dictionary is enough).
Everyone is free to define their own standard that they are happy with. The above is mine and the other articles on the site are written with that standard in mind.
Note: this standard explicitly doesn't mention anything about knowing a certain number of characters or words.
I couldn't tell you how many characters or words I know because I don't see this as a useful metric to track. Anyone who has read much of what I've had to say over the years will know that I'm a strong proponent of doing the things you want to learn rather than just trying to accumulate various metrics. The article is heavy on these statistics not because you should focus on reaching these numbers, rather it's because it's trying to see if there is data that can explain the observation that people reaching old HSK 6 are not able to comfortably read native content.
Although there's no mathematically exact way to determine this, I think the statistics provided are still ballpark level accurate e.g. Luosi's article makes the point that:
Quote: that's anywhere between 10 and 24 new characters per page, not the 20 the author suggests.
But 10 to 24 new characters per page is still highly disruptive to the reading process and that's what gives people the feeling that they can't read comfortably even after reaching X level. There's simply too much variability in the amount of words different people know and too much variability in the content they read to provide precise mathematical breakdowns - but I think the statement I made in the original article is still directionally accurate.
If I think about reading in my native language, I rarely if ever need to look up words in the dictionary and rarely encounter new words. That’s what I’m aiming for in Chinese also. Based on the minimum baseline concept mentioned above, if there is one unknown character per page, then there will be at least one unknown word per page. This is a lower standard than reading in my native language, but well within the definition provided above. It's an acceptable level for me though when reading.
On 9/18/2024 at 2:31 PM, becky82 said: Words like 霍格沃茨 = Hogwarts, 邓布利多 = Dumbledore, 格兰芬多 = Gryffindor all contain non-HSK characters, and would inflate the statistics.
Ignoring statistics, this is more or less agreeing with my main point - that you won't be able to pick up a general text and read it comfortably at old HSK levels. Though if you start reading it, you'll quickly learn these world-specific words and your reading will be better off for it.
On 9/18/2024 at 2:31 PM, becky82 said: Advanced students have already studied the non-rare characters to exhaustion or near-exhaustion; there's only rare characters left to study.
100% agree. And the best way to prioritise which ones to learn is to read things that interest (or are otherwise relevant to) you rather than focusing on accumulating words from wordlists.
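The "minimum baseline" idea in the post above (a word containing an unknown character cannot be fully known) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text is assumed to be already segmented into words, the function name `words_with_unknown_chars` is hypothetical, and the example word 点燃 is made up for the demonstration (火炬 and 多巴胺 come from the thread). Because one word can contain more than one unknown character, the sketch counts the affected words directly instead of assuming one word per character. It cannot detect the other group mentioned in the post, words whose characters are all known but which are still unknown as words.

```python
from typing import Iterable, List, Set


def words_with_unknown_chars(words: Iterable[str], known_chars: Set[str]) -> List[str]:
    """Return the words that contain at least one character not in known_chars.

    The length of the result is a lower bound on the number of unknown
    words: words made entirely of known characters may still be unknown,
    but that cannot be determined from character knowledge alone.
    """
    return [w for w in words if any(ch not in known_chars for ch in w)]


# Hypothetical pre-segmented text and known-character set.
words = ["火炬", "点燃", "多巴胺"]
known = set("火点燃多巴")  # 炬 and 胺 are not known

print(words_with_unknown_chars(words, known))  # ['火炬', '多巴胺']
```

Dividing the length of the result by the total number of words gives a ceiling on the known-word percentage, which can then be compared against whatever comfort threshold a reader chooses.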
abcdefg Posted September 19, 2024 at 02:33 AM
On 9/18/2024 at 7:49 PM, imron said: I couldn't tell you how many characters or words I know because I don't see this as a useful metric to track. Anyone who has read much of what I've had to say over the years will know that I'm a strong proponent of doing the things you want to learn rather than just trying to accumulate various metrics.
That's what I was getting at with my earlier comment.