rezaf Posted December 5, 2011 at 05:00 AM As I mentioned above, I also agree that 20,000 of those 40,000 words are probably not important at all for reading general contemporary novels, but they are probably very important in specialized fields of study such as medicine or politics, and that’s where doctors and lawyers make their money. Another thing I should explain is that what I meant by "word" includes all forms of compound words and expressions. In that case things like 保管,保管费,保管员,保管室 or 保守,保守疗法,保守党,保守派,保守主义 can all be counted as individual words. I know that it probably doesn’t make a significant difference for people who just want to read Chinese books, but it’s necessary to memorize them as individual words if one wants to be able to use them in writing and speaking, which is the last step one should take to achieve fluency. Finally, I think it’s not entirely correct to say that because some words are in the last 2% of the frequency list, not knowing them just reduces comprehension by 2%. In my own experience, more difficult words or expressions usually contain a lot of compact information, and sometimes a few of them can significantly affect the comprehension of a text.
jkhsu Posted December 5, 2011 at 05:34 AM Finally I think it’s not entirely correct to say because some words are in last 2% of the frequency list, not knowing them just reduces comprehension by 2%. I think this depends on the type of reading material. For a full length novel, that 2% might not be a big deal because many of those words are most likely descriptive type words. You'd probably have to miss more than 2% of the words to miss the plot of a novel. However, if you are reading a succinct news story, that 2% can be a big deal.
rezaf Posted December 5, 2011 at 05:40 AM That's probably true as my reading is mostly limited to newspaper articles.
Lu Posted December 5, 2011 at 01:18 PM I agree with rezaf, even on novels. Sure you can do without a description of exactly how lofty a mountain is, but after 100 pages of missing how uncomfortable exactly someone looks or what exactly is implied when she looks him in the eye, the plot slowly starts to slip. At least that's how I'm feeling with the book I'm currently reading. I get almost everything, I maintain decent speed (aiming at 10 pages a day and often making that), I enjoy it even, but I'm starting to miss stuff. I'll probably make it to the end of the book and I'll probably get most of what's going on without looking up more than a few words, but I know there is more to enjoy in this novel.
jkhsu Posted December 5, 2011 at 06:12 PM The 98% comprehension rate has been discussed before. Please look at this thread: Extensive Reading and Vocabulary. Post #14 has a link to a research article: Unknown Vocabulary Density and Reading Comprehension, Marcella Hu Hsueh-chao and Paul Nation. Excerpt from the article: ----- CONCLUSION This study shows that the density of unknown words has a marked effect on text comprehension. The text used in this study was a fictional text with a strong chronological story line and was thus not a "difficult" text. Other text types, particularly newspapers and academic texts, would place greater demands on the reader. However, even with this reasonably easy text, most learners would need around 98% coverage to gain adequate unassisted comprehension of the text. This provides experimental support for the position taken by Hirsh and Nation (1992), namely that learners need to have around 98% coverage of words in the text to be able to read for pleasure... ----- It's interesting that my own post #6 in that thread explained how I thought this 98% didn't apply to newspapers, which I've said here as well.
roddy Posted December 5, 2011 at 06:22 PM Even if unknown words don't affect your understanding of the text, they slow you down something awful.
imron Posted December 5, 2011 at 08:20 PM I thought this 98% didn't apply to newspapers I disagree. Instead, if you're having difficulty understanding the content it's more than likely that comprehension has fallen below 98%. In that post you are talking about characters, but when talking about comprehension levels for this type of exercise, you really need to work at the word level, because you may know the characters but not the word, and that will affect comprehension. Anyway, here are the first few paragraphs from a random article from tech.sina.com.cn. It contains slightly over 100 words (note, words, not characters). Show me two words (total, not unique) you could take away from this that would seriously affect understanding (feel free to substitute with your own article if you think this is not representative). 移动应用开发者的年收入普遍偏低,34.7%的开发者收入在1万元以下,1万-5万元的只有16.3% 从创业的那一刻开始,李万鹏就亲手为自己的业余生活画上了句号,每天的工作几乎都要忙到凌晨一点。 让李万鹏如此费心的,是一家名为成都优聚的手机游戏开发公司。他目前的职位是这家公司的总经理,每个月拿3000元左右的工资,“现在已经好多了,去年每个月只能拿到2000多,有两个月甚至发不出工资” 。 Even if unknown words don't affect your understanding of the text, they slow you down something awful. So true...
Silent Posted December 5, 2011 at 09:44 PM I disagree. Instead, if you're having difficulty understanding the content it's more than likely that comprehension has fallen below 98%. I think it's not that black and white. Most of the time there's no real issue if you lose a few words. Most of the time things are said more than once. Sometimes, however, a single word is crucial. Sometimes a single yes or no makes a huge difference to the interpretation of large amounts of text. I think the 98% is just a very rough average. I think the key is that with 98% comprehension you're able to guess most of the remaining vocabulary, or at least to make a good judgement on the relevance of the unknown vocabulary. Consequently you're able to liberate yourself (mostly) from the dictionary and concentrate on the content instead of the vocabulary (packaging).
c_redman Posted December 5, 2011 at 11:03 PM Not knowing what 画上了句号 meant, I assumed 李万鹏 was supplementing his low income by somehow painting punctuation symbols, and the added work kept him busy at all hours. If the text were longer, I probably would have suspected this was wrong, as his painting career would have been strangely never mentioned again. But it does illustrate Silent's point that sometimes words are crucial to comprehension in an unpredictable way. It also raises another point, in that it isn't just about unknown words, but misunderstood words, which leads to misunderstanding the broader text.
jkhsu Posted December 5, 2011 at 11:18 PM I disagree. Instead, if you're having difficulty understanding the content it's more than likely that comprehension has fallen below 98%. I had a feeling you were going to post a sample article. In general, I do think that 98% word comprehension is enough to get the gist of most types of text, including news articles. However, it is typically in news articles that not knowing a few words "can" affect one's comprehension. Again, these are more corner cases than the norm. I provided an example from the Wall Street Journal in English below. The total number of words in that article is around 277 or so and I didn't break them down by unique words. I took away 4 words (some of these words are terms with multiple combined words and plural forms) and thought it made the article pretty confusing. I would venture to say that if you didn't know just one word (Word #1), you wouldn't completely understand the article. Title: (Word #1) (Word #2) Debt Regulation Doesn’t Tackle the Problem European Union lawmakers agreed late Tuesday to legislation that will limit the ability of traders to bet against (Word #2) using (Word #1). Restrictions on “naked” (Word #1) (Word #2) that are not used to (Word #3) a correlating exposure will mean it will “no longer be possible for a (Word #4) to purchase (Word #1) of (Word #2) debt without holding actual bonds of the countries involved,” said one EU lawmaker. Attempts to restrict (Word #1) trading are often characterized as shooting the messenger. If (Word #2) in the euro zone, or anywhere else, manage their finances well and don’t give markets reason to think they will default, speculators won’t speculate against them, the reasoning goes. But if legislators want to stop people buying (Word #1), they haven’t yet come up with a plan to stop investors who own (Word #2) bonds selling them. 
And if the trading activity in French bonds and (Word #1) that followed Moody’s comment that it would monitor the stable outlook on the country’s triple-A rating is anything to go by, that might be more useful. The cost of French five-year (Word #2) (Word #1) did rise Tuesday after Moody’s comments, but the move was not dramatic and, with a more optimistic tone pervading financial markets this morning, has already been more than fully reversed. Meanwhile, investors sold French bonds yesterday and pushed French 10-year yields higher, driving the spread between them and safe-haven German bunds to its widest level since 1992. "The (Word #1) market has held in well considering the size of the move in cash bonds," said one trader. So it seems that investors can find ways to express their views on different (Word #2), with or without the help of the (Word #1) market. Here are the removed words: Word #1: CDS (Credit Default Swap) Word #2: sovereign Word #3: hedge Word #4: hedge fund Article source: link I'll let others try to remove two words from your Chinese news article as I'm not at that level of comprehension yet.
imron Posted December 5, 2011 at 11:19 PM If the text were longer, I probably would have suspected this was wrong I would argue that the text is long enough for someone to suspect this is wrong (and therefore reassess assumptions). Directly after 每天的工作几乎都要忙到凌晨一点 is 让李万鹏如此费心的,是一家名为成都优聚的手机游戏开发公司。他目前的职位是这家公司的总经理. It's unlikely that the general manager of a mobile game developer is going to be busy until 1 in the morning painting punctuation marks.
imron Posted December 5, 2011 at 11:23 PM I took away 4 words By my count, you took away 20 words, which brings comprehension down to 92-93%.
jkhsu Posted December 5, 2011 at 11:36 PM By my count, you took away 20 words, which brings comprehension down to 92-93% What happens when a word is used multiple times in the text? The article was centered around words #1 and #2. If you didn't know the meaning of those two words, you wouldn't fully comprehend the article.
imron Posted December 5, 2011 at 11:59 PM What happens when a word is used multiple times in the text? You remove it multiple times, and each one of those times counts towards the total you're allowed to remove. If you're doing a word count based on individual words (including repeats), then that's the way it needs to work in order to have a valid calculation. So yes, if you remove all instances of #1 and #2 then you'll go over the limit, meaning you can't remove both of them and have comprehension above 98%. If you only remove one of them (to get within 98%), then it's quite easy to understand the article. Actually, for reference, I didn't think the article was that hard to understand anyway. This is what I had thought for each of the words after reading it: #1 - some kind of financial instrument, (actually, I think this is accurate enough, because even in English, if I see the word Credit Default Swap, I have no idea what the specifics of it are, and would just treat it as "some financial instrument"). #2 - government - probably close enough to the actual meaning that it doesn't impact understanding (from "..If (Word #2) in the euro zone..." and knowing that it was somehow related to 'country' from "..(Word #2) debt without holding actual bonds of the countries involved.."). #3 - hedge - this one I got completely correct just from context. #4 - some word referring to some kind of trader. From context "will limit the ability of traders to bet against.." followed by "..it will “no longer be possible for a (Word #4)"... So I don't think that, even with those words removed, it severely impacted my ability to understand that text.
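The counting rule described above (every occurrence of an unknown word counts against the budget) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `coverage`, `unknown_budget` and `unknown_token_count` and the toy sentence are made up for this sketch and come from no library. The only figures taken from the thread are the roughly 277-word article, the 20 removed occurrences and the 98% threshold.

```python
def coverage(total_tokens: int, unknown_tokens: int) -> float:
    """Share of running words (tokens, repeats included) that are known."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return (total_tokens - unknown_tokens) / total_tokens


def unknown_budget(total_tokens: int, target_percent: int = 98) -> int:
    """Largest number of unknown tokens that keeps coverage >= target_percent."""
    # Integer arithmetic avoids floating-point surprises near the threshold.
    return total_tokens * (100 - target_percent) // 100


def unknown_token_count(tokens: list[str], unknown_words: set[str]) -> int:
    """Count every occurrence of an unknown word, not just distinct words."""
    return sum(1 for token in tokens if token in unknown_words)


# Figures quoted in the thread: about 277 words, 20 occurrences removed.
print(f"coverage: {coverage(277, 20):.1%}")
print("budget at 98% for 277 words:", unknown_budget(277))
print("budget at 98% for 100 words:", unknown_budget(100))

# Hypothetical toy sentence: two distinct unknown words, three occurrences.
toy = "the CDS market and the sovereign CDS spread".split()
print("unknown occurrences:", unknown_token_count(toy, {"CDS", "sovereign"}))
```

With these numbers, 20 removed occurrences out of 277 leave coverage just under 93%, while a 98% target allows only five unknown occurrences in a 277-word text and two in a 100-word text. The toy sentence shows why two distinct unknown words can still cost three or more occurrences.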
jkhsu Posted December 6, 2011 at 12:11 AM #1 - some kind of financial instrument, (actually, I think this is accurate enough, because even in English, if I see the word Credit Default Swap, I have no idea what the specifics of it are, and would just treat it as "some financial instrument"). This is the problem. With news articles, if you're fine with Credit Default Swap = "some financial instrument", then we have different standards of comprehension! If you had said some type of "financial insurance or hedge" then I think I'd be ok with it. Anyway, I think there were a few others who thought 98% comprehension was not acceptable. I didn't. I just thought news articles had more of a chance of falling into the difficult-to-comprehend category when one misses just a couple of words.
roddy Posted December 6, 2011 at 12:12 AM The article was centered around words #1 and #2. If you didn't know the meaning of those two words, you wouldn't fully comprehend the article. Are you trying to tell us that if the words that aren't understood are the important ones, understanding will be reduced? Because that seems blindingly obvious, and I hadn't noticed anyone saying otherwise.
imron Posted December 6, 2011 at 12:20 AM if you're fine with Credit Default Swap = "some financial instrument", then we have different standards of comprehension! If you had said some type of "financial insurance or hedge" Perhaps financial instrument is the wrong term, but to be honest, as a layman to the world of finance, if I'm reading an article like that I just think "some financial thing, used by people in the finance world", and don't think any further than that, because that's about as much detail as I'm interested in. I'm sure I could find examples of things related to my field of expertise (software development) that were obviously different to me but that to a layperson would just be treated as "some computer thing".
rezaf Posted December 6, 2011 at 02:00 AM There are quite a few variables that should be considered in a test, but I think what imron is suggesting is based on the theory that the frequency of a word is the only factor that determines its effect on comprehension. According to this theory, not knowing the meaning of 是 in the first text and not knowing 处 in the second text each reduce comprehension by 2.5%. 我和他_相亲认识,时间不久,但彼此都很满意,很喜欢对方。昨晚我刚见了他家的众多长辈,他送我回家的时候说,等双方家长见面,“我就可以把处男之身献给你这个小处女了”。听他这么说,我心里不_滋味,回家后短信告诉他我不是处女。他问我_和谁,几次。我告诉他_和大学男友,叫他不要问我细节,不然我会觉得他在侮辱我。后来他说,给他两天时间想想。如果他的处女情结过得去,就还在一起,他心里会当我_处女。如果过不去,就和平分手,他会替我保密,把分手的责任推在他身上。 我和他是相亲认识,时间不久,但彼此都很满意,很喜欢对方。昨晚我刚见了他家的众多长辈,他送我回家的时候说,等双方家长见面,“我就可以把_男之身献给你这个小_女了”。听他这么说,我心里不是滋味,回家后短信告诉他我不是_女。他问我是和谁,几次。我告诉他是和大学男友,叫他不要问我细节,不然我会觉得他在侮辱我。后来他说,给他两天时间想想。如果他的_女情结过得去,就还在一起,他心里会当我是_女。如果过不去,就和平分手,他会替我保密,把分手的责任推在他身上。 Personally I would read the first one without even noticing that something is missing, but in the second one I would wonder for at least a few minutes about why the hell they want to break up. Is it because she is a trans?
renzhe Posted December 6, 2011 at 11:20 AM I'd like to second everything imron said in this thread, and I'd also like to mention the endurance aspect. Reading is work. The more you have to think, the more you have to interpolate and guess, and the longer it takes for you to grasp the meaning of a character, the more tired you will get. In the beginning, when I was starting with Jin Yong, my brain would switch off after a page. Any further reading would be pretty futile because you would be missing 50% of it, as your brain refuses to spend too much effort figuring out exact meanings. It gets better with time, but it takes quite a bit of effort to get to the point where you can read Chinese for a few hours without getting tired and can still comprehend as much as you did when you started reading. This is something you train, and learning vocabulary separately will not help you, the same way making a step in isolation over and over again will not help you train for a marathon. So, even if you can get through short articles after loads of vocabulary study, it doesn't mean that you would be able to get through a book comfortably. To read books, you need to read books, I'm afraid.
Guest realmayo Posted December 6, 2011 at 11:21 AM Yes, I'm not sure picking an article that involves words that most people would need an explanation for in their native tongue is the best example. Also I think the clozing is wrong in "Restrictions on “naked” (Word #1) (Word #2)" -- makes no sense. This 98% figure: this is on average, right? So occasionally you'll be unlucky and that 2% will cause problems. By picking and choosing the 2% you want to remove, you're skewing the balance of probabilities. I also think that giving someone an article in a foreign language, with 2% blanked out, and giving them an article in their own language, with 2% blanked out, will have different results ... even if they know the remaining 98% in both languages. This is because reading and understanding is not just a matter of knowing vocab: if you can read well in a foreign language, you can read comfortably and easily and still have brain-power to spare making connections and suppositions which allow you to understand what's going on, even if you lack 2% of the vocab. But if you lack that "reading fluency", I can see that 2% gap being unjumpable. Hence: I think the 98% discussion reinforces the gist of this thread that "reading fluency" is about more than vocab.