Chinese-Forums

Anki Deck: 1000 Highly Common Chinese Words From News Articles


chajadan


Hi chajadan,

That's a good initiative! From experience, however, the meaningful information in a news article lies in the relatively rare words. So if you learn only the vocab that covers 85% of the news, you will most probably miss the point and won't understand. Let's dig into this:

  • 1. 的 (4.29%, Total 4.29% )
  • 2. 在 (0.89%, Total 5.18% )
  • 3. 了 (0.63%, Total 5.82% )
  • 4. 是 (0.55%, Total 6.37% )

The most frequent words are the most common ones, and they won't be an issue for most people who have reached the level where they want to read a news article.
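The "Total" column in a list like this is a running sum of each word's share of all tokens. As a rough sketch of how such a table could be built (the function name `coverage_table` and the toy token list are made up for illustration, and this is not necessarily how the deck's list was produced):

```python
from collections import Counter


def coverage_table(tokens):
    """Return (rank, word, share %, cumulative share %) rows, most frequent first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    rows = []
    running = 0
    for rank, (word, n) in enumerate(counts.most_common(), start=1):
        running += n
        rows.append((rank, word, 100 * n / total, 100 * running / total))
    return rows


# Toy corpus (hypothetical): 10 tokens, 3 distinct words.
toy_tokens = ["的"] * 5 + ["在"] * 3 + ["了"] * 2
for rank, word, share, cumulative in coverage_table(toy_tokens):
    print(f"{rank}. {word} ({share:.2f}%, Total {cumulative:.2f}%)")
```

On a real corpus the cumulative figure climbs quickly at the top of the list and then flattens, which is why the last few hundred entries each add very little coverage.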

On the other hand, 600 words of this list occur only once and among them you find a lot of Proper Nouns (people, cities) as well as characters that the segmentation tool left behind.

For instance, 992. 民进 is an abbreviation for 中国民主促进会. (China Association for Promoting Democracy)

For instance: if the index used for segmentation doesn't know 欧巴马 or 奥巴马 (2 versions of "Obama" in Chinese), it will show up in the frequency list as 欧, 巴, and 马 which are very rare on their own in the news.
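To make the segmentation problem concrete, here is a minimal sketch of a dictionary-based segmenter using forward maximum matching. That technique is an assumption chosen for illustration; the tool actually used for the deck is not named in this thread. The `segment` function, the `max_len` parameter, and the tiny lexicons are hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Greedy longest-match segmentation; unknown text falls back to single characters."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


sentence = "奥巴马访问中国"
print(segment(sentence, {"访问", "中国"}))            # ['奥', '巴', '马', '访问', '中国']
print(segment(sentence, {"奥巴马", "访问", "中国"}))  # ['奥巴马', '访问', '中国']
```

With the name missing from the lexicon, each of its characters is counted as a separate "word", which is how stray single characters end up near the bottom of a frequency list.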

In both cases, let's imagine you start learning from the beginning of the list, say the first 300 words. You may now read the beginning of a made-up article that states:

President ??? met ??? to celebrate the ??? of ??? that occurred in 1945.

Can you read? Yes! Most of the sentence, in fact.

Can you understand? No! The meaningful information is too rare and you haven't encountered these words in the frequency list.

Now, let's be extreme, and learn only the very rare words:

??? Obama ??? Yan Junqi ??? ??? ??? foundation ??? China Association for Promoting Democracy ??? ?? 1945.

Can you read? No!

Can you understand? Probably! 2 people, an event, an organisation, a date. You've been skimming the news, exactly as you would in your mother tongue: no need to read every single word.

The conclusion is that for the "1000 Highly Common Chinese Words From News Articles", you can use this list both ways:

- From the beginning, for general Chinese.

- From the end, for News reading, if you can correct the segmentation-related errors.

And the real challenge is to find the right balance between words that are very common in general Chinese and words that are very specific but compulsory if you want to go beyond the general-Chinese intermediate stage.


I always wondered, what's the benefit of learning words out of context?

SRS can be helpful if you use it in conjunction with a textbook/podcast/TV series/whatever, to help you retain what you've already experienced. But I really doubt you'll be able to learn much with just a flashcard deck alone. I mean, it's not that different from trying to learn a language by reading a dictionary, page by page.


Yes, I agree with 淨土極樂. I struggle with memorising some words from HSK as I don't know the context. I am laboriously going through ArchChinese / my grammar textbook sentences / PLECO / Chinese Pod etc. and backfilling my flashcard list.

I do think it's good to have several different example sentences from different sources, as some are very factual in nature (like a grammar book) and some are more colloquial (e.g. Chinese Pod)


I always wondered, what's the benefit of learning words out of context?

Strongly depends on the words and your level. Many words have or need very little context. Think of standard road signs (stop, no parking, etc.). Common words on a menu also tend to have little context, and some rote learning before a trip is very handy, especially for those who are not adventurous enough to just point at random items on the menu.

Also, when your general level is fairly decent and you want to get into a new subject, learning some key vocabulary can be very beneficial, as comprehension levels may drop dramatically when you move from general subjects to specialist subjects. When you are already familiar with the subject, the specialist vocabulary in the new language will require little context. Many nouns and verbs are quite well defined and have fairly good one-to-one translations. Admittedly some interpretation will often remain. But then, many natives will also have somewhat different interpretations of where to draw the line between a lorry, a car, a (mini) bus, a truck, a van, etc.

Rote learning a well chosen (frequency based) vocabulary list is a completely different beast from learning a dictionary page by page. Learning a well chosen vocabulary list can absolutely kickstart progress.

Nevertheless, I absolutely agree that it is usually far better to learn words in context. Apart from the fact that context may help you memorise, the more abstract words especially need context before you can understand and learn all their nuances properly.


As far as I'm concerned, the benefit to this deck over HSK is that HSK is not a list of the very most common words in order, and this tries to be that. Many people study frequency lists in languages. HSK is an introductory set of vocabulary that still involves some less than common vocabulary, and there are many more words to learn in HSK. This is a small, focused set that is simply a stepping stone to one of my very favorite ways of learning: reading with a dictionary. After knowing this deck, which can be quickly learned, you can be off and running reading online news.

I suppose everyone learns differently. I would never want to have flashcards removed from my arsenal, and it hinges on a skill I'm good at: intentional memory. And I prefer to learn isolated vocabulary over sentences. But I certainly do not expect anyone to learn a language by this method alone, without the context 淨土極樂 and Johnny20270 wonder about. My main way of learning the bulk of a language is simply to read books, newspapers, whatever, with a dictionary. I don't hear about such an approach being employed by many people, but I like it a lot. This list helps get you off the ground with such a method since it cuts down your dictionary lookup load by up to half.

When I started learning Spanish on my own in 2006, I started with a newspaper and had to look up easily 50% of the words I encountered (even after 8 years of foreign language study in Romance languages earlier in school, 2 of them in Spanish specifically). But you can only look up a word so many times before it sticks, and by now I've read so many different novels, including a 1400 page novel in Spanish (Los pilares de la Tierra). This habit of reading exposed me to such a wide and varied vocabulary. It also had me saying all this stuff in my head so often that it completely opened the door to being able to hear spoken Spanish better.

My basic structure for learning a language is this:

- learn the sounds

- learn some simple vocabulary

- learn basic grammar

- begin reading, reading, reading (ignore what you don't get if necessary)

- study new grammar areas alongside to help you parse what you read

- flashcard vocabulary you've encountered and don't yet know (helps avoid multiple lookups while reading)

- after getting to the point where your reading is at a normal pace with good comprehension, pick up audio sources and begin to listen

I adore this method personally. And towards this method, a frequency list makes tons of sense to get off the ground quickly. In this method, I would almost (though not entirely) replace HSK lists with simple reading. The HSK lists could be used as good tests to show the progress of your learning through reading. And mind you, none of this is a very quick process. The 1000 words in this Anki deck will be the fastest part by far. But for me it is highly effective.

--charlie


As far as I'm concerned, the benefit to this deck over HSK is that HSK is not a list of the very most common words in order, and this tries to be that. Many people study frequency lists in languages. HSK is an introductory set of vocabulary that still involves some less than common vocabulary, and there are many more words to learn in HSK.

I've never compared HSK with a general frequency list but I suspect you're right. I also agree that HSK, at the higher levels, contains words that seem not particularly useful. But then, is it useful to religiously follow a 'random' word frequency list? IMHO not, as at the beginner level you will not use words according to that frequency list. Beginners tend to limit their communication to certain subjects and consequently encounter completely different word frequencies. That HSK has many more words to learn is irrelevant. You need to learn far more words than 1000, and even the 5000 to 8000 from new/old HSK are insufficient for a good level of Chinese. Also, HSK is divided into levels so you can dose your vocabulary learning accordingly.

This is a small, focused set that is simply a stepping stone to one of my very favorite ways of learning: reading with a dictionary. After knowing this deck, which can be quickly learned, you can be off and running reading online news.

I agree in general with your method, but I would not recommend starting by reading an (online) newspaper. I would start out with some simple material: reading material aimed at learners or children, and extend from there. To read somewhat comfortably you need a fair comprehension rate that imho is well above the 50% you claim to reach with your Anki deck. IMHO newspapers have just too wide a subject range and in general too complicated language use for a real beginner. For most learners it's just a recipe for frustration to start this method with newspapers, while imho it can be a great way to study and to motivate when starting out with suitable easy material.


Myself, I'm a devout language studier and I feel no need to read at a high comprehension rate from the beginning in a language brand new to me. I'd rather have contemporary examples in topics of import at the moment, even if it takes me two hours to get through a few paragraphs, and for me this is not frustrating but understandable and rewarding. I do very much realize my method is not appropriate for the general audience. But also, in high school I was one of the extremely few people who learned my math from the textbook, whereas others simply turned to page x and did problem y upon the instructor's prompting without ever reading the instructional material that a textbook is mainly there to provide. It is surely a revelation when you realize the book does what the teacher was doing, but at the pace of your choosing.

I appreciate the views of others on suitable learning methods. The deck is made available for those who would find it useful.

--charlie


I don't see anything wrong with rote memorizing, for example, character pronunciation. And lots of words (most simple nouns) don't need context. Do I really need context for 'pen' or 'dog' or 'cellphone'?

Anki, when used correctly, can speed up your learning and ensure you don't forget. Anki is great if, for example, you don't have time to read/practice speaking every day, but you can do Anki for 10-20 minutes. Anki can increase your vocabulary little by little and make sure you remember 80-90% of what you learn for YEARS. How's that not good?

10 new words a day comes out to 3500 in a year assuming you skip a few days. And with 10 new cards a day and less than 100 reviews, you can do all that in 10 minutes. Why not?
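The 3500 figure is simple arithmetic. A tiny sketch makes the assumption explicit; the function name and the choice of 15 skipped days are mine, picked only because they reproduce the round number:

```python
def words_per_year(new_per_day, skipped_days=0, days_in_year=365):
    """New words introduced in a year at a fixed daily rate, minus skipped days."""
    return new_per_day * (days_in_year - skipped_days)


print(words_per_year(10))                    # 3650 with no days skipped
print(words_per_year(10, skipped_days=15))   # 3500, "assuming you skip a few days"
```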

Personally, I think a deck of the 1000 most common is a GREAT idea. Someone who knows a few characters and understands the basics could cram the whole thing in a week and be ready to go. Sure, they aren't going to know the nuances of each word, but that takes years of native-level exposure.


some are more colloquial (e.g. Chinese Pod)

ehh Chinese Pod is not colloquial. If you want to hear what colloquial Chinese sounds like, go watch 家的N次方 or the contestants on 金牌调解. Also that one show in the northeast with the farmers in the village, don't remember the name.


And lots of words (most simple nouns) don't need context. Do I really need context for 'pen' or 'dog' or 'cellphone'?

Because an HSK deck (what most Anki users seem to be using) is not pencils and dogs.

10 new words a day comes out to 3500 in a year assuming you skip a few days. And with 10 new cards a day and less than 100 reviews, you can do all that in 10 minutes. Why not?

E.g. 深刻 and 深厚 have clear differences in usage while their English dictionary definitions are mostly the same. Anki won't teach you this.

Basically, my point is you need some other way of experiencing the language (be it a textbook or immersion for more advanced learners) for the flashcards to be useful. Flipping through CC-CEDICT's entries on your phone 10-20 minutes a day won't teach anything.


Anki is great if, for example, you don't have time to read/practice speaking every day, but you can do Anki for 10-20 minutes. Anki can increase your vocabulary little by little and make sure you remember 80-90% of what you learn for YEARS. How's that not good?

10 new words a day comes out to 3500 in a year assuming you skip a few days. And with 10 new cards a day and less than 100 reviews, you can do all that in 10 minutes. Why not?

No question Anki is a great tool for learning, but the problem I find is that there is no way I could learn ten cards a day for a year and remember them all, especially in 10-20 mins. After a month I am struggling not to forget the cards, and new cards start to pollute my knowledge of older cards. But that is just me. Furthermore, if I memorise so many words from the HSK list, their meaning is not clear at all to me and I find I am "running ahead" on vocabulary at the expense of learning context. My ANKI decks take at the very least an hour.

Similar to many subjects I think lots of people who excel at language learning, (like you Charlie perhaps?) have a natural ability. Although I like learning Chinese I find that I constantly struggle at it much much more than I would at sciences.

Also (west texas), yes on your comment about ChinesePod: I should probably use the word "informal" rather than "colloquial".

charlie

Thanks for the deck. I will have a look. I posted a link here (post #42) to another word list, taken from movie subtitles. That also seems a smart way of learning words.


I'm not advocating dropping other methods of study, but those ten words a day add up. Sure, you won't remember the nuances, but you will remember the gist of each word well enough to understand it in context. This lets you start reading native-level material faster, and that native-level material, IMO, is the only place you can really start to get a handle on those nuances.

the problem I find is that there is no way I could learn ten cards a day for a year and remember them all, especially in 10-20 mins.

You can remember them all, or most of them. Let me explain how the algorithm works: basically, the interval at which you see each card is multiplied by a factor, call it N, each time you get the card right. Obviously N>1, the default being 2.5. As long as you are getting most of the cards right, you will only be seeing a small fraction of your total deck each day. Surprisingly enough, the number you see each day depends only weakly on the total number in your deck and is more strongly affected by how many you add each day. This means that you can have a huge deck but only see 1% of it each day.
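Here is a deliberately simplified sketch of that idea, not Anki's actual scheduler: it assumes every answer is correct, a first interval of one day, and a fixed factor of 2.5, and it ignores learning steps, lapses, ease changes and interval fuzz. The function names are hypothetical.

```python
def review_days(first_interval=1.0, factor=2.5, horizon=365):
    """Days (counted from when a card is added) on which it comes up for review."""
    days = []
    elapsed = 0.0
    interval = first_interval
    while True:
        elapsed += interval
        due = round(elapsed)
        if due > horizon:
            break
        days.append(due)
        interval *= factor  # each correct answer stretches the next gap
    return days


def reviews_on_day(day, new_per_day=10, factor=2.5):
    """Reviews due on `day` when `new_per_day` cards have been added every day so far."""
    # One batch of cards is due for each review offset that fits before `day`.
    return new_per_day * len(review_days(factor=factor, horizon=day))


print(review_days())        # [1, 4, 10, 25, 64, 162]
print(reviews_on_day(365))  # 60 reviews against a deck of 3650 cards
```

Under these idealised assumptions a card is reviewed only six times in its first year, and after a year of adding 10 cards a day the daily reviews are under 2% of the deck. Wrong answers push that number up, which is why the real workload depends so much on the individual.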

As long as you do all the reviews every day and are paying attention, you should remember 80-90% of the cards. The Polish psychologist who invented the algorithm, an absolute genius, designed it to be the most efficient and effective way to memorize anything. It might seem like you are getting lots of cards wrong, but actually, most of the cards you will get right, and because their intervals increase, you seldom see them and spend more time on the few cards you have trouble with.


Very true, but all this is dependent on a person's memory recall ability. I think mine is not good at all.

I read up on the algorithms behind SRS-type software and yes, it was a very smart idea indeed! My deck is currently around 800 hanzi cards and has been going for about 9 months now. My reviews are around 100 a day (5 new cards), so about 12% of the deck. For mature cards the recall rate is about 91%, so around the guideline ANKI suggests. I do try to optimise the settings in ANKI, as I find that with the default times I can just iterate around 40 failed cards the whole day, and the total number of reviews for the day can easily be 400-500. This is much worse if I leave it for 2 or 3 days.

I do have a strict "fail" test on a card: if I fail to get the tones, pronunciation, and English meaning correct, I count it as a fail.

Memory tests are interesting. I tried several of these online memory puzzles with my friend, who is also a Chinese linguist. She has a fantastic memory, almost eidetic. In the test, 20 random objects were shown on the screen for 5 seconds or so, and after 20 seconds you had to select, from a separate larger list, the objects that had been part of that set. She scored 29 out of 30. My reaction was "eh? test? What test are you on about?" :D Joking aside, I scored 17, and by these results she was in the top 5-10% of the population, whereas I was well below the mean (the mean was 22, I think).

Nevertheless, when we tried the typical IQ tests online (i.e. the MENSA test), she only scored slightly above average whereas I came out much higher. Not meaning to sound arrogant, but just highlighting my much stronger inclination towards logical reasoning.

(On a separate note, I think IQ tests are a lot easier for someone with a strong mathematical background. You immediately look for patterns and inequations)

Anyway the conclusion? Language is maybe not for me? :mrgreen:

Your point on doing the reviews each day is crucial I believe. Also, as you rightly mention, you spend time on the wrong ones which can cause a good deal of "grumpyness" (if there was such a word!)


Similar to many subjects I think lots of people who excel at language learning, (like you Charlie perhaps?) have a natural ability. Although I like learning Chinese I find that I constantly struggle at it much much more than I would at sciences.

While I am in many ways intellectually gifted and cannot take credit for that, I still agree that language learning is an innate human trait, and that with adult learners it is more exposure and time (the 10,000 hour rule) that will determine the degree of learning achieved with a language. Babies learn to speak a full language on their own, but it still takes them years and they are doing very little else while only being exposed to one language with a fresh slate that avoids having to retrain burned in habits.

I would recommend, Johnny20270, that you put your condition into its proper perspective. I could easily say that although I love language learning and am generally good at it, Chinese is much more slippery to pick up at first. But when I think about it, life has been busy for me lately, and the extra learning curve of Chinese characters has kept me from learning how I normally would (lots of reading). So I can pick up a language that uses our alphabet, like Vietnamese, much more easily, whereas many people might blame their ability to learn instead of the other, true factors. What I am saying is that learning a language well is regularly a multi-year kind of thing, easily surpassing a decade of study unless you don't have a job or other hobbies, so saying that you don't feel you learn language well makes me wonder how you type all those sentences. I worry people add stumbling blocks of discouragement to their process by how they view it -- it seems the "natural ability" people have, if they have one at all, is to be comfortable with the discomfort and blind groping that occurs before the "reward". And Johnny, if later you have a hard time remembering your Anki terms, I would highly recommend that you decrease the wait intervals. I'm a very good memorizer and the default increases seem a bit long to me, but I can work with them.

But now, after reading all your posts, it's funny, you make it sound like Anki would never work for you, but you have a 9-month old deck with 91% retention on learned terms: well geez buddy, which is it? I guess you object to the 10-20 minutes part. I am normally like you, Johnny, with strict fail criteria, but with this deck I'm only doing Hanzi prompts and leaving out anything except: see the Chinese word in Hanzi, say it, know a couple of its most common definitions, and that's it. Normally I have 3 cards per note, prompting for definition and pinyin too. With this deck, I'm going for quick gathering so I can build to reading more quickly.

--charlie


Weird Thomas_nulinuli, your post, while second, just showed up for me now (got an e-mail notice 7mins ago).

What you say for sure makes sense. Personally, I do ~not~ expect reading to be quick and without a lot of dictionary use after I go through this list. I've already done some reading and I'm used to living in a dictionary as I go. This deck is just to optimize the number of things I need to look up. Though the mdbg.net annotator keeps me from having to look up much individually (and is visually the most pleasing annotator I know, though dimsum chinese tools works for me for pasted text).


Wow, lots of important topics being touched on in this thread.

E.g. 深刻 and 深厚 have clear differences in usage while their English dictionary definitions are mostly the same. Anki won't teach you this.

Using Anki for rote learning of isolated words is only useful for receptive vocabulary, not productive. If you learn both 深刻 and 深厚, and you encounter one of them in a sentence, you would be able to read it. But if you ventured to use one of them to produce a sentence, that learning wouldn't tell you which one to use. But I do agree that such a study method is only useful when combined with other exposure to the language. I assume everyone is doing that already, since the main goal is to use the language for practical purposes.


But now, after reading all your posts, it's funny, you make it sound like Anki would never work for you, but you have a 9-month old deck with 91% retention on learned terms: well geez buddy, which is it?

No, I should clarify then: ANKI definitely does work! The concept of efficient memorization by timely intervals is quite smart, and credit is well due to the creators of the algorithm. However, I find that I have a lot of difficulty memorising facts that I don't understand and am simply asked to 'accept' at face value. Some people are much better at this than I am. For example, if I am presented with a completely made-up shape and told the meaning is, ... dunno, table, I would have a lot of difficulty (more than others, I envisage) in memorising this (albeit) nonsense fact.

Hence my point is: for me, understanding is an important aspect of an ANKI deck.

I have an on-target retention rate for my decks, but the number of reviews is exceptionally high and fails can be well into the hundreds. This is because the retention rate is based on mature cards. Thus, as I see it, it suggests that I have trouble transitioning from a young card to a mature card. I recently changed the wait times (as you noted!) quite a lot and it seems to make a lot of difference. Time will tell.

I do think that users need to tweak "wait times" to their own memorisation patterns. Another point worth noting is that the ANKI algorithm (I assume) assumes no other influence on your memorisation patterns. I am assuming that if a card is presented on, say, day 1, 3, 5, 10 etc., it does not assume that you have seen this card in between these wait times. As your studies will invariably involve reading, one may very well come across this card several times during the ANKI "wait period". Herein lies the problem. I think I am running ahead with vocabulary and not doing enough reading. You mention that you read a lot, so I guess this points to the fact that you might see these cards during the ANKI "wait period" whereas I do not. The solution for me: perhaps it means that I need to read a lot more, as you suggest, and/or adjust the ANKI wait settings up or down. Actually it's self-defeating, as the more time I spend on ANKI reviews the less time I have to read. :D

I don't think we, nor anybody else for that matter, are equating memorising with learning. Although in some cases there is not much to learn, as in the case of simple nouns. Even then, a danger lies in extrapolating wrong information. For example, if I see a simple noun like 朋友, and 朋友们, I might think, ok, well now I can say 狗 and 狗们.
