anticks Posted August 27, 2008 at 10:08 PM
A while ago I saw someone post a word list (or a link to one) in an Excel spreadsheet. It was the 3000 most common words. Does anybody know where I can find this? Search for a program called Pocket Scholar: it's similar to SuperMemo (I think) but free, and it uses txt or csv Excel files and also comes with a converter. Thanks in advance to anyone who can help with that spreadsheet. Thanks, A
renzhe Posted August 27, 2008 at 10:17 PM
I'm not sure about the 3000 most common words, but there is a vocabulary list for HSK, with definitions, here on this site. It is likely a good approximation of the most common words, and you can download the lists as CSV files and load those into Excel. Levels 1+2 will come to around 3000, and they are all very common and important words.
roddy Posted August 27, 2008 at 10:24 PM
Quick search for '3000' turns up this - characters rather than words though. Was that what you were after?
c_redman Posted August 31, 2008 at 06:19 PM
I just made a word list out of the Lancaster Corpus of Mandarin Chinese. One obstacle to making word lists is that the source texts need to be split into separate words. It can be automated to about 95% accuracy, but to be fully correct requires human intervention. The Lancaster Corpus has done this with about 1 million words' worth of texts. However, another trap is that after the first few hundred words, you meet the long tail, where dozens of words have the same frequency. Grab one extra article on the Olympics, for example, and "奥运会" shoots up hundreds or thousands of places. So the choice of source material becomes important. In my Lancaster Corpus list, you will see that 苏联 (Soviet Union) is ranked #923. One category of their texts is news and press releases, most of which were from 1990-92. But with that caveat, I hope you find it useful.
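To make the long-tail point concrete, here is a minimal sketch, assuming the text has already been segmented into space-separated words. The toy corpus and the helper names (`word_frequencies`, `rank_of`, `tie_group_sizes`) are made up for illustration; this is not the Lancaster Corpus or the script c_redman used.

```python
from collections import Counter


def word_frequencies(segmented_lines):
    """Count words in text that is already split into words by spaces."""
    counts = Counter()
    for line in segmented_lines:
        counts.update(line.split())
    return counts


def rank_of(counts, word):
    """1-based rank by descending frequency; tied words share the best rank."""
    target = counts[word]
    return 1 + sum(1 for c in counts.values() if c > target)


def tie_group_sizes(counts):
    """Map each frequency to the number of distinct words that have it."""
    return Counter(counts.values())


# Hypothetical, hand-segmented toy corpus (an assumption for this example).
corpus = [
    "我 喜欢 学习 中文",
    "我 喜欢 看 电影",
    "他 学习 中文",
    "奥运会 很 好看",
]

counts = word_frequencies(corpus)
rank_before = rank_of(counts, "奥运会")

# Add one extra "article" that mentions the Olympics a few times.
counts.update(word_frequencies(["奥运会 奥运会 奥运会"]))
rank_after = rank_of(counts, "奥运会")

print(rank_before, rank_after)
print(tie_group_sizes(counts))
```

Even in this tiny example, many words share the same count, so a handful of extra occurrences is enough to move a word past a whole group of tied words at once.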
ABCinChina Posted September 1, 2008 at 08:16 AM
Here's the HSK word list renzhe was talking about, which includes a little less than 3000 characters and around 6800 words. It is spread out into 4 categories, with 1 being the easiest. You pretty much have to memorize this whole list if you ever want to be proficient at Chinese.
HSK List.rar
ipsi() Posted September 1, 2008 at 10:48 AM (edited)
EDIT: Sorry, my bad - didn't see the fact that you've got them separated into 4 books (or whatever they're called, I forget). Good stuff.
I'm not sure if that's supposed to be the whole list or what? There's a total of just over 2000 rows (2018), which doesn't look much like 6800 words to me.
Edited September 1, 2008 at 10:59 AM by ipsi()
ABCinChina Posted September 1, 2008 at 12:40 PM
To make things clear for everyone, there are 4 worksheets total with the characters and words mixed up. These are sorted alphabetically by the pinyin. I've got a few hundred more to go on #2 before I advance to level 3.
Worksheet 1 has 1033 characters & words.
Worksheet 2 has 2018 characters & words.
Worksheet 3 has 2202 characters & words.
Worksheet 4 has 3571 characters & words.
roddy Posted September 1, 2008 at 12:45 PM
I'd be careful about how you think about characters and words here - those lists aren't characters + words, they're words. Some words may be single-character, but thinking of them as characters could confuse the issue. This might explain a little. Or perhaps I'm being pedantic. (sh)
ABCinChina Posted September 1, 2008 at 01:00 PM (edited)
I think that single-character words can also be classified as characters even if they are words by themselves. (Am I right?) Anyway, learning words is much more important than learning characters by themselves. After all, if one just starts studying the 3000 most common characters, then he/she will eventually realize that he/she still cannot communicate very effectively without words.
Edited September 1, 2008 at 01:15 PM by ABCinChina
anticks Posted September 1, 2008 at 11:03 PM
True. ABC, you meant to say 3000 words and 6800 characters, right? I only found 2018 rows in that spreadsheet. Thanks anyway, all. a
renzhe Posted September 1, 2008 at 11:11 PM
I think he meant 6800 multi-character words, using around 3000 unique characters. And there is more than one table; you were only looking at one level. There are four.
Guest realmayo Posted October 5, 2008 at 08:06 AM
Okay, I had a little play with the spreadsheet: I wanted a list of all the characters you should recognise at each level. Some of the characters are "words" in their own right, in that they stand alone very easily. Others only exist (for HSK, at least) in a bound form with another character.
Stats: A: 805 B: 798 C: 590 D: 669.
This means, for example, that to recognise all the words and characters for level C that haven't already been seen in levels A and B, you need to learn 590 characters.
Three caveats:
1) I might have made mistakes (especially for A, which I believe should total 800).
2) Probably there are characters in, say, lists A, B or C which only occur in "bound form" as part of a two-character "word", but which then appear in list D as a standalone character. I haven't taken this into account.
3) It may or may not be useful to learn certain characters on their own.
Fuller stats (which may be even more wrong, of course):
Level: Unique characters / Total entries / Of which single / Of which multiple / Bound only
A: 805 / 1033 / 453 / 580 / 352
B: 798 / 2018 / 559 / 1459 / 239
C: 590 / 2002 / 441 / 1561 / 149
D: 669 / 3571 / 457 / 3114 / 212
So it appears that for A, for example, there are 453 characters which stand alone, but a further 352 which only appear as part of a multi-character word.
Don't know how useful that is ... I just want to be able to work out which characters & words to learn to "complete" the different HSK levels.
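For anyone who wants to repeat this kind of count, here is a minimal sketch of one way to read those columns. It is not realmayo's actual spreadsheet method: the function name `level_stats`, the toy word lists, and the definition of "bound only" (a character new at this level that never appears as a single-character entry at this level) are all assumptions made for illustration.

```python
def level_stats(levels):
    """Compute per-level character statistics.

    levels: mapping of level name -> list of words, in order from easiest
    to hardest (dicts keep insertion order in Python 3.7+).
    A character counts as "new" only at the first level where it appears.
    """
    seen = set()  # characters met at earlier levels
    stats = {}
    for name, words in levels.items():
        chars = {ch for word in words for ch in word}
        new_chars = chars - seen
        single_entries = {word for word in words if len(word) == 1}
        stats[name] = {
            "unique_new": len(new_chars),
            "total_entries": len(words),
            "single": sum(1 for word in words if len(word) == 1),
            "multiple": sum(1 for word in words if len(word) > 1),
            # New characters that never stand alone at this level.
            "bound_only": len(new_chars - single_entries),
        }
        seen |= chars
    return stats


# Hypothetical toy lists, not the real HSK data.
toy_levels = {
    "A": ["我", "好", "我们", "学习"],
    "B": ["学", "中文", "好看"],
}

for level, row in level_stats(toy_levels).items():
    print(level, row)
```

In the toy data, 学 is bound-only at level A but shows up as a standalone entry at level B, which is exactly the situation described in caveat 2; like the original stats, this sketch does not go back and reclassify it.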
anticks Posted October 5, 2008 at 08:20 PM
Thanks. I was able to convert this list to XML and use it in pocketfullRecall.
ABCinChina Posted October 11, 2008 at 02:27 AM
Realmayo, I think you're giving us a case of "Analysis Paralysis".
Guest realmayo Posted October 23, 2008 at 08:09 PM
Hehe, you could be right, but it's always fun to check one's progress and see how much further there is to go, right?
Anyway, just a warning about the file that is linked to earlier, called HSK List.rar. It has quite a few mistakes in it: a few wrong tones, but also some completely wrong translations of words. The most recent one I found was 报酬, translated as "revenge, avenge", but according to Wenlin 报酬 means "reward, remuneration, pay". It is in fact the similar-sounding 报仇 which means "revenge, avenge".
I still think the list is really useful but I'd treat it with caution. I double check every definition. I'd also like to know where it came from ... is it "official" in any way?
rikesh Posted February 1, 2009 at 08:38 AM
Using the information and the list provided, I have created a macro that is helping me to learn HSK 1-2. Sharing with whoever wants it. It is still in its infancy. I will be adding more stuff later.
Vocabulary_v1.xls
Ed Log Posted March 22, 2009 at 07:39 AM
Does anyone have a list of the most commonly SPOKEN words? I am just trying to learn enough to get by, and the frequency of word use in the spoken language is quite different from the written language.
Don_Horhe Posted April 30, 2009 at 08:22 AM
Have you checked this out: http://en.wiktionary.org/wiki/Appendix:HSK_list_of_Mandarin_words ? A total of 8840 words and 2856 single characters.
renzhe Posted April 30, 2009 at 12:26 PM
That's exactly the same list that realmayo is talking about (and which is available here on this site and in a number of flashcard programs). As far as I can tell, that's the single most reliable corpus for learners of Chinese out there. It's probably a good idea to learn all of that (pretty much all of it is important) and then get the spoken vocabulary from spoken materials (TV shows, movies, radio, podcasts, etc.).
rimmhou Posted May 25, 2009 at 01:21 PM
Great, I downloaded the Excel file. If I learn all the words in this list, which level of HSK can I get?