
Excel word list



Posted

A while ago I saw someone post a word list (or a link to one) as an Excel spreadsheet. It was the 3000 most common words.

Does anybody know where I can find this?

Search for a program called Pocket Scholar. It's similar to SuperMemo (I think) but free, and it uses TXT or CSV files from Excel. It also comes with a converter.

Thanks in advance for anyone who can help with that spreadsheet.

Thanks

A

Posted

I'm not sure about the 3000 most common words, but there is a vocabulary list for HSK, with definitions here on this site. It is likely a good approximation for most common words, and you can download them as CSV files, and load those into Excel.

Levels 1+2 will come to around 3000, and they are all very common and important words.

Posted

Quick search for '3000' turns up this - characters rather than words though. Was that what you were after?

Posted

I just made a word list out of the Lancaster Corpus of Mandarin Chinese.

One obstacle to making word lists is that the source texts need to be split into separate words. This can be automated to about 95% accuracy, but getting it fully correct requires human intervention. The Lancaster Corpus has done this for about 1 million words' worth of texts.

However, another trap is that after the first few hundred words, you meet the long tail, where dozens of words have the same frequency. Grab one extra article on the Olympics, for example, and "奥运会" shoots up hundreds or thousands of places. So the choice of source material becomes important. In my Lancaster Corpus list, you will see that 苏联 (Soviet Union) is ranked #923. One category of their texts is news and press releases, most of which date from 1990 to 1992. But with that caveat, I hope you find it useful.
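To illustrate the long-tail effect described above, here is a minimal sketch using a toy corpus. It assumes the texts are already segmented with spaces between words; the function names and sample sentences are made up for illustration and have nothing to do with the Lancaster Corpus files themselves.

```python
from collections import Counter

def frequency_list(segmented_texts):
    """Count words in texts that are already split into space-separated words."""
    counts = Counter()
    for text in segmented_texts:
        counts.update(text.split())
    return counts

def rank_of(counts, word):
    """1-based rank: one more than the number of words with a strictly higher count."""
    target = counts[word]  # a Counter returns 0 for a word it has never seen
    return 1 + sum(1 for c in counts.values() if c > target)

def tie_sizes(counts):
    """Map each frequency to the number of distinct words that share it."""
    return Counter(counts.values())

# Hypothetical toy corpus, pre-segmented by hand.
texts = ["我 喜欢 学习 中文", "我 喜欢 看 电影", "他 学习 中文"]
before = frequency_list(texts)

# Add one extra "article" about the Olympics.
after = frequency_list(texts + ["奥运会 奥运会 奥运会 开幕"])

print(tie_sizes(before))          # how many words share each frequency
print(rank_of(before, "奥运会"))  # rank before the extra article
print(rank_of(after, "奥运会"))   # rank after the extra article
```

Even in this tiny sample, several words are tied at each frequency, and a single extra text moves 奥运会 from the bottom of the list to the top. On a real corpus the same thing happens further down the list, where the ties are much larger.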

Posted

Here's the HSK word list renzhe was talking about, which includes a little less than 3000 characters and around 6800 words. It is spread out into 4 categories, with 1 being the easiest. You pretty much have to memorize this whole list if you ever want to be proficient at Chinese.

HSK List.rar

Posted (edited)

EDIT: Sorry, my bad - I didn't see that you've got them separated into 4 books (or whatever they're called, I forget). Good stuff.

I'm not sure if that's supposed to be the whole list or what? There's a total of just over 2000 rows (2018), which doesn't look much like 6800 words to me :)

Edited by ipsi
Posted

To make things clear for everyone, there are 4 worksheets in total, with the characters and words mixed together. These are sorted alphabetically by pinyin. I've got a few hundred more to go on #2 before I advance to level 3.

Worksheet 1 has 1033 characters & words.

Worksheet 2 has 2018 characters & words.

Worksheet 3 has 2202 characters & words.

Worksheet 4 has 3571 characters & words.

Posted

I'd be careful about how you think about characters and words here - those lists aren't characters + words, they're words. Some words may be single-character, but thinking of them as characters could confuse the issue. This might explain a little.

Or perhaps I'm being pedantic. (sh)

Posted (edited)

I think that single-character words can also be classified as characters even if they are words by themselves. (Am I right?) Anyway, learning words is much more important than learning characters by themselves. After all, if someone just studies the 3000 most common characters, they will eventually realize that they still cannot communicate very effectively without words.

Edited by ABCinChina
Posted

True.

ABC, you meant to say 3000 words and 6800 characters, right?

I only found 2018 rows in that spreadsheet.

Thanks anyway all.

a

Posted

I think he meant 6800 multi-character words, using around 3000 unique characters.

And there is more than one table; you were only looking at the 1st level. There are four.

  • 1 month later...
Posted

Okay, I had a little play with the spreadsheet: I wanted a list of all the characters you should recognise at each level. Some of the characters are "words" in their own right, in that they stand alone very easily. Others only exist (for HSK, at least) in a bound form with another character.

Stats:

A: 805

B: 798

C: 590

D: 669.

This means for example that to recognise all the words and characters for level C, that haven't already been seen in levels A and B, you need to learn 590 characters.

Three caveats:

1) I might have made mistakes (especially for A, which I believe should total 800).

2) Probably there are characters in, say, lists A B or C which only occur in "bound form" as part of a two-character "word", but which then appear in list D as a standalone character. I haven't taken this into account.

3) It may or may not be useful to learn certain characters on their own.

Fuller stats (which may be even more wrong of course) :

Level  Unique characters  Total entries  Of which single  Of which multiple  Bound only
A      805                1033           453              580                352
B      798                2018           559              1459               239
C      590                2002           441              1561               149
D      669                3571           457              3114               212

So it appears that for A, for example, there are 453 characters which stand alone, but a further 352 which only appear as part of a multi-character word.
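For anyone who wants to reproduce this kind of count, here is a minimal sketch of one way to do it. It assumes each level is just a list of words taken from one worksheet; the function name, the dictionary keys and the toy word lists are made up for illustration, and this is not necessarily the method used for the figures above.

```python
def level_stats(levels):
    """levels: list of (level_name, list_of_words), ordered from easiest to hardest.

    For each level, count the characters not seen in any earlier level and
    split them into those that appear as a single-character entry at this
    level and those that only appear inside multi-character words.
    """
    seen = set()
    stats = {}
    for name, words in levels:
        entries = set(words)  # ignore duplicate rows
        singles = {w for w in entries if len(w) == 1}
        all_chars = {ch for w in entries for ch in w}
        new_chars = all_chars - seen
        stats[name] = {
            "unique_new_chars": len(new_chars),
            "total_entries": len(entries),
            "single": len(singles),
            "multiple": len(entries) - len(singles),
            "bound_only": len(new_chars - singles),
        }
        seen |= all_chars
    return stats

# Hypothetical toy data, not the real HSK lists.
levels = [
    ("A", ["我", "好", "你好", "学生"]),
    ("B", ["学", "生活", "好看"]),
]

for name, row in level_stats(levels).items():
    print(name, row)
```

In the toy data, 学 is bound in level A and standalone in level B, so it is not counted as new in B. That is the situation described in caveat 2, and this sketch does not treat it specially either.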

Don't know how useful that is ... I just want to be able to work out which characters & words to learn to "complete" the different HSK levels.

  • 2 weeks later...
Posted

Hehe, you could be right but it's always fun to check one's progress, see how much further to go, right?

Anyway, just a warning about the file that is linked to earlier, called HSK List.rar. It has quite a few mistakes in it: a few wrong tones, but also some completely wrong translations of words. The most recent one I found was 报酬 translated as "revenge, avenge", but according to Wenlin 报酬 means "reward, remuneration, pay". It is in fact the similar-sounding 报仇 which means "revenge, avenge".

I still think the list is really useful but I'd treat it with caution. I double check every definition.
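Mix-ups like 报酬 / 报仇 tend to happen between words whose pinyin differs only in tone, so one way to narrow down what to double check is to group entries by toneless pinyin. This is only a sketch: it assumes each row is a (word, pinyin, definition) tuple, which may not match the actual column layout of the spreadsheet, and the sample rows below are typed in by hand for illustration.

```python
import unicodedata
from collections import defaultdict

# Combining marks used for the four tones: grave, acute, macron, caron.
# The diaeresis (U+0308) is deliberately kept so that "lü" stays distinct from "lu".
TONE_MARKS = {"\u0300", "\u0301", "\u0304", "\u030c"}

def toneless(pinyin):
    """Strip tone marks, tone digits and spaces from a pinyin string."""
    decomposed = unicodedata.normalize("NFD", pinyin.lower())
    kept = "".join(
        ch for ch in decomposed
        if ch not in TONE_MARKS and not ch.isdigit() and not ch.isspace()
    )
    return unicodedata.normalize("NFC", kept)

def near_homophones(rows):
    """Group (word, pinyin, definition) rows by toneless pinyin.

    Only groups containing more than one distinct word are returned.
    """
    groups = defaultdict(list)
    for word, pinyin, definition in rows:
        groups[toneless(pinyin)].append((word, definition))
    return {
        key: entries
        for key, entries in groups.items()
        if len({word for word, _ in entries}) > 1
    }

# Illustrative rows only.
rows = [
    ("报酬", "bàochou", "reward, remuneration, pay"),
    ("报仇", "bàochóu", "revenge, avenge"),
    ("朋友", "péngyou", "friend"),
]

for key, entries in near_homophones(rows).items():
    print(key, entries)
```

This does not find wrong definitions by itself. It only produces a shorter list of look-alike entries to compare against a dictionary.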

I'd also like to know where it came from ... is it "official" in any way?

  • 3 months later...
Posted

Using the information and the list provided, I have created a macro that is helping me to learn HSK 1-2. I'm sharing it with whoever wants it. It is still in its infancy. I will be adding more stuff later.

Vocabulary_v1.xls

  • 1 month later...
Posted

Does anyone have a list of the most commonly SPOKEN words? I am just trying to learn enough to get by, and the frequency of word use in the spoken language is quite different from that in the written language.

  • 1 month later...
Posted

That's exactly the same list that realmayo is talking about (and which is available here on this site and a number of flashcard programs).

As far as I can tell, that's the single most reliable corpus for learners of Chinese out there. It's probably a good idea to learn all of it (pretty much all of it is important) and then get the spoken vocabulary from spoken materials (TV shows, movies, radio, podcasts, etc.).

  • 4 weeks later...
