Chinese-Forums

HSK *character* lists



Posted

Has anyone out there got lists of characters arranged by HSK level? There are a number of word lists out there, including this engine I put up ages ago, and HSKflashcards.com, but I never got around to doing full *character* lists, and a quick poke around on the Internet (I got as far as page two of the Google results :)) didn't turn anything up. Xiaoma looks like it does, but its character / word lists are actually lists of single-character words and multi-character words.

I'm trying to learn to write. With a PEN!

I have lists I've generated myself from the word lists - it's a fairly simple text processing task - but the number of characters is a bit off from what's in the HSK literature for some reason and they're devoid of any pinyin or meaning info.

Posted

Why not a pencil :mrgreen:

I'm not quite understanding what you are looking for... it seems like Xiaoma is pretty complete in how it does it...

Posted

I'm looking for a list of all 字 used in the HSK 词 for any particular level, for the purpose of learning to write. I'm not worrying about words at the moment, just characters. The list of 字, once learned, should allow you to write any 词. Xiaoma's listings (while very useful) are a bit misleading, as they've misunderstood what characters / words mean here. For example, 安静 is an HSK level 1 word, and it's in Xiaoma's word list. Fine. But neither 安 nor 静 appears in the character list, because the character list is in fact a list of single-character words, which is something entirely different. HSK level 1 vocabulary uses 800 characters, according to the book I have. Xiaoma lists 400+.
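Here is a minimal Python sketch that makes the distinction concrete. It uses a made-up three-word list, not real HSK data. Filtering for one-character entries gives what Xiaoma calls a character list. Collecting every character that occurs in any word gives the list I'm after.

```python
def single_character_words(words):
    """Entries that are exactly one character long (what Xiaoma's 'character' list holds)."""
    return [word for word in words if len(word) == 1]


def characters_used(words):
    """Every distinct character occurring in any word, in order of first appearance."""
    return list(dict.fromkeys(ch for word in words for ch in word))


# Made-up sample, not the actual HSK level 1 list.
sample_words = ["安静", "好", "安排"]
print(single_character_words(sample_words))  # ['好']
print(characters_used(sample_words))         # ['安', '静', '好', '排']
```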

I've got lists I've generated myself up to level 3 and will post them once I've got 4 as well. It just occurred to me that maybe someone else had done this already, better. These lists already exist in HSK literature, just not sure if there's an electronic version somewhere.

Why not a pencil

Don't want to push myself too hard. One step at a time . . .

Posted

Okay I'm following now....

Don't know anything myself, but on the same subject, what the heck is up with the Pleco flashcards for HSK? I don't understand their methodology in the least...

Posted

What's the problem with them? They were generated from my lists (which included English and part of speech info from Adso) so I might be able to explain.

Posted

Really??? I wonder if I somehow got a corrupted file then, because there was no English translation and it seemed a lot more like a list of random obscure characters (even at the lower level). That kind of bummed me out: I didn't know the majority of the level one characters after getting an 8 on the 初中...

I'll try to reinstall and see what comes up...

The online thing looks nothing like what is on my Palm...

Posted

I'm not sure if the English was included in the Pleco lists. Probably not, as when you import them Pleco should automatically find the English from the dictionary, which is much preferable. Sounds to me like you had an encoding issue.

Posted

Just redid it and it works fine and looks nothing like it did before.... whatever... Thanks!

Posted

How are you putting the lists together? Do you also need the meaning/pinyin for each 字? If not, export your HSK lists to CSV, open them in Excel, copy the column with just the 字, and paste it into this site. It will generate a list of the individual characters used and, as a bonus, arrange them by character frequency, which should help you prioritise. (Note, however, that this frequency only refers to how often they appear in the words you provided, not the overall frequency of the character in the Chinese language.)

If you want the characters arranged by HSK level, you'll have to do 1 level at a time (due to the rearranging by frequency), but it shouldn't take more than a couple of minutes to convert them all.
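I don't know exactly how that site computes its ranking. The minimal Python sketch below assumes it just counts how often each character occurs across the pasted words. As noted above, the counts only reflect the word list you feed in.

```python
from collections import Counter


def characters_by_frequency(words):
    """Rank characters by how often they occur in the given words (not in Chinese overall)."""
    counts = Counter(ch for word in words for ch in word)
    return counts.most_common()  # ties keep first-appearance order


# Made-up sample for one level; run once per level.
print(characters_by_frequency(["安静", "安排", "排队"]))
# [('安', 2), ('排', 2), ('静', 1), ('队', 1)]
```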

Posted

Ah, that would have saved me some time . . .

I wrote custom PHP scripts to a) extract all characters from a list of words and b) compare two lists of characters and display the difference. That let me generate by-level lists of characters.
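The PHP scripts aren't posted here, but the two steps are easy to sketch in Python. The sketch assumes each level's word list is already loaded as a list of strings. The check against the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is an illustrative filter that skips punctuation and Latin letters; it would miss rarer characters outside that block.

```python
def extract_characters(words):
    """Step a): distinct CJK characters in a word list, in order of first appearance."""
    return list(dict.fromkeys(
        ch for word in words for ch in word if "\u4e00" <= ch <= "\u9fff"
    ))


def new_characters(level_words, lower_level_words):
    """Step b): characters used at this level that don't appear in any lower level."""
    already_seen = set(extract_characters(lower_level_words))
    return [ch for ch in extract_characters(level_words) if ch not in already_seen]


# Made-up two-level example; for level 3, pass levels 1 and 2 combined as lower_level_words.
level1 = ["安静", "学生"]
level2 = ["安排", "学习", "卡拉OK"]
print(new_characters(level2, level1))  # ['排', '习', '卡', '拉']
```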

Meaning / pinyin info would be very useful, esp. for duoyinzi, but it's not essential.

Actually (without even looking at the site you linked) I'm not sure that would work - by level 4 you'd end up with all the characters, not the extra characters that weren't in the lower level? You need the extra comparison stage.

Posted
by level 4 you'd end up with all the characters, not the extra characters that weren't in the lower level?
Think of it as revision :-)
Posted

Sod that, I haven't learned it first time round yet. No need for excessive revision. . .

What I'm thinking might be useful is some kind of flashcard listing like

字 / pronunciation / list of words containing said 字

ie

安 / an1 / 安静,安排,安慰 。 。 。

多音字 are an issue, especially since when you import a character into Pleco it automatically assigns the first pronunciation it finds rather than the most common one.
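Building that layout is straightforward once you have an index from each character to the words that contain it. Below is a minimal Python sketch. Here `readings` is a hypothetical character-to-pinyin mapping you would have to supply yourself. The sketch doesn't solve the 多音字 problem; it just leaves a '?' where no reading is given.

```python
from collections import defaultdict


def words_by_character(words):
    """Map each character to the words that contain it, in word-list order."""
    index = defaultdict(list)
    for word in words:
        for ch in dict.fromkeys(word):  # count each character once per word
            index[ch].append(word)
    return index


def flashcard_lines(words, readings):
    """Lines in the form '字 / pronunciation / word,word,...'."""
    return [
        f"{ch} / {readings.get(ch, '?')} / {','.join(containing)}"
        for ch, containing in words_by_character(words).items()
    ]


for line in flashcard_lines(["安静", "安排", "安慰"], {"安": "an1"}):
    print(line)
# 安 / an1 / 安静,安排,安慰
# 静 / ? / 安静
# 排 / ? / 安排
# 慰 / ? / 安慰
```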

Posted

You really got me interested now.

I've used the "character" lists from xiaoma cidian, and the word lists from your search engine. I've generated flashcards for KVocTrain/Parley and Mnemosyne using these lists (which I've shamelessly plugged in the "Best of Chinese Study Tools" thread), through a couple of Python scripts I wrote, and with the automatic use of CEDICT.

And I have the same problem you have: Although I know most of the characters in those "character" lists (at least all in HSK1-HSK3, which is close to 1500 -- still working on HSK4), I still run into characters I don't know in the word lists.

Yet, I don't think that they are single-character WORDS, at least not in all cases. At the very least, the phonetic characters aren't words per se.

Am I correct in understanding that there is no official character list floating around the internet? In that case, I'd be very interested in obtaining such a list, and would be willing to help if I can.

Posted

As far as I can see there isn't, but generating one is trivial if you know how to process Chinese text. I have a bunch of god-awful PHP scripts which let me a) generate a list of the characters that appear in any given input and b) subtract the characters in list A from list B. That's all you need, plus the available word lists. I also put together some pretty crappy scripts, explained here, which let you generate words (within the scope of the HSK lists) from characters. If your vocabulary is way ahead of your character writing, you can find out which relatively advanced words you can write with your X hundred characters.
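Those scripts aren't reproduced here, but the core of the "what can I write?" idea is a subset test. A word is writable if every one of its characters is in your known set. Here is a minimal Python sketch with made-up data:

```python
def writable_words(words, known_characters):
    """Words whose characters are all in the known set."""
    known = set(known_characters)
    return [word for word in words if set(word) <= known]


hsk_words = ["安静", "安排", "静", "排队"]  # made-up sample
print(writable_words(hsk_words, "安静排"))  # ['安静', '安排', '静']
```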

Although I know most of the characters in those "character" lists (at least all in HSK1-HSK3, which is close to 1500

Should be over 2000. See here for a breakdown I did earlier.

The tiresome bit would be adding value in terms of pronunciation, meaning, and compound info. Pronunciation needs particular attention for 多音字, both within a level and across levels. Does an elementary student need to know if a certain word has a different, but rare, pronunciation they won't come across till they reach advanced level?

Yet, I don't think that they are single-character WORDS, at least not in all cases. At the very least, the phonetic characters aren't words per se.

I can't see what else they've done. Another example is 哀悼 in the level 4 word list - 悼 doesn't appear to be in any of the character lists. I haven't gone over it character by character, but their 'character lists' are definitely not what I would expect a character list to be.

I'm attaching a Pleco-format flashcard list of characters for the first three HSK levels. I'm not guaranteeing completeness or accuracy. Most importantly, the pinyin for 多音字 may be for a less common or even obscure pronunciation.

Edit: Scratch that. Realized I'd inadvertently added a bunch of stuff that shouldn't be there. Will repost when I have clean lists.

Posted

Yeah, I have similar scripts in Python, also for automatically looking up characters and words in CEDICT. I imagine there are other people out there with their own solutions; it seems like a natural thing to do.

Should be over 2000. See here for a breakdown I did earlier.

SHOULD be, if you count all characters appearing in all the phrases, but as I'm only using the single-character-phrases as my "character" input (or whatever hmarty is using on his dict), it comes to about 1300. I don't know all the multi-character words in HSK2 and HSK3.

Overall, I know about 1700 individual characters, and I'm still working hard on it. Hope to hit 2000 by the end of the year.

As for 多音字, I get all the pronunciations, then look at them by hand and try to pick the most commonly used one (by analysing how many words it appears in). Usually, only one reading is very common.

This, however, is extremely tedious and not the ideal solution.
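The counting part of that can at least be automated. Below is a minimal Python sketch using made-up sample pairs. It assumes CEDICT-style numbered pinyin with one space-separated syllable per character. That holds for most entries but not all, so misaligned entries are skipped. Like the manual method, it weights every word equally regardless of how common the word itself is, so the result still needs checking by eye.

```python
from collections import Counter, defaultdict


def most_common_readings(entries):
    """entries: (word, pinyin) pairs. Returns each character's reading in the most words."""
    counts = defaultdict(Counter)
    for word, pinyin in entries:
        syllables = pinyin.lower().split()
        if len(syllables) != len(word):
            continue  # syllables don't line up with characters; skip rather than guess
        for ch, syllable in zip(word, syllables):
            counts[ch][syllable] += 1
    return {ch: readings.most_common(1)[0][0] for ch, readings in counts.items()}


sample = [("银行", "yin2 hang2"), ("行", "xing2"), ("行人", "xing2 ren2"), ("旅行", "lu:3 xing2")]
print(most_common_readings(sample)["行"])  # xing2
```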

Posted

roddy and others, do you still need these lists?

I have generated the lists in tab-separated Unicode format, together with pinyin and translation, for all four HSK levels.

Posted

Sure, I could certainly use them, and if you can make them available no doubt plenty of other folk will also take a look. You should be able to attach them to a post (look for the manage attachments button)

Posted

OK, let's try this.

I'm submitting the character lists and the word (multi-character) lists for HSK1-4, with all the definitions.

So far, I did not remove the characters which appear in lower levels, like you were considering. I can do it if there is interest, but I think that they have some value like this as well, because you can drill the characters for a certain level, especially if you want to learn words from that level later.

This may not be perfect, and any feedback is very welcome, of course. The definitions are from CEDICT, but limited to the three most common. For multi-tone characters, I tried to pick the tone actually appearing in the relevant word list; failing that, I tried to pick the most common one. Errors here are also possible.
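For anyone wanting to reproduce this, CEDICT lines have the form `Traditional Simplified [pin1 yin1] /definition 1/definition 2/.../`. Below is a minimal Python sketch that parses one line, keeps the first three listed definitions, and emits a tab-separated row. Taking the *first* three is an assumption of this sketch, since CEDICT doesn't mark which senses are most common. The sample entry is made up for illustration.

```python
import re

# Traditional, simplified, [pinyin], then slash-delimited definitions.
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /(.+)/\s*$")


def parse_cedict_line(line, max_definitions=3):
    """Return (simplified, pinyin, definitions) or None for comments/malformed lines."""
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    _traditional, simplified, pinyin, definitions = match.groups()
    return simplified, pinyin, definitions.split("/")[:max_definitions]


def to_tsv_row(entry):
    """Join an entry into one tab-separated line: word, pinyin, definitions."""
    simplified, pinyin, definitions = entry
    return "\t".join([simplified, pinyin, "; ".join(definitions)])


entry = parse_cedict_line("安靜 安静 [an1 jing4] /quiet/peaceful/calm/tranquil/")
print(to_tsv_row(entry))  # 安静, an1 jing4, quiet; peaceful; calm (tab-separated)
```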

You may need a Unicode font.

hsktables.zip

Posted

And thank you for that, as it helped me produce the attachment. I'm not sure if this will be of any use to anyone else - I'm not yet convinced it's going to be of any use to me - but it's basically a list of all HSK characters and words in the following format

字 tab ~A;~B;C~;D~E

A sample:

稳 ~;~定;安~;平~;~当;~妥;

详 ~细;安~;

置 布~;位~;装~;安~;处~;设~;~;

按 ~;~时;~照;~期;~劳分配;编者~;

flash.txt
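The post doesn't say how flash.txt was generated. One plausible way to produce the same format is sketched below in Python, assuming a plain list of HSK words as input (made-up sample here). Each character gets one line listing the words that contain it, with the character itself replaced by `~`. A single-character word therefore shows up as a bare `~;`.

```python
def tilde_lines(words):
    """One 'character<TAB>~A;B~;...;' line per character, in order of first appearance."""
    index = {}
    for word in words:
        for ch in dict.fromkeys(word):  # each character once per word
            index.setdefault(ch, []).append(word)
    return [
        ch + "\t" + "".join(word.replace(ch, "~") + ";" for word in containing)
        for ch, containing in index.items()
    ]


for line in tilde_lines(["详细", "安详", "位置", "置"]):
    print(line)
```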

Posted

Nice list....holy cow...

Now I just have to figure out how to put it to use....
