Chinese-Forums

5000 Common Character Cards - Batch Definition Generator



Posted

I have been through the first 3000 common characters and want to move on to the next 2000. I have a list of the next set of characters I want to hit, but that is all I have: no definitions. I REALLY don't want to go through and get the definitions by hand. Does anyone know of a way I can run a batch process and get all the definitions at once?

Posted

Best free option I can think of would be to use the Unihan database, which you can get via the link here. You'd need to be able to set up database queries to do that easily, I think, but it's doable: I had it installed on a local MySQL server at one point.

Would be fairly simple to do, if you know MySQL.

Out of curiosity, where are you getting the character lists from?
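For anyone without a database server, the same lookup can be sketched in Python against the plain-text Unihan files. This assumes the tab-separated layout used by `Unihan_Readings.txt` (code point, field name, value, with `#` comment lines) and the `kDefinition` field. The function names and the sample rows are made up for illustration, not taken from anyone's actual setup.

```python
import io

def load_unihan_definitions(lines):
    """Map each character to its kDefinition text from Unihan-style lines."""
    definitions = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a "codepoint<TAB>field<TAB>value" row
        codepoint, field, value = parts
        if field != "kDefinition":
            continue  # ignore readings and other fields
        char = chr(int(codepoint[2:], 16))  # "U+5BB6" -> "家"
        definitions[char] = value
    return definitions

def define_characters(chars, definitions):
    """Return (character, definition) pairs; missing entries get ''."""
    return [(c, definitions.get(c, "")) for c in chars]

# Illustrative sample rows in the assumed format (not real file contents).
sample = io.StringIO(
    "# sample rows\n"
    "U+5B78\tkDefinition\tlearning, knowledge; school\n"
    "U+5B78\tkMandarin\txué\n"
    "U+5BB6\tkDefinition\thouse, home, residence; family\n"
)
defs = load_unihan_definitions(sample)
rows = define_characters("學家鋈", defs)
for char, definition in rows:
    print(char, definition, sep="\t")
```

With a real download, `sample` would be replaced by the opened file, e.g. `open("Unihan_Readings.txt", encoding="utf-8")`. Characters with no `kDefinition` come back with an empty string, which makes the holes easy to spot.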

Posted

> Would be fairly simple to do, if you know MySQL.

If it's a one-off job and you don't have a database handy, the Linux/Cygwin join command (join --help) can do inner and outer joins on files.
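If neither a database nor `join` is to hand, a rough Python equivalent of a left outer join is only a few lines. This is a sketch, assuming two tab-separated inputs: a ranked list (rank, character) and a definition list (character, definition). The layouts, sample rows, and the name `left_join` are hypothetical. Unlike `join`, it does not need the inputs sorted on the join field.

```python
import csv
import io

def left_join(ranked_rows, definition_rows):
    """Attach a definition to every ranked character; '' when none exists."""
    lookup = {}
    for char, definition in definition_rows:
        lookup.setdefault(char, definition)  # keep the first definition seen
    return [(rank, char, lookup.get(char, "")) for rank, char in ranked_rows]

# Illustrative data standing in for the two files.
ranked_file = io.StringIO("26\t學\n70\t家\n10717\t鋈\n")
definition_file = io.StringIO("家\thouse, home\n學\tstudy\n")

ranked = list(csv.reader(ranked_file, delimiter="\t"))
definitions = list(csv.reader(definition_file, delimiter="\t"))

joined = left_join(ranked, definitions)
for row in joined:
    print("\t".join(row))
```

Every ranked character is kept even when no definition is found, which matches an outer join on the ranked list.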

Posted

That's probably your best bet then. Might be worth putting the lists on here (or making available elsewhere) once generated if it's no trouble - someone might find them useful.

Posted

Took some time tonight and joined the two databases I have. It was a little more complex than I thought it would be because there are multiple pronunciations for characters, so I had to do some error proofing. Anyway, I have exported and attached it as an Excel 98-03 file so that it is easy to view.

ENJOY!

ps, it is actually the top 13,000 most used but because the dictionary doesn't have definitions for all of them there are some holes in the list.

ranked.zip

Posted

What I did in a similar situation was to download the Unicode version of CEDICT and use Python scripts to process each character in turn, but looks like you have it sorted out already.
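A minimal sketch of that kind of script, assuming the usual CEDICT line layout (`traditional simplified [pinyin] /gloss/gloss/`, with `#` comment lines). The sample lines and the name `parse_cedict` are illustrative, not the original script. Each character maps to a list of entries so that multiple pronunciations are kept rather than overwritten.

```python
import re
from collections import defaultdict

# traditional, simplified, [pinyin], /gloss 1/gloss 2/
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")

def parse_cedict(lines):
    """Collect (pinyin, glosses) records for every single-character entry."""
    entries = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue  # comment line
        match = ENTRY.match(line)
        if not match:
            continue  # not in the assumed format
        trad, simp, pinyin, glosses = match.groups()
        if len(trad) != 1:
            continue  # skip multi-character words
        record = (pinyin, glosses.split("/"))
        entries[trad].append(record)
        if simp != trad:
            entries[simp].append(record)  # index the simplified form too
    return entries

# Illustrative lines in the assumed format.
sample = [
    "# sample lines",
    "學 学 [xue2] /to learn/to study/",
    "樂 乐 [le4] /happy/cheerful/",
    "樂 乐 [yue4] /music/",
    "學習 学习 [xue2 xi2] /to learn/to study/",
]
entries = parse_cedict(sample)
for pinyin, glosses in entries["乐"]:
    print(pinyin, "; ".join(glosses))
```

Looking up each character from a frequency list in `entries` then gives all of its readings and glosses in one pass.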

  • 3 months later...
Posted

I downloaded your ranked.zip file and appreciate your effort. I compared it to a list of words in the Integrated Chinese Part 1 textbook, and found four popular words that were missing from your list.

They are nu35 female, woman 女, lu45 law 律, lu45 green 綠, li3 inside, within 裏. These words have ranking numbers from the YellowBridge.com site as follows: 299, 683, 1169, 8808 (for simplified character).

I have done some matching like this and perhaps some cases were not found by your formulas. Your list has 9,127 characters in it.

Bernard from New Jersey

Posted

chasebrammer,

When I opened your XLS there were 14 cells with no English. My version of Excel converted them to #NAME?.

I list the 14 below, and also put in my best guess as to the proper English.

On some of them, the initial hyphen may be misunderstood by Excel. On these I started the text in the cell with a single quote mark.

I would be glad to send you the corrected XLS if you send me your email address.

Thanks again for your very helpful work.

Bernard from New Jersey

26 學 学 xué #NAME?

70 家 家 jiā #NAME?

84 無 无 wú #NAME?

85 然 然 rán #NAME?

95 最 最 zuì #NAME?

317 化 化 huà #NAME?

1025 唯 唯 wéi #NAME?

1802 炎 炎 yán #NAME?

2220 惟 惟 wéi #NAME?

3204 嗡 嗡 wēng #NAME?

3313 鍍 镀 dù #NAME?

7717 啶 啶 dìng #NAME?

10717 鋈 鋈 wù #NAME?

10935 唑 唑 zuò #NAME?

學 学 xué study

家 - jiā house, home, residence; family

最 - zuì most, extremely, exceedingly; -est

然 - rán correct; right; so; thus; like this; -ly

化 - huà to make into; to change into; -ization; to

唑 - zuò (phonetic); -z + ole (chem.)

唯 - wéi only; yes

啶 - dìng -d + ine (chem.)

嗡 - wēng sound of flying bees, airplanes; -oin (chem.) as in anisoin

惟 - wéi but, however, nevertheless; only; -ism

炎 - yán flame; hot; inflammation; -itis

無 无 wú negative, no, none, not; lack, have no; -less; un-

鋈 - wù silver plating, -plated

鍍 镀 dù plate, coat, gild, -plated
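If the list is regenerated by script, the single-quote workaround for the #NAME? cells could be applied automatically before the text reaches Excel. The sketch below assumes that cells starting with `=`, `+`, `-`, or `@` are the ones at risk of being read as formulas. The name `protect_cell` and the sample strings are hypothetical, and whether the quote stays hidden can depend on how the file is imported, so it is worth checking on a few rows first.

```python
# Leading characters that Excel may interpret as the start of a formula.
FORMULA_STARTS = ("=", "+", "-", "@")

def protect_cell(text):
    """Prefix a single quote so the cell is treated as plain text."""
    if text.startswith(FORMULA_STARTS):
        return "'" + text
    return text

# Illustrative definitions; two begin with a hyphenated suffix.
cells = ["-est; most, extremely", "study", "-itis; flame; hot"]
protected = [protect_cell(c) for c in cells]
for cell in protected:
    print(cell)
```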

Posted

You can, if you wish, upload the file here. If it's over 1MB you may need to zip it first. Look for the 'manage attachments' button when making a new post.

Posted

Roddy,

Here is the XLS where I matched your list of 9000 against the words in the Integrated Chinese Part 1. There are two sheets. One is your list and the other is mine from the text.

In spreadsheet databases I put the formulas for the column in row 3. Then I paste values down the column to save memory.

The Ranked sheet shows the words that had errors for English. You can use a sort button to put the rows in different orders.

The IC sheet shows the four words from the text that are not in your list.

Have fun, as it sounds like you are an Excel-ophile like I am.

Bernard from New Jersey

RankedList_Mar25_2008_0440pm.zip

  • 9 months later...
Posted

Bernard,

I know my way around Excel pretty well and understand what the functions do, but can you tell us what you were trying to do, and what does "is class" mean? What is the purpose, and how would one use this Excel file in language study?

Also, is there any program out there that can take these as input and generate flashcards on the computer?

thanks

Posted

jychina,

I am taking an adult class using the Integrated Chinese text. As I do each chapter, I enter all the words in the vocabulary lists, and in the grammar and notes sections. I also input pinyin, number of strokes, and ranking of the word (from chinesepod.com).

I include all words that have been in prior chapters, so I can see the components of compound words. The field Is_Class indicates which words are from the text, and which words are from some other source. For example, I recently visited Beijing and added 100 new words from street signs, subway stops, store signs, etc. Our class also used a Business text. Plus we had special lists for New Year's Day. All these other words would have Is_Class set to no.

In this way I can prepare lists for review of the text words, and other lists for the non-text words.

I have prepared two sheets that are fed automatically from my very long list of 4000 rows. I have one sheet that gives me the words in the chapter in very large size, with 9 to a page. I use this for really seeing how to write the character. They are like flash cards.

When I first started learning I wrote out my own flash cards. After a few months I stopped, and then used Excel to print my FlashCard sheets. At first I cut them into individual cards. Now I just study directly from each sheet with 9 words.

I also have a list for each chapter with about 40 words per sheet. This allows me to review the words in a chapter quickly.

Both of these study aids are small, so I can take them with me and use them whenever I have spare time. The textbook is too large to carry around. We work on one chapter for three weeks. I have found these study aids very useful to me.

I also have the ability to do the sheets and exclude words from prior chapters. Duplicate words are flagged with IsDuplChar set to YES.

I can also create other long word lists. I can list all the characters sorted by English, Pinyin, or hanzi. I carry these with me to help me find words I have learned in the past but have forgotten.

These study aids let me spend a lot of free time looking at a few characters at a time.

If you would like to see some of the chapter lists, send me your email ID.

Bernard

New Jersey, US
