chasebrammer Posted November 25, 2007 at 04:45 PM I have been through the first 3000 common characters and want to move on to the next 2000. I have a list of the next set of characters I want to learn, but that is all I have: no definitions. I REALLY don't want to go through and get the definitions by hand. Does anyone know of a way I can run a batch process and get all the definitions at once?
roddy Posted November 26, 2007 at 07:57 AM The best free option I can think of would be to use the Unihan database, which you can get via the link here. You'd need to be able to set up database queries to do that easily, I think. It's doable, though; I had it installed on a local MySQL server at one point. It would be fairly simple to do if you know MySQL. Out of curiosity, where are you getting the character lists from?
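For anyone without a database server to hand, the same lookup can be sketched in plain Python. This is only a rough sketch: it assumes the Unihan data is available as tab-separated text lines of the form `U+XXXX<TAB>field<TAB>value`, with English glosses stored under the `kDefinition` field. The function names, the sample lines, and the file name mentioned in the comment are illustrative assumptions, not part of anyone's actual setup in this thread.

```python
def load_unihan_field(lines, field="kDefinition"):
    """Map each character to the value of one Unihan field."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a data line
        codepoint, name, value = parts
        if name == field:
            # "U+5BB6" -> the character with that code point
            table[chr(int(codepoint[2:], 16))] = value
    return table


def define(chars, table, missing=""):
    """Return (character, definition) pairs, keeping the input order."""
    return [(ch, table.get(ch, missing)) for ch in chars]


def unihan_line(ch, field, value):
    """Build one illustrative data line (hypothetical helper for the sample)."""
    return "U+{:04X}\t{}\t{}".format(ord(ch), field, value)


# Illustrative sample only; with a real download you might instead do
# something like: open("Unihan_Readings.txt", encoding="utf-8")  (assumed name)
sample_lines = [
    "# sample lines in the Unihan text layout",
    unihan_line("家", "kDefinition", "house, home, residence; family"),
    unihan_line("唯", "kDefinition", "only; yes"),
    unihan_line("家", "kMandarin", "jiā"),
]
table = load_unihan_field(sample_lines)
result = define("家唯鋈", table)  # the last character has no sample definition
```

Characters with no definition come back with an empty string, which makes the "holes" in a generated list easy to spot.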
chasebrammer (Author) Posted November 26, 2007 at 09:55 AM Roddy - thanks for the reply. I am a web developer by trade, so this is even better than what I was expecting. As for the list, I got it here: http://technology.chtsai.org/charfreq/94charfreq.html
choisum Posted November 26, 2007 at 10:00 AM
> Would be fairly simple to do, if you know mysql.
If it's a one-off job and you don't have a database handy, the Linux/Cygwin join command (join --help) can do inner and outer joins on files.
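The same kind of outer join can also be sketched in a few lines of Python, which sidesteps the join command's expectation that both files are sorted on the join field. This is a minimal sketch under assumptions: one input holds `rank<TAB>character` lines, the other `character<TAB>definition` lines. The function name and the sample rows are illustrative; the ranks and glosses are borrowed from later posts in this thread.

```python
def left_join(ranked_lines, definition_lines, sep="\t"):
    """Left outer join: keep every ranked character, with or without a definition."""
    definitions = {}
    for line in definition_lines:
        char, _, text = line.rstrip("\n").partition(sep)
        definitions[char] = text

    joined = []
    for line in ranked_lines:
        rank, _, char = line.rstrip("\n").partition(sep)
        # Missing definitions become empty strings instead of dropping the row.
        joined.append((rank, char, definitions.get(char, "")))
    return joined


# Illustrative sample data
ranked = ["26\t學", "70\t家", "10717\t鋈"]
definitions = ["家\thouse, home, residence; family", "學\tstudy"]
rows = left_join(ranked, definitions)
```

Because unmatched rows are kept, the output preserves the frequency ranking even where the dictionary has gaps.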
roddy Posted November 26, 2007 at 10:01 AM That's probably your best bet then. It might be worth posting the lists here (or making them available elsewhere) once generated, if it's no trouble - someone might find them useful.
chasebrammer (Author) Posted November 26, 2007 at 03:55 PM Took some time tonight and joined the two databases I have. It was a little more complex than I thought it would be because some characters have multiple pronunciations, so I had to do some error proofing. Anyway, I have exported the result and attached it as an Excel 98-03 file so that it is easy to view. ENJOY! PS: it is actually the top 13,000 most used characters, but because the dictionary doesn't have definitions for all of them, there are some holes in the list. Attachment: ranked.zip
renzhe Posted November 27, 2007 at 12:19 PM What I did in a similar situation was to download the Unicode version of CEDICT and use Python scripts to process each character in turn, but it looks like you have it sorted out already.
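A rough idea of what such a script might look like is sketched below; it is not renzhe's actual code. It assumes the usual CEDICT line layout, `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, with comment lines starting with `#`. The function name and the sample entries are illustrative. Single characters with several readings simply collect several entries, which is one way to handle the multiple-pronunciation problem mentioned above.

```python
import re
from collections import defaultdict

# Traditional, simplified, bracketed pinyin, then slash-delimited glosses.
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict(lines):
    """Collect single-character entries, keyed by the traditional form."""
    entries = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue  # header comments
        match = ENTRY.match(line)
        if not match:
            continue  # skip anything that does not look like an entry
        trad, simp, pinyin, glosses = match.groups()
        if len(trad) == 1:  # ignore multi-character words
            entries[trad].append((simp, pinyin, glosses.split("/")))
    return dict(entries)


# Illustrative sample lines in the CEDICT layout
sample = [
    "# sample header comment",
    "學 学 [xue2] /to learn/to study/",
    "學生 学生 [xue2 sheng5] /student/",
    "行 行 [hang2] /row/line/",
    "行 行 [xing2] /to walk/to go/",
]
cedict = parse_cedict(sample)
```

Each character maps to a list, so a lookup for 行 returns both readings rather than silently keeping only the last one.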
bernard_nj Posted March 25, 2008 at 02:37 PM I downloaded your ranked.zip file and appreciate your effort. I compared it to a list of words in the Integrated Chinese Part 1 textbook and found four popular words that were missing from your list. They are 女 nǚ (female, woman), 律 lǜ (law), 綠 lǜ (green), and 裏 lǐ (inside, within). These words have the following ranking numbers on the YellowBridge.com site: 299, 683, 1169, and 8808 (for the simplified character). I have done some matching like this, and perhaps some cases were not found by your formulas. Your list has 9,127 characters in it. Bernard from New Jersey
bernard_nj Posted March 25, 2008 at 03:30 PM
chasebrammer, When I opened your XLS there were 14 cells with no English. My version of Excel converted them to #NAME?. I list the 14 below, and also put in my best guess as to the proper English. On some of them, the initial hyphen may be misunderstood by Excel. On these I started the text in the cell with a single quote mark. I would be glad to send you the corrected XLS if you send me your email address. Thanks again for your very helpful work. Bernard from New Jersey
Cells showing #NAME? (rank, traditional, simplified, pinyin, English):
26 學 学 xué #NAME?
70 家 家 jiā #NAME?
84 無 无 wú #NAME?
85 然 然 rán #NAME?
95 最 最 zuì #NAME?
317 化 化 huà #NAME?
1025 唯 唯 wéi #NAME?
1802 炎 炎 yán #NAME?
2220 惟 惟 wéi #NAME?
3204 嗡 嗡 wēng #NAME?
3313 鍍 镀 dù #NAME?
7717 啶 啶 dìng #NAME?
10717 鋈 鋈 wù #NAME?
10935 唑 唑 zuò #NAME?
Best-guess English (traditional, simplified, pinyin, English):
學 学 xué study
家 - jiā house, home, residence; family
最 - zuì most, extremely, exceedingly; -est
然 - rán correct; right; so; thus; like this; -ly
化 - huà to make into; to change into; -ization; to
唑 - zuò (phonetic); -z + ole (chem.)
唯 - wéi only; yes
啶 - dìng -d + ine (chem.)
嗡 - wēng sound of flying bees, airplanes; -oin (chem.) as in anisoin
惟 - wéi but, however, nevertheless; only; -ism
炎 - yán flame; hot; inflammation; -itis
無 无 wú negative, no, none, not; lack, have no; -less; un-
鋈 - wù silver plating, -plated
鍍 镀 dù plate, coat, gild, -plated
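If the list is regenerated by script, the single-quote workaround described above could be applied automatically before the data reaches Excel. The sketch below assumes, as Bernard suspects, that a leading character such as a hyphen is what makes Excel try to evaluate the cell as a formula. The function name and the set of trigger characters are assumptions, and whether the added quote stays hidden may depend on how the file is brought into Excel, so it is worth checking on a small sample first.

```python
# Characters that spreadsheet programs commonly treat as the start of a formula
# (an assumed list; adjust as needed).
FORMULA_STARTERS = ("=", "+", "-", "@")


def protect_cell(text):
    """Prefix a single quote when the text might be parsed as a formula."""
    if text.startswith(FORMULA_STARTERS):
        return "'" + text
    return text


# Glosses taken from the list above
glosses = ["-d + ine (chem.)", "study", "only; yes"]
protected = [protect_cell(g) for g in glosses]
```

Only cells that begin with a trigger character are changed; ordinary definitions pass through untouched.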
roddy Posted March 25, 2008 at 03:51 PM You can, if you wish, upload the file here. If it's over 1MB you may need to zip it first. Look for the 'manage attachments' button when making a new post.
bernard_nj Posted March 25, 2008 at 08:46 PM Roddy, Here is the XLS where I matched your list of 9000 against the words in Integrated Chinese Part 1. There are two sheets. One is your list and the other is mine from the text. In spreadsheet databases I put the formulas for the column in row 3. Then I paste values down the column to save memory. The Ranked sheet shows the words that had errors for English. You can use a sort button to put the rows in different orders. The IC sheet shows the four words from the text that are not in your list. Have fun, as it sounds like you are an Excel-ophile like I am. Bernard from New Jersey Attachment: RankedList_Mar25_2008_0440pm.zip
jychina Posted January 21, 2009 at 08:36 AM Bernard, I know my way around Excel pretty well and understand what the functions do, but can you tell us what you were trying to do and what "is class" means? What is the purpose, and how would one use this Excel file in language study? Also, is there any program out there that can take these as input and generate flashcards on the computer? Thanks
bernard_nj Posted January 21, 2009 at 05:34 PM
jychina, I am taking an adult class using the Integrated Chinese text. As I do each chapter, I enter all the words in the vocabulary lists and in the grammar and notes sections. I also input pinyin, number of strokes, and ranking of the word (from chinesepod.com). I include all words that have been in prior chapters, so I can see the components of compound words.
The field Is_Class indicates which words are from the text and which words are from some other source. For example, I recently visited Beijing and added 100 new words from street signs, subway stops, store signs, etc. Our class also used a Business text. Plus we had special lists for New Year's Day. All these other words would have Is_Class set to no. In this way I can prepare lists for review of the text words, and other lists for the non-text words.
I have prepared two sheets that are fed automatically from my very long list of 4000 rows. One sheet gives me the words in the chapter in very large size, 9 to a page. I use this for really seeing how to write the character. They are like flash cards. When I first started learning I wrote out my own flash cards. After a few months I stopped, and then used Excel to print my FlashCard sheets. At first I cut them into individual cards. Now I just study directly from each sheet with 9 words. I also have a list for each chapter with about 40 words per sheet. This allows me to review the words in a chapter quickly. Both of these study aids are small, so I can take them with me and use them whenever I have spare time. The textbook is too large to carry around. We work on one chapter for three weeks. I have found these study aids very useful.
I can also produce the sheets with words from prior chapters excluded. Duplicate words are flagged with IsDuplChar set to YES. I can also create other long word lists. I can list all the characters sorted by English, pinyin, or hanzi. I carry these with me to help me find words I learned in the past but have forgotten. These study aids let me spend a lot of free time looking at a few characters at a time. If you would like to see some of the chapter lists, send me your email ID. Bernard, New Jersey, US
yersi Posted January 21, 2009 at 09:55 PM Is there any way to import this excel file into Anki?
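One route that should work, sketched here under assumptions rather than tested against this particular file: Anki can import plain text files with one note per line and fields separated by tabs, so the spreadsheet rows can be written out as tab-separated text. The sketch assumes the rows have already been extracted from Excel (for example by saving the sheet as text) as (hanzi, pinyin, English) tuples; the function name and the sample rows are illustrative.

```python
import csv
import io


def rows_to_anki_tsv(rows):
    """Write (hanzi, pinyin, english) rows as tab-separated text, one note per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for hanzi, pinyin, english in rows:
        if not english:
            continue  # skip the "holes" that have no definition
        writer.writerow([hanzi, pinyin, english])
    return buffer.getvalue()


# Illustrative rows; the first is taken from the list earlier in the thread
rows = [
    ("家", "jiā", "house, home, residence; family"),
    ("鋈", "wù", ""),
]
tsv_text = rows_to_anki_tsv(rows)
# Save tsv_text to a UTF-8 file and pick that file in Anki's import dialog.
```

Keeping hanzi, pinyin, and English as separate fields leaves the choice of card layout to the note type chosen during import.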