jlau Posted August 14, 2004 at 07:24 PM
Roddy: Someone mentioned flashcards earlier. Here's another option. I run http://www.yellowbridge.com, which includes a Chinese language section. I recently started offering free online flashcards based on the "Integrated Chinese" textbook series as well as a couple of lists of frequently used characters. It occurred to me that it would be relatively easy to create a set of flashcards based on the HSK word list you've already created. However, having done a fair amount of Chinese data entry myself, I know that it was a lot of effort to enter the list, so I don't want to "borrow" your list without your permission. Please take a look at http://www.yellowbridge.com/language/flashcards.html to see my existing flashcards. Many students have found flashcards to be one of the few effective tools for memorizing Chinese words and characters.
If you do decide that it is OK for me to use your list, I could extract the necessary information directly from your query tool so you wouldn't have to provide me with anything special. Proper attribution would be provided, of course.
I said it would be relatively easy, but it is not without effort. A quick trial run using the HSK level 1 list indicated that about 40 entries (out of 1000) lacked corresponding entries in CEDICT. I would expect the percentage of missing entries to be much higher in the higher levels. I would still need to manually track down the definitions of the missing words.
Jaime
mph Posted August 16, 2004 at 05:19 AM
Roddy: Thanks for the extremely useful resource. The Xian Dai Han Yu Cidian 2003 now seems to use the pinyin "yi1" exclusively for the character "一". In fact page 1483 of that august publication, under the headword yi2 一, refers the user to yi1. But 一定 in HSK is under pinyin "yi2 ding4". Xian Dai... uses "yi1 ding4". ABC also uses yi1 ding4 but mentions that yi2 ding2 is pre ABC.
The Xian Dai... reasoning behind this is that the character 一 is only listed under the single pinyin yi1. It then follows that all compound words using 一 as the first character should be under the yi1 pinyin headword. If this standard were not adhered to, one would have no end of trouble finding words, as one would be unsure of the headword pinyin.
Not sure if you intend to use Xian Dai... as the standard, but if you do then all the yi2 pinyin may need to be changed to yi1. Thanks again.
Myles
mph Posted August 18, 2004 at 01:32 AM
Roddy G'day... The database is now delivering the search outcome twice. For example, with the input zuo4feng1 on the HSK level 3 list, the outcome is
作风 zuo4feng1 (3)
作风 zuo4feng1 (3)
Earlier on the outcome was only a single line, 作风 zuo4feng1 (3). Otherwise it is all working fine.
Regards
roddy Posted August 18, 2004 at 12:40 PM Author
The entire database is now available in comma-separated values (CSV) format at http://www.chinese-forums.com/vocabulary/HSKcsv.csv (220KB). If anyone wants to use this for their own purposes, go ahead - I'd appreciate it if you link back to the forums and let me know what you are using it for though.
The doubled results problem has been solved - a mistake I made last time I 'fixed' something.
As for pinyin for 一, I just used what was on the HSK vocab list I was using - I'm not going to cross-reference various standards.
Roddy
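For anyone who wants to work with the CSV file in a script, here is a minimal Python sketch of reading it. The column order (characters, pinyin with tone marks, numbered pinyin, level) is an assumption based on rows quoted in this thread, such as 帮助,bāngzhù,bang1zhu4,1; the real file may differ. The names `Entry`, `parse_hsk_csv` and `SAMPLE` are made up for this example.

```python
import csv
import io
from typing import NamedTuple


class Entry(NamedTuple):
    word: str            # Chinese characters
    pinyin_marks: str    # pinyin with tone marks, e.g. "bāngzhù"
    pinyin_numbers: str  # pinyin with tone numbers, e.g. "bang1zhu4"
    level: int           # HSK level


def parse_hsk_csv(text: str) -> list[Entry]:
    """Parse CSV text, assuming the column order word, marks, numbers, level."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # skip short or blank rows
        word, marks, numbers, level = (cell.strip() for cell in row[:4])
        if not level.isdigit():
            continue  # skip a header row or a malformed level
        entries.append(Entry(word, marks, numbers, int(level)))
    return entries


# Rows copied from posts in this thread.
SAMPLE = (
    "帮助,bāngzhù,bang1zhu4,1\n"
    "一定,yídìng,yi2ding4,1\n"
    "一直,yìzhí,yi4zhi2,1\n"
    "歌,gēge,ge1ge,1\n"
)

entries = parse_hsk_csv(SAMPLE)
level_one_words = [e.word for e in entries if e.level == 1]
```

To read the downloaded file instead of the sample string, the text could be loaded with `open(path, encoding=...)` first; the encoding of the real file is not stated in the thread, so it would need checking.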
jlau Posted August 27, 2004 at 07:02 PM
I have gone ahead and created online flashcards based on the HSK word lists (courtesy of Roddy, thanks). At present only Levels 1 and 2 are ready. You can access the flashcards at http://www.yellowbridge.com/language/flashcards.html. Feedback and comments are appreciated, of course.
Question for Roddy. The csv file still contains a number of duplicates (not a lot). The ones with slightly different pronunciation, I understand. However, there are a few which appear to be true duplicates. I was wondering whether this was the result of the same bug you mentioned earlier or whether there are other differences in the original source that wouldn't show in the file (such as different meanings or different traditional characters).
Jaime
roddy Posted August 28, 2004 at 12:35 AM Author
That's a result of me blindly copying the HSK word list - for example, in the list I have, 把 is in twice, as 介词 and 量词 - so in that case it's because there are different meanings. However, in other cases there might be only one entry which is classed as 名, 动 and I've only put it in once.
Roddy
mph Posted September 7, 2004 at 02:45 AM
Roddy G'Day... Not sure what is happening, but I did a search for 办* on the web page. It did not return 办公室, which is level 1 in your csv list. 各* does not bring up 各种 from level 1 either, and 帮* does not bring up 帮助,bāngzhù,bang1zhu4,1 either, etc.
Also, while ban4gong1shi4 seems to be so popular these days, I think the pinyin in the csv list has the 公 as a third tone.
Another pinyin worth a look is 打算,da3suan4,1. SDHYCD has that listed as da3suan5. Similarly, these pinyin differ from SDHYCD (csv entry first, SDHYCD reading after the dots):
关系,guan1xi,1 ... guan4
后边,hou4bian1,1 ... bian5
里边,li3bian1,1 ... bian5
哪里,na3li3,1 ... li5
痛快,tong4kuai,1 ... kuai4
I have mentioned this before, but the entries from 一定,yídìng,yi2ding4,1 down to 一直,yìzhí,yi4zhi2,1 have different pinyin from SDHYCD.
The following csv line could also do with a tweak: 歌,gēge,ge1ge,1
mph
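One way to double-check reports like these against the csv data, independently of the web page, is a small wildcard matcher. This is only a sketch using Python's standard `fnmatch` module; it says nothing about how the search on the site is actually implemented. The function name `wildcard_search` and the `WORDS` list are made up for the example, with the words taken from this thread.

```python
from fnmatch import fnmatchcase


def wildcard_search(words, pattern):
    """Return the words matching a shell-style pattern, where * matches any run of characters."""
    return [word for word in words if fnmatchcase(word, pattern)]


# A handful of level 1 words mentioned in this thread.
WORDS = ["办公室", "各种", "帮助", "打算", "关系", "后边", "里边"]

starts_with_ban = wildcard_search(WORDS, "办*")   # words beginning with 办
ends_with_bian = wildcard_search(WORDS, "*边")    # words ending with 边
```

If a word turns up here but not in the site's search, the entry is probably present in the file and missing only from the database behind the page.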
Mayu Posted September 21, 2004 at 07:56 AM
Hi all, I have seen this site, http://www.unige.ch/lettres/meslo/chinois/hsk.html, which lists 800 characters for HSK level 1. Since Roddy lists 369 chars, do you know where the difference may come from?
Mayu
Sortaz Posted November 23, 2004 at 12:45 PM
All entries for HSK Level 1 seem to be missing. Or am I doing something wrong?
roddy Posted November 23, 2004 at 01:59 PM Author
No. I have no idea how that happened. Will get it fixed in the very near future.
Roddy
geek_frappa Posted November 23, 2004 at 05:21 PM
"Please take a look at http://www.yellowbridge.com/language/flashcards.html to see my existing flashcards. Many students have found flashcards to be one of the few effective tools for memorizing Chinese words and characters."
hmm.. very nice.
trevelyan Posted December 8, 2004 at 06:43 PM
Ended up looking at the pinyin for Roddy's HSK list tonight. Am curious about the following characters whose pinyin representations seem uncommon to me. I'm probably wrong, but thought I'd flag them just in case. Most are duoyinci, in which case it's an issue of which is more common. Anyone care to comment?
yan4 咽 (yan1)
zhao2 着 (zhe3) (zhao1 for 着急?)
ben4 奔 (ben1)
he1 呵 (he5)
jia4 假 (jia3)
ning3 拧 (ning2)
ying4 应 (ying1)
cheng4 秤 (chen4)
ding4 钉 (ding1)
fan4 泛 (fa2)
feng4 缝 (feng2)
heng4 横 (heng2)
huo1 豁 (huo4)
juan4 圈 (quan1)
nan4 难 (nan2)
tiao3 挑 (tiao1)
Also noticed this entry with odd formatting for the pinyin column:
4-Jun 俊 (jun4)
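The 4-Jun entry looks like what a spreadsheet program produces when it reads "jun4" as a date. That is a guess and is not confirmed anywhere in this thread. Assuming it is right, a sketch like the following could flag and undo such cells; the name `restore_pinyin` is made up for this example, and only "day" values 1 to 5 are treated as tones.

```python
import re

# A cell such as "4-Jun": a single digit 1-5 (a possible tone number),
# a hyphen, then an English month abbreviation.
DATE_LIKE = re.compile(
    r"([1-5])-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
)


def restore_pinyin(cell: str) -> str:
    """Turn a date-like cell such as '4-Jun' back into 'jun4'; leave anything else unchanged."""
    match = DATE_LIKE.fullmatch(cell.strip())
    if match is None:
        return cell
    tone, syllable = match.groups()
    return f"{syllable.lower()}{tone}"


fixed = restore_pinyin("4-Jun")       # the entry flagged above
untouched = restore_pinyin("jia4")    # ordinary numbered pinyin passes through
```

Importing the pinyin column as plain text rather than letting the spreadsheet guess the type would avoid the problem in the first place.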
roddy Posted December 18, 2004 at 10:11 AM Author
Ok, after a brief hiatus of 6 months or so, I finally got back to this . . .
Today's changes:
1) You now have English for many of the entries. For this you should be thanking our very own Trevelyan, who ran the list through the magic machine which powers his Adsotrans page (check out the very useful webpage annotation function). Should have done this ages ago, wasn't difficult at all (especially as someone else did the difficult bit for me). Caveats are that the English entries are neither complete nor perfect. Some are missing, and in other cases entries which should have different English translations actually have both (i.e. the 2 entries for 应, 1st tone and 4th tone, should be 'should' and 'answer, respond' respectively. Actually, they are both 'should, answer, respond'). If you want to suggest changes / improvements to the English, please do so via Adsotrans and when that's updated I'll import the new entries - I'm not going to edit the English directly (I think).
2) A number of errors and typos corrected, including some of the above.
3) In addition to list and card output, you now also have the option of CSV (comma-separated values) output. This can be opened in Excel, many database programs and also some flashcard programs such as Supermemo. It is not delivered as a .csv file; you'll get an html page of text you can copy and paste.
4) I put back the 2000 entries I forgot to import last time I updated the database.
Next plans are word class information and searching on the English entries. Suggestions on how this can be made more useful are welcome, and I'll see what I can do (over the course of the next decade).
Roddy
Tiny Edit: 连续 is misplaced, and depending on how you order things may not appear where you expect it to in a list. However, it is there.
mandarinboy Posted January 7, 2005 at 12:48 PM
First, it is a wonderful resource you have! Since I will be taking my test soon, I was using the list and noticed a few odd things. To make sure it was odd and not just me and my stupid brain, I ran a test against it. I took your list, converted it into a database and ran this query against it:
select hsk_out.hsk_level, hsk_out.pk, hsk_out.chinese, hsk_out.pinyin, hsk_out.translation1
from hsk_out
inner join (
    select chinese, pinyin, translation1
    from hsk_out
    group by chinese, pinyin, translation1
    having count(chinese) > 1
) as duplicates
on hsk_out.chinese = duplicates.chinese
and hsk_out.pinyin = duplicates.pinyin
and hsk_out.translation1 = duplicates.translation1
The result is 290 lines = approx. 100+ duplicates. This I can fix for you in no time if you want my help. Another strange thing, or it might be just me not knowing better, is that characters like one (yi1) appear to be in levels 1, 2 and 4. Can that be so?
Finally, if you want help with adding English definitions to the missing ones, I can help you with that too. Still machine analyzed, but I found all the missing translations. I can also contribute links to stroke order animations for many of the characters and a direct link to zhongwen.com for genealogy. I am still moving my old office to a new one after a fire, but I will try to find some time to go through this carefully and give you a list when I am done. If you let me know the format you prefer, I will create such a file for you.
Finally, a big, big thanks for this list. It has helped me a lot.
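For anyone who wants to try the same check without setting up a database server, here is a sketch that runs an equivalent query in an in-memory SQLite database from Python. The table and column names follow the query above; the column types, the `find_duplicates` helper and the sample rows (including the pinyin and the English glosses) are illustrative assumptions, not data from the real list.

```python
import sqlite3

# The same logic as the query above, written against SQLite.
DUPLICATE_QUERY = """
SELECT h.hsk_level, h.pk, h.chinese, h.pinyin, h.translation1
FROM hsk_out AS h
INNER JOIN (
    SELECT chinese, pinyin, translation1
    FROM hsk_out
    GROUP BY chinese, pinyin, translation1
    HAVING COUNT(*) > 1
) AS d
  ON h.chinese = d.chinese
 AND h.pinyin = d.pinyin
 AND h.translation1 = d.translation1
ORDER BY h.chinese, h.pk
"""


def find_duplicates(rows):
    """rows: iterable of (hsk_level, chinese, pinyin, translation1) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hsk_out ("
        "pk INTEGER PRIMARY KEY, hsk_level INTEGER, "
        "chinese TEXT, pinyin TEXT, translation1 TEXT)"
    )
    conn.executemany(
        "INSERT INTO hsk_out (hsk_level, chinese, pinyin, translation1) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    result = conn.execute(DUPLICATE_QUERY).fetchall()
    conn.close()
    return result


# Illustrative rows only; the glosses are placeholders.
SAMPLE_ROWS = [
    (1, "把", "ba3", "hold"),
    (1, "把", "ba3", "hold"),
    (3, "把", "ba3", "hold"),
    (1, "白", "bai2", "white"),
]

duplicates = find_duplicates(SAMPLE_ROWS)
```

Two things to keep in mind when reading the output: every member of a repeated group is returned, so the number of result lines is larger than the number of distinct repeated entries, and rows whose translation is NULL never match each other under `=`, so entries with missing English would not be reported.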
mandarinboy Posted January 8, 2005 at 10:23 AM
Last night I ran a few more tests against the list and added traditional characters, many more translations and variations of translations, and tone marks to the pinyin. The list is now stripped of duplicates. I ran it against the Adso database as well to update it with word classes, and against Unihan to get correct pinyin. Let me know if you want the list. Now I am looking at adding measure words and other stuff.
roddy Posted January 8, 2005 at 10:30 AM Author
Mandarinboy - Could you give me some examples of 'duplicates'? As far as I am aware there are none - there are however cases where the same character is in two or more levels with different meanings / pronunciation - but I don't consider this a duplicate. Also, the English is often duplicated, but as I said re the English: "Some are missing, and in other cases entries which should have different English translations actually have both."
I also already have tone marks with pinyin - when you run the search you can choose if you want the marks or the numbers. Word class information from Adso I also already have, I just haven't integrated it into the search function.
Any improvements to the English, I would suggest you send them to ADSO, and then I'll import them from there.
To be honest I'm not sure how much time I'm going to have to work on this over the next few months - if you want to do stuff for your own benefit that's great, but I can't guarantee I'll be able to use any of it.
Many thanks for your interest
Roddy
mandarinboy Posted January 10, 2005 at 09:07 AM
E.g.
把 (two entries in level 1 and one more in level 3)
白 (one entry in level 1 and one in level 2)
当 (two in level 1, one in level 2 and one in level 3)
点 (3 times in level 1)
and so on. There are around 150 duplicates like those, if I am right. As far as I can see they have the same pinyin and the same meanings. These results are from your page: http://www.chinese-forums.com/vocabulary/
If you need some help, just let me know and I can clean them up for you and add additional functionality. For me the lists are extremely helpful, and I am very, very happy that you have taken the time to put up this wonderful tool. It has been so helpful for me in the past weeks.
roddy Posted January 10, 2005 at 10:15 AM Author
I see what you are referring to - however, if I check my HSK word list, I find 把 - level one has two entries, once as preposition (把啤酒给我) and once as measure word (一把钥匙), and then in level 3 as a verb. With 点, it is in level 1 three times - as a measure word, a noun and a verb. I don't have time to check the others, but I'm pretty sure there'll be a similar explanation.
What's happening is that as I don't have part of speech info, and the English translations are not tailored to the HSK, but come out of ADSO, these might look like duplicates - but I'm confident that for each apparent duplicate there is something that will distinguish them.
Thinking about it, I'm not sure of the logic behind the HSK wordlists. For example, as I mention, 点 has separate entries for its noun and verb forms. However, 病 has only one and is listed as (动, 名). Regardless of the reason, it makes this easier for me if I just follow the HSK lists - that way, if there are 8822 entries in the complete lists, I know there should be 8822 entries in my database.
I'm really glad to hear you are finding the tool useful. I'd be interested to know exactly how you are using it, particularly if you are making use of any of the more advanced 'fuzzy' search options.
Roddy
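To illustrate the point about part of speech, here is a small Python sketch. It assumes each entry carries a part-of-speech tag, which the database described above does not have yet, so the tags are shorthand for the descriptions in this post. Once the tag is part of the comparison, the apparent duplicates of 把 and 点 become distinct, and only an entry repeated with the same word, level and tag would count as a real duplicate. The names `true_duplicates` and `SAMPLE` are made up for this example.

```python
from collections import Counter


def true_duplicates(entries):
    """Return entries that occur more than once with identical (word, level, part of speech)."""
    counts = Counter(entries)
    return sorted(entry for entry, n in counts.items() if n > 1)


# (word, HSK level, part of speech), following the description above.
SAMPLE = [
    ("把", 1, "介词"),  # preposition
    ("把", 1, "量词"),  # measure word
    ("把", 3, "动"),    # verb
    ("点", 1, "量词"),  # measure word
    ("点", 1, "名"),    # noun
    ("点", 1, "动"),    # verb
]

none_found = true_duplicates(SAMPLE)
one_found = true_duplicates(SAMPLE + [("把", 1, "介词")])
```

Comparing `len(entries)` with the expected total (8822 for the complete lists, per the post above) is then a separate sanity check on the import.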
djwebb2004 Posted January 14, 2005 at 08:22 PM
Roddy, I have read the forum, and I know you already know about ban4gong1shi4 instead of ban4gong3shi4. I have read through the first 400 words, and I found these also:
#86: Cheng2ji4, not Cheng2ji1
#106: Ci2, not Ci2dai4
#245: Ge1, not Ge1ge
#279: Guo4qu, not Guo4qu4
#317: Huan2, not Huan4
#318: Huan4, not Huan2
I hope this helps.
roddy Posted January 21, 2005 at 12:14 AM Author
Thanks, I'll look into those. Were you working from the .csv file, or the search function? I have a feeling I fixed some of those on the database, but didn't update the .csv.
Roddy