大块头 Posted March 31, 2021 at 04:39 PM
I extracted all the columns and removed the watermarks. I'll probably have time later today to OCR everything. GitHub repo with the images
Moshen Posted March 31, 2021 at 05:00 PM
So is this 3.0 revision changing the existing levels, or only adding three additional higher levels?
mikelove Posted March 31, 2021 at 05:02 PM
Here's an OCR'ed version. I just added a new "OCR entire PDF" function to Pleco 4.0 specifically for this project: https://github.com/elkmovie/hsk40
大块头 Posted March 31, 2021 at 05:24 PM
@mikelove Did you mean to type "HSK 3.0"? I think you must have the number 4 on your mind for some reason.
Quote:
# HSK 4.0 word list
# OCR'ed but not extensively proofread (yet)
# Copyright (c) 2021 Pleco Inc.
# See https://github.com/elkmovie/hsk40/ for details + license
一级词汇表 [Level 1 vocabulary list]
1 爱
2 爱好
3 八
4 爸爸|爸
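For anyone who wants to load the file programmatically, here is a minimal Python sketch of a parser. The format is only inferred from the excerpt quoted above, so treat all of this as an assumption: lines starting with # are comments, a line ending in 词汇表 starts a new level, each entry is a running number followed by the headword, and | separates alternative forms (爸爸|爸). The names `Entry` and `parse_word_list` are hypothetical; check against the actual files in the repo before relying on it.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from the quoted excerpt only.
LEVEL_HEADER = re.compile(r"^(\S+)词汇表$")  # e.g. "一级词汇表" -> level "一级"
ENTRY = re.compile(r"^(\d+)\s+(\S+)$")      # e.g. "4 爸爸|爸"


@dataclass
class Entry:
    level: str        # text before 词汇表 in the most recent section header
    number: int       # running number as printed in the file
    forms: list[str]  # "爸爸|爸" -> ["爸爸", "爸"]


def parse_word_list(text: str) -> list[Entry]:
    """Parse the word-list text into entries, skipping # comment lines."""
    entries = []
    level = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = LEVEL_HEADER.match(line)
        if header:
            level = header.group(1)
            continue
        match = ENTRY.match(line)
        if match and level is not None:
            entries.append(Entry(level, int(match.group(1)), match.group(2).split("|")))
    return entries
```

Lines that match neither pattern (for example OCR noise) are silently skipped, which keeps the sketch short but means you would want to log them when proofreading.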
Jan Finster Posted March 31, 2021 at 05:32 PM
I ran the list Mikelove just provided against the list 大块头 provided last year as a reference. Here are the 110 words my CTA (Chinese Text Analyser) marks as unknown: 不客气 服务员 填空 预习 赢 胃口 时尚 造成 木头 志气 溅 伯母 原告 近来 附件 纵然 正能量 肇事 术 镦 余额 赢家 人人 疫苗 自得 然 恰 一一 摇晃 下功夫 洗涤剂 物流 微型 团伙 同人 条例 搜查 双赢 宙 人工智能 阏 倾 诉 配送 鑾 农民工 理科 啦啦队 红薯 工科 高尔夫球 责任感 发愤图强 俄语 得意扬扬 大数 串门 厂家 伯父 摆放 看作 当作 仓 消费者 微博 毛笔 火腿 懈 二维码 赢得 微信 耕 期中 期末 模 多样 四级 总 者 积极性 可乐 现代化 好好 短裤 三级 音节 叹 外卖 生词 忸 那时候 叫作 多云 读音 二级 有时候 一些 刽 听写 恹 妈 姐 哥 第二 弟 爸 点儿 涞 助 介
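A rough way to reproduce this kind of comparison without CTA is a plain set difference. This is a minimal Python sketch, not how CTA works (CTA segments running text against a known-words list), and the function names are hypothetical. It also expands |-separated alternates, on the assumption that entries like 爸爸|爸 end up split into separate forms; if so, that might explain why single characters such as 爸 and 妈 show up as unknown above.

```python
def expand_forms(entries):
    """Split entries like "爸爸|爸" into separate forms: ["爸爸", "爸"]."""
    forms = []
    for entry in entries:
        forms.extend(part for part in entry.split("|") if part)
    return forms


def words_not_in_reference(new_words, reference_words):
    """Words in new_words but not in reference_words, in first-seen order, no duplicates."""
    reference = set(reference_words)
    seen = set()
    missing = []
    for word in new_words:
        if word not in reference and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing
```

Keeping first-seen order (rather than returning a bare set) makes the output line up with the order of the new list, which is handier when proofreading against the PDF.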
mikelove Posted March 31, 2021 at 05:32 PM Report Posted March 31, 2021 at 05:32 PM 9 minutes ago, 大块头 said: Did you mean to type "HSK 3.0"? I think you must have the number 4 on your mind for some reason. Eep, yeah, duly corrected - thanks. 1 Quote
mikelove Posted March 31, 2021 at 05:40 PM
Uploaded an importable Pleco version of these to https://plecoforums.com/threads/hsk-3-0-flashcards.6706/ if anybody wants them that way.
mikelove Posted March 31, 2021 at 05:48 PM
Also, here's a list of the 206 words from this list that don't appear to have entries in CC-CEDICT (as of the latest version of it in Pleco, which is a couple of months old). (Updated, see below.)
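Checking a word list against CC-CEDICT yourself is straightforward, since each CC-CEDICT entry line starts with the traditional and simplified headwords, followed by pinyin in square brackets and slash-delimited glosses, and comment lines start with #. Here is a minimal Python sketch; it is not Pleco's implementation, the function names are hypothetical, and supplying the lines (e.g. an open file object for a local copy of the dictionary) is left to the caller.

```python
def cedict_headwords(lines):
    """Collect traditional and simplified headwords from CC-CEDICT entry lines."""
    headwords = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the file's comment header
        parts = line.split(" ", 2)  # traditional, simplified, "[pinyin] /glosses/"
        if len(parts) < 3:
            continue  # not a well-formed entry line
        headwords.update(parts[:2])
    return headwords


def missing_from_cedict(words, headwords):
    """Words from the list that match no CC-CEDICT headword."""
    return [w for w in words if w not in headwords]
```

Adding both the traditional and simplified forms to one set means the check works whichever script the word list uses.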
roddy Posted March 31, 2021 at 06:03 PM
Before anyone adds them to CC-CEDICT: some of those are (or look like) OCR errors, 起涞 for example.
mikelove Posted March 31, 2021 at 06:19 PM
Thanks. I rechecked the ~50 words on this list that also weren't in PLC, ABC, or Oxford and removed about a dozen that were OCR errors; here's an updated list (also fixed on GitHub): Hsk3-missingcc-v2.txt
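The rechecking step suggests a simple heuristic: a word missing from CC-CEDICT and also from every other dictionary is the most likely OCR error (like 起涞 above). Here is a minimal Python sketch, assuming you already have each dictionary's headwords as a set; Pleco does not necessarily expose PLC, ABC, or Oxford headwords this way, so those inputs and the function name are hypothetical.

```python
def ocr_suspects(words, cc_cedict, other_dictionaries):
    """Split words missing from CC-CEDICT into two groups.

    elsewhere: found in at least one other dictionary (probably a real word
               that CC-CEDICT just lacks)
    nowhere:   found in no dictionary (worth rechecking against the PDF as a
               possible OCR error)
    """
    elsewhere, nowhere = [], []
    for word in words:
        if word in cc_cedict:
            continue
        if any(word in d for d in other_dictionaries):
            elsewhere.append(word)
        else:
            nowhere.append(word)
    return elsewhere, nowhere
```

The "nowhere" group is only a shortlist for manual checking: a genuinely new word (二维码-style neologisms) can be absent from every dictionary without being an OCR error.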
calculatrix Posted April 1, 2021 at 07:30 AM Report Posted April 1, 2021 at 07:30 AM 19 hours ago, roddy said: Oh, interesting they specify characters per minute for listening speeds. Up to a max of 800. And also for reading. Wow!!! Really??? Who can talk that fast? That is not conversation, that is olympic. 800 syllables per minute is 13 per second. "One missisippi two missisippi three missi" I cannot. Quote
realmayo Posted April 1, 2021 at 08:24 AM
Yeah, that must be the length of the audio rather than the speed, which for the 高级 (advanced) levels doesn't seem to be stipulated. But the lower levels do have speeds, reaching 220-240 字/分钟 (characters per minute) for HSK6. For reading, HSK6 requires a minimum reading speed of 180 字/分钟, rising to 240 by the time you get to HSK9.
mungouk Posted April 1, 2021 at 08:28 AM
Is there any software that can measure, and then help you to improve, your reading speed?
realmayo Posted April 1, 2021 at 08:35 AM
Not sure you need anything more than a stopwatch and a suitable text whose word count you know or can quickly find out. Oh, and a calculator! Not trying to be glib: I just don't think it's all that difficult to measure reading speed, and improving it seems to be a case of forcing yourself to read a little quicker than last time. But happy to be proved wrong if there is some really useful software out there, cos I really need to read faster than I do.
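The stopwatch approach is easy to script. Here is a minimal Python sketch, assuming speed is measured in Chinese characters (字) per minute and that only CJK unified ideographs count, so punctuation, digits, and spaces are ignored. The function names are hypothetical; the two targets are the HSK6 and HSK9 reading minimums quoted earlier in the thread.

```python
import unicodedata

# Minimum reading speeds quoted earlier in this thread, in 字/分钟.
READING_TARGETS = {"HSK6": 180, "HSK9": 240}


def count_hanzi(text: str) -> int:
    """Count CJK unified ideographs, ignoring punctuation, digits and spaces."""
    return sum(
        1 for ch in text
        if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")
    )


def chars_per_minute(text: str, seconds: float) -> float:
    """Reading speed for a text you read in `seconds` (from your stopwatch)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return count_hanzi(text) * 60 / seconds


def meets_target(speed: float, level: str) -> bool:
    """True if `speed` reaches the quoted minimum for `level`."""
    return speed >= READING_TARGETS[level]
```

For example, reading a 900-character passage in 4 minutes (240 seconds) gives 225 字/分钟: above the HSK6 minimum, below HSK9's.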
Mantou Posted April 1, 2021 at 08:42 AM
Is there any information on how long the current HSK certificates will be valid?
roddy Posted April 1, 2021 at 09:01 AM Report Posted April 1, 2021 at 09:01 AM 1 hour ago, calculatrix said: Wow!!! Really??? Who can talk that fast? That is not conversation, that is olympic Ha, that'll teach me to skim read and not think critically... Quote
Guest realmayo Posted April 1, 2021 at 09:06 AM Report Posted April 1, 2021 at 09:06 AM 3 minutes ago, roddy said: Ha, that'll teach me to skim read and not think critically... Don't worry, some greybeard will rock up here soon and say that 800 字s per 分钟 was pretty standard for levels 9-11 on the old old HSK. Quote
roddy Posted April 1, 2021 at 09:44 AM
We ARE the greybeards.
Tuan Tran Posted April 1, 2021 at 10:48 AM
In case someone needs it: attached is the list of 3,000 characters I extracted from the PDF using Google Docs OCR. I skimmed the .docx file and still saw OCR errors in some places (highlighted in yellow). Does anyone have better output? Thanks.
Attachment: HKS 3000 characters.docx
mikelove Posted April 1, 2021 at 01:24 PM
I've added OCR'ed versions of the character lists to that GitHub repo too, though they'll be harder to proofread automatically, since we can't simply check them against lists of valid Chinese words as we can with the word lists.
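One partial workaround, offered as a heuristic sketch rather than anything used for the repo: since the word lists can be checked against dictionaries, cross-check each character-list entry against the characters that occur in the (proofread) word list, and also flag any entry that isn't a single CJK ideograph. The function names are hypothetical, and a legitimate character can still be flagged if it appears in no word on the list, so treat the output as candidates for manual review.

```python
import unicodedata


def is_single_hanzi(entry: str) -> bool:
    """True if the entry is exactly one CJK unified ideograph."""
    return len(entry) == 1 and unicodedata.name(entry, "").startswith("CJK UNIFIED IDEOGRAPH")


def suspicious_characters(char_list, word_list):
    """Flag character-list entries that are not a single ideograph, or that
    never occur in any word of the (already proofread) word list."""
    chars_in_words = {ch for word in word_list for ch in word}
    flagged = []
    for entry in char_list:
        if not is_single_hanzi(entry) or entry not in chars_in_words:
            flagged.append(entry)
    return flagged
```

For instance, an OCR misreading like 涞 (for 来) would be flagged if no word on the checked word list contains it.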