Weyland Posted April 1, 2021 at 08:06 PM 15 hours ago, calculatrix said: On 3/31/2021 at 4:50 AM, roddy said: Oh, interesting they specify characters per minute for listening speeds. Up to a max of 800. And also for reading. Wow!!! Really??? Who can talk that fast? That is not conversation, that is Olympic. 800 syllables per minute is 13 per second. "One Mississippi two Mississippi three Missi" I cannot. Just reading the characters and not (entirely) vocalizing their pronunciation would get you there. 800 characters per minute sounds like a lot, but take this piece, 《白杨礼赞》, from the PSC exam. It is around 600 characters per minute and the reading speed is not exactly US cattle auctioneer-like. Once your language ability improves you start to skim over a ton of the words and skip the most common/predictable ones, just like you do in your native language; that's how you can have two "the" follow each other and your brain would only read one. That's also why, after you've spent a lot of time reading but very little time listening, your listening comprehension will still have improved, as you have a better gauge for the language, i.e. you've become better at predicting the words. I'm an idiot, and instead of taking the number of characters I took the length of the text I parsed into Notepad++, and forgot to do critical thinking. Forget what I said.
TaxiAsh Posted April 1, 2021 at 08:51 PM It's a bit frustrating, and terrible for all the learning books and online channels. Just a thought: IELTS effectively has a maximum score of 9. Perhaps it's to put it in line with that?
大块头 Posted April 1, 2021 at 09:53 PM 1 hour ago, Weyland said: Just reading the characters and not (entirely) vocalizing their pronunciation would get you there. 800 characters per minute sounds like a lot, but take this piece, 《白杨礼赞》, from the PSC exam. It is around 600 characters per minute and the reading speed is not exactly US cattle auctioneer-like. Is there another audio file you meant to link to? The text is 561 characters and the audio is 3 minutes, making for an approximate speaking rate of 187 CPM. The typical speaking rate of Chinese people reading aloud is 255±29 CPM. The fastest speaker in the Guinness Book of World Records had a measured speaking rate of 586 words per minute. English words have an average of 1.2 syllables, so that would work out to about 700 syllables per minute. I don't see how 800 CPM is supposed to be comprehensible.
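The arithmetic in the post above can be checked in a few lines. This is only a sketch using the figures quoted there (561 characters, roughly 3 minutes of audio, 586 words per minute, about 1.2 syllables per word); the helper name `rate_per_minute` is made up for illustration.

```python
def rate_per_minute(units: float, minutes: float) -> float:
    """Return a rate in units (characters, words, syllables) per minute."""
    return units / minutes


# Figures quoted in the post above.
recording_cpm = rate_per_minute(561, 3)   # 561 characters read in about 3 minutes
record_syllables_pm = 586 * 1.2           # 586 words/min at ~1.2 syllables per word
claimed_cps = 800 / 60                    # 800 CPM expressed per second

print(f"recording: {recording_cpm:.0f} CPM")
print(f"record holder: {record_syllables_pm:.0f} syllables/min")
print(f"800 CPM: {claimed_cps:.1f} characters/s")
```

On these numbers, 800 CPM is more than four times the rate of the linked recording.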
New Members Tuan Tran Posted April 1, 2021 at 10:14 PM I guess 800 CPM may be the maximum speaking rate sampled from a test video, not the average speaking rate of the whole video. Even so, it's unrealistic and a bad metric.
Weyland Posted April 1, 2021 at 10:57 PM 1 hour ago, 大块头 said: Is there another audio file you mean to link to? No, my brain is mush and literally switched around the text length (bytes of .txt file) with the character count. I might have gotten a bit too little sleep for the last week+, averaging 2-3 hours a night. Apologies for the confusion.
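The mix-up described above is easy to reproduce. A minimal sketch, assuming the .txt file was saved as UTF-8 (the thread does not say which encoding was used): most Chinese characters take three bytes in UTF-8, so the byte length is roughly three times the character count.

```python
text = "白杨礼赞"  # the title of the piece, used here only as sample text

char_count = len(text)                   # number of characters
byte_count = len(text.encode("utf-8"))   # size on disk, assuming UTF-8

print(char_count, byte_count)
```

Dividing the byte length by the audio duration therefore overstates the speaking rate by about a factor of three.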
roddy Posted April 2, 2021 at 05:11 AM 6 hours ago, Tuan Tran said: I guess 800 CPM may be the maximum speaking rate sampled from a test video, not the average speaking rate of the whole video. Even so, it's unrealistic and a bad metric. No, I just misread length as speed, that's all, then Weyland miscalculated something. Nobody is actually using that speed for anything.
Jan Finster Posted April 2, 2021 at 05:25 AM 8 hours ago, TaxiAsh said: It's a bit frustrating, and terrible for all the learning books and online channels. It is not necessarily bad that the Chinese language teaching community updates its approach every 10 years. It could lead to innovation and updated material. I very much hope the new material is not just "old wine in new bottles".
erlenwein Posted April 3, 2021 at 03:17 PM Sorry if it's a silly question - is there a de-watermarked printer-friendly version of this file yet? I've tried to play around with ImageMagick but to no avail, and I usually prefer to print materials out to reduce the strain on my eyes.
shawky.nasr Posted April 3, 2021 at 03:22 PM HSK 1 – 9 Vocabulary List / New HSK 3.0 (2021) Changes and Updates VocabularyListofNewHSKLevel1-9(2021) -editable pdf
erlenwein Posted April 3, 2021 at 03:26 PM Thank you! Yet I wouldn't call it printer-friendly - 253 pages of text in a single column? The original list has pinyin in it, at least, and the actual word list takes up way less space in total. Thanks nonetheless!
shawky.nasr Posted April 3, 2021 at 03:30 PM 5 minutes ago, erlenwein said: 253 pages of text in a single column You can re-edit the file. They will add pinyin later. You can also generate pinyin yourself using the CC-CEDICT database.
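As a rough illustration of the pinyin suggestion above: CC-CEDICT is a plain-text dictionary with one entry per line in the form `Traditional Simplified [pin1 yin1] /gloss/.../`. The sketch below parses two sample lines in that format (the glosses are shortened for illustration) and builds a lookup table; `load_pinyin` is a hypothetical helper name, and a real run would read the downloaded dictionary file instead of `SAMPLE`.

```python
import re

# Sample lines in CC-CEDICT's entry format (glosses shortened for illustration).
SAMPLE = """\
# comment lines start with '#'
你好 你好 [ni3 hao3] /hello/hi/
學習 学习 [xue2 xi2] /to learn/to study/
"""

# Traditional, simplified, then the pinyin reading in square brackets.
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /")


def load_pinyin(lines):
    """Map each simplified headword to its first listed pinyin reading."""
    table = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        match = ENTRY.match(line)
        if match:
            _trad, simp, pinyin = match.groups()
            table.setdefault(simp, pinyin)  # keep the first reading only
    return table


table = load_pinyin(SAMPLE.splitlines())
print(table.get("学习"))
```

Keeping only the first reading is a simplification: words with several readings would still need to be checked by hand, and the readings come out as tone-numbered pinyin rather than pinyin with tone marks.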
大块头 Posted April 3, 2021 at 03:42 PM 18 minutes ago, shawky.nasr said: HSK 1 – 9 Vocabulary List / New HSK 3.0 (2021) Changes and Updates VocabularyListofNewHSKLevel1-9(2021) -editable pdf Isn't that just the wordlist text file Mike Love shared above saved in a PDF? Give me a few minutes... I'm working on generating a version of the PDF without the watermark and with OCR so it's searchable.
shawky.nasr Posted April 3, 2021 at 04:09 PM There are some ways to remove watermarks without affecting the DPI. Here is an example: W020210329527301787356_166.pdf
大块头 Posted April 3, 2021 at 04:22 PM I've compiled a watermark-free PDF, but I'm running into issues with Tesseract adding extra spaces between all the characters. Any suggestions for running OCR on this thing?
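One workaround that does not depend on any OCR setting is to clean the text afterwards. A minimal sketch, assuming the OCR output is already available as a string: it removes spaces and tabs only where both neighbours are CJK ideographs, so spacing around Latin text and numbers is left alone. The function name `strip_cjk_spaces` is made up for illustration.

```python
import re

# Basic CJK Unified Ideographs block; rarer extension blocks are not covered.
_CJK = r"\u4e00-\u9fff"

# Spaces or tabs that sit directly between two CJK ideographs.
_GAP = re.compile(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])")


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces/tabs between adjacent CJK ideographs, keep the rest."""
    return _GAP.sub("", text)


print(strip_cjk_spaces("爱 好 者 HSK 3.0 词 汇"))
```

This only fixes the symptom in the extracted text; it does not change the text layer embedded in a searchable PDF.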
shawky.nasr Posted April 3, 2021 at 04:37 PM 12 minutes ago, 大块头 said: Tesseract adding extra spaces That's normal; it's better to use ABBYY Online Recognition Service: Free Text Recognition Service | ABBYY Official Website. Maybe it's because the file was exported by Adobe InDesign.
大块头 Posted April 3, 2021 at 04:42 PM 4 minutes ago, shawky.nasr said: That's normal; it's better to use ABBYY Online Recognition Service: Free Text Recognition Service | ABBYY Official Website The PDF is about 90 MB, too large for that service.
shawky.nasr Posted April 3, 2021 at 04:47 PM Just now, 大块头 said: 90 MB How? The original file is 48.7 MB. Which method did you use to remove the watermark? The size should be smaller than 48 MB.
mikelove Posted April 3, 2021 at 05:01 PM 1 hour ago, 大块头 said: Isn't that just the wordlist text file Mike Love shared above saved in a PDF? Yep, just diffed it and aside from converting punctuation to half-width versions it's identical. Really classy. Honestly the main reason I slapped an MIT license on it was the 'no warranty' part - i.e. don't sue me if you fail your HSK because of an OCR error - along with reassuring other developers that they were in fact allowed to use it (rather than being some scary random internet file of uncertain provenance), but acknowledgement would be nice. EDIT: also, 'no warranty' is a big part of why BSD/MIT licenses require you to include a copy of the license notice when you redistribute the data; end users need to also be informed there's no warranty, despite however many intermediaries the data went through to get to them.
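A comparison like the one described above can be approximated by folding full-width punctuation to half-width before diffing. This is a sketch of one way to do it, not necessarily how the diff in the post was done: it uses Unicode NFKC normalization, which maps full-width forms such as `（` and `，` to their ASCII counterparts. The names `fold_width` and `differing_lines` are hypothetical.

```python
import unicodedata


def fold_width(line: str) -> str:
    """Fold full-width forms (e.g. '（', '，') to half-width via NFKC."""
    return unicodedata.normalize("NFKC", line)


def differing_lines(a_lines, b_lines):
    """Return (line number, a, b) for lines that differ after width folding."""
    return [
        (number, a, b)
        for number, (a, b) in enumerate(zip(a_lines, b_lines), 1)
        if fold_width(a) != fold_width(b)
    ]


print(differing_lines(["爱（动）", "八"], ["爱(动)", "把"]))
```

Two caveats: `zip` stops at the shorter list, so a length check is needed separately, and NFKC leaves ideographic punctuation such as `、` and `。` unchanged while also folding some other compatibility characters.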
shawky.nasr Posted April 3, 2021 at 05:53 PM 1 hour ago, 大块头 said: The pdf is about 90 MB, too large for that service. If you can't remove the InDesign PDF watermark, there will be many mistakes when using ABBYY OCR. You can review other people's files word by word, as Excel or Word documents. (Hard work.) It is better to wait until the official website shares the data as an Excel file. 材料下载 (Materials download)