Johnny-5 (Author) Posted January 6, 2023 at 03:24 PM
On 1/6/2023 at 10:54 PM, PerpetualChange said: I'd love to know more about how you find the time to do this. Not being sarcastic here... I'm 30% through a book that I've been reading since October 2022. I can't imagine I'll finish it before March or April. Even in my native language, I spend 1-2 months reading a book. Depending on the style, I take about 5-10 minutes to read a page. Granted... I only spend 1-2 hours per day reading, if that. Many days, it is much less. So... what's it look like? As a guy with a wife, kid, fulltime job, other commitments/hobbies. Spell it out for me. What's a day in the life look like?
Well, first off, I'm tackling Agatha Christie books, which are not the longest or most challenging books around. So oftentimes my Kindle tells me that the current book will take 4-5 hours to read.
Second, I trained myself to listen/read faster. At first I was at 0.75x and found it hard to keep up with the text and the meaning, so I bumped it down to 0.5x and read a few books that way. Gradually I tried speeding it up (while also not trying to read anything too far outside my comfort zone), and now I cruise at 1.75x. At that pace a 200-page book takes about 4 hours.
Third, I used to watch YouTube videos or browse the web for an hour or so before bed, but that's good reading time. I can also get some reading in at breakfast and lunch, and at other times, when I can't read, I can have my headphones in while I'm driving and make progress in the story. I have two kids; one of them has no trouble doing homework, but the other needs to be watched, so I can sit with her and read (say "I'm doing my homework") while she's doing hers. I don't cut into family time or other obligations, but just fit it in where I can...
Hell, I'll also set a bookmark and let my Kindle read me to sleep. I find it stimulating enough to keep my mind off other topics, but not so stimulating as to keep me awake. I usually have to go back and start over from the bookmark the next day.

Jan Finster Posted January 7, 2023 at 07:22 AM
On 1/6/2023 at 4:24 PM, Johnny-5 said: now I cruise at 1.75x
Even this is a feat in its own right. I find listening to audiobooks quite challenging because I tend to drift in and out of focus. However, doing so for 4 hours at 1.75x speed in Chinese and staying focussed is pretty incredible....

imron Posted January 8, 2023 at 05:12 AM
On 1/7/2023 at 2:03 AM, Johnny-5 said: I think your program might turn out to be a helpful tool in sorting out books that are in the right zone
This was one of the main use cases it was designed for.

Johnny-5 (Author) Posted January 8, 2023 at 10:17 AM
On 1/8/2023 at 1:12 PM, imron said: This was one of the main use cases it was designed for.
I ran that known-words Lua script and it's pretty awesome, but the fact that Calibre stores the ebooks with pinyin filenames makes it a challenge to figure out which books are which in a long list. Have you got anything to help with getting the data back into Calibre? I was thinking I might try to get CTA to write the percentages of known words directly into the metadata.opf file that Calibre stores with each book, but that's not a very clean method. Have you got a more elegant solution?

imron Posted January 8, 2023 at 08:30 PM
On 1/8/2023 at 9:17 PM, Johnny-5 said: I was thinking I might try to get CTA to write the percentages of known words directly into the metadata.opf Calibre stores with each book, but that's not a very clean method
What is unclean about it? Seems like metadata would be a good use for this (but I know nothing about the Calibre format).
Another alternative would be to write the stats to a separate file with the same name but a different extension, so stats for huozhe.txt would be written to huozhe.stats or similar.

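The "separate stats file" idea above can be sketched in a few lines. This is only an illustration in Python, not the CTA Lua API: the function names, the `known_words_percent` key, and the 95.0 value are all made up for the example.

```python
from pathlib import Path
import tempfile


def stats_path_for(book_path):
    """Return the sibling stats path: same name, '.stats' extension."""
    return Path(book_path).with_suffix(".stats")


def write_stats(book_path, known_percent):
    """Write one line of stats next to the book and return the path used.

    The 'known_words_percent' key is a hypothetical format, not something
    CTA or Calibre defines.
    """
    out = stats_path_for(book_path)
    out.write_text(f"known_words_percent={known_percent:.1f}\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    # Work in a throwaway directory so no real library is touched.
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp) / "huozhe.txt"
        book.write_text("placeholder book text", encoding="utf-8")
        written = write_stats(book, 95.0)
        print(written.name, written.read_text(encoding="utf-8").strip())
```

Keeping the stats in a sibling file leaves the book and its metadata untouched, which sidesteps the question of whether editing metadata.opf is "clean".
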
Lu Posted January 9, 2023 at 11:40 AM
On 1/5/2023 at 9:34 AM, Johnny-5 said: I like reading mysteries and cannot find any written in Chinese so I haven't bothered looking.
Mystery as a genre is not very big (yet) in China, but apparently there are a few authors, see article. Not mentioned in the article is Chan Ho-kei 陈浩基, a Hong Kong author of mysteries in the Sherlock Holmes tradition.

Johnny-5 (Author) Posted January 9, 2023 at 02:35 PM
On 1/9/2023 at 4:30 AM, imron said: Seems like metadata would be a good use for this
Yeah, Calibre stores the metadata in an XML file called metadata.opf, in the same directory that each book is stored in. It looks to be a pretty simple operation of opening up the file, finding the correct slot and writing the value, but I don't know where to start. I've never tried messing with Lua before and I don't really know how to go about opening and writing to an XML file, or any kind of file for that matter. In the documentation you mention https://keplerproject.github.io/luafilesystem/ but that's a dead link... So I'm kinda stuck for what to do.

imron Posted January 10, 2023 at 04:33 AM
luafilesystem appears to have been moved here: https://github.com/lunarmodules/luafilesystem
I haven't investigated it, but this library seems to provide XML support in Lua: https://github.com/clear-code/xmlua

Johnny-5 (Author) Posted January 10, 2023 at 09:09 AM
On 1/10/2023 at 12:33 PM, imron said: luafilesystem appears to have been moved here: https://github.com/lunarmodules/luafilesystem I haven't investigated it, but this library seems to provide xml support in Lua: https://github.com/clear-code/xmlua
Well, I figured out how to edit the metadata files, but unsurprisingly it turns out they are merely backup files and the main info is stored in a database. It looks like there's no good way to get the info from CTA back into Calibre without a lot more work (using the API or making a plugin). But I can extract the Chinese name from the metadata file and use that to make the results list more readable... so a win in my book.

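For anyone wanting to do the same title lookup, here is a minimal sketch in Python rather than Lua. It assumes metadata.opf is a standard OPF file whose title sits in a Dublin Core `dc:title` element; the sample XML is a trimmed, made-up stand-in, and `title_from_opf` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by OPF metadata elements such as dc:title.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Made-up, trimmed stand-in for a real metadata.opf file.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>活着</dc:title>
    <dc:creator>余华</dc:creator>
  </metadata>
</package>"""


def title_from_opf(opf_text):
    """Return the text of the first dc:title element, or None if absent."""
    root = ET.fromstring(opf_text)
    element = root.find(f".//{{{DC_NS}}}title")
    return element.text if element is not None else None


if __name__ == "__main__":
    print(title_from_opf(SAMPLE_OPF))
```

Because this only reads the file, it is safe to run against the backup copies Calibre keeps next to each book; mapping a pinyin folder name to the returned title is then a plain dictionary lookup.
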
Johnny-5 (Author) Posted January 10, 2023 at 02:40 PM
On 1/7/2023 at 3:22 PM, Jan Finster said: Even this is a feat in its own. I find listening to audiobooks quite challenging because I tend to drift in and out of focus. However, doing so for 4 hours at 1.75x speed in Chinese and staying focussed is pretty incredible....
It can be a battle to stay focused, but there are ways to give yourself an advantage. I wouldn't say I stay focused for 4-hour stretches. I usually go for half-hour stretches, maybe as long as an hour and a half. It really depends on how compelling and understandable the content is. I was reading "Notre Dame de Paris", which was well outside my comfort zone; I also didn't find the story interesting, and I couldn't focus well at all. Even though it was challenging, I found Les Mis to be much more compelling. Then there was "The Unexpected Guest" (意外来客) from Agatha Christie: it's not hard to read and I found every moment of the story compelling, so I probably did read that one nearly straight through.
Listening with the text helps keep me anchored to what's going on and stops me getting distracted. Seeing the underline of the current sentence also helps a lot if my attention wanders for a second. If my attention wanders too much then I'll abandon a book, because it's either too hard or too boring or both. I'll ditch a book even if I'm several hours in. Or maybe I'll let it drone on in the background, only half paying attention and half understanding. I try not to do that too much, but sometimes I just want to get a book over and done with.
Oftentimes I need to take walks, because then at least I'm "doing something" and not just trying to sit in one place and focus on a book.

Dr Mack Rettosy Posted January 14, 2023 at 06:31 PM
If you enjoy the mystery/crime genre and are looking for native material you could try the 推理之王 trilogy by 紫金陈. The second book (The Bad Kids) was my favorite and stands on its own pretty well. This was also made into a well-received TV show.
Here is the character count / unique character count using Chinese Text Analyzer:

Johnny-5 (Author) Posted January 15, 2023 at 02:37 AM
On 1/15/2023 at 2:31 AM, Dr Mack Rettosy said: If you enjoy the mystery/crime genre and are looking for native material you could try 推理之王 trilogy by 紫金陈.
Just looking in my library, I think that I did read 无证之罪... can't say that I know what it was about, but I don't recall disliking it, so I think I might check out the other books.

Johnny-5 (Author) Posted January 16, 2023 at 11:07 AM
On 1/8/2023 at 1:12 PM, imron said: This was one of the main use cases it was designed for.
It works well. "Crime and Punishment" (罪与罚) percolated up the list of books. CTA says I know 95% of the words, and I was skeptical because it's a "classic of Russian literature", which I assume means "hard". (I also assumed it meant "boring", but that doesn't seem to be the case either.) In actual fact I'm finding it quite enjoyable and easy to read. The fact that it is sort of a murder mystery probably has something to do with why my known word list from reading lots of mysteries would have significant overlap. Some scifi books in my collection are under 90% known, which I can tell means they won't be fun or easy to read. It's interesting to see actual hard numbers on "narrow reading".
Another interesting thing: I used another program to merge all of the books I've read (as far as I remember) into a single text file and got some stats...
I've read 7,281,888 words in Chinese.
I've read 10,676,852 characters.
The unique words in all that amount to 46,716, of which I know about 10,000 (i.e. 95% of the most common words); or, if you go by the "meeting it 17 times or more" metric, I should know 16,000 words.
Statistic nerding out with CTA is easy.

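The two numbers above can be poked at with a short sketch. The totals are the ones quoted in the post; the word list is toy data, the threshold of 17 comes from the "meeting it 17 times or more" metric, and the function name is made up. It also assumes the text has already been segmented into words, which is the hard part that CTA handles.

```python
from collections import Counter


def words_seen_at_least(words, threshold=17):
    """Count distinct words that occur at least `threshold` times."""
    counts = Counter(words)
    return sum(1 for n in counts.values() if n >= threshold)


# Totals quoted in the post above.
total_words = 7_281_888
total_chars = 10_676_852
avg_chars_per_word = total_chars / total_words

# Toy, pre-segmented word list (not real data).
toy_words = ["凶手"] * 20 + ["下摆"] * 2 + ["侦探"] * 17

if __name__ == "__main__":
    print(round(avg_chars_per_word, 2))    # 1.47
    print(words_seen_at_least(toy_words))  # 2
```
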
phills Posted January 30, 2023 at 05:31 PM
Congrats! 10m chars in a year is quite an achievement, as is being able to read at 1.75x listening speed, which is approx 435 characters per minute.
I don't know of a good online vocab test, but here's a good online character count test that estimates your # known chars in about a minute: http://hanzitest.ericjiang.com/
I scored 4300 just now, and I have ~4600 chars in my flash cards, so it's pretty accurate for me, considering I'm sure I forgot a few chars from my pile.

Johnny-5 (Author) Posted February 3, 2023 at 04:31 PM
On 1/31/2023 at 1:31 AM, phills said: Congrats! 10m chars in a year is quite an achievement as is being able to read at 1.75x listening speed, which is approx 435 characters per minute.
Thanks, it's quite encouraging to reflect on your progress. How do you figure on the 435 ch/m? I was thinking I could work it out based on how long a book takes and the character count, but I hadn't actually done the math.
On 1/31/2023 at 1:31 AM, phills said: I don't know of a good online vocab test, but here's a good online character count test that estimates your # known chars in about a minute. http://hanzitest.ericjiang.com/
I saw something on that page that I think reflects on my state of Chinese literacy. In the FAQ they say: "For many native speakers, recognizing characters is the primary obstacle in their literacy. For many non-native speakers who've crammed a bunch of hanzi flashcards, recognizing characters often does not mean understanding the word or sentence. For example, you may recognize "下" and "摆" but not understand that "下摆" means "hem"."
I didn't remember the character "摆" or the word "hem" (guess that doesn't come up much in murder mysteries), but looking up "摆" in Pleco I see that I know a large number of the words that contain that character. I'm not sure if they're saying I should cram some flash cards, but they might be.

phills Posted February 4, 2023 at 08:32 AM
On 2/4/2023 at 12:31 AM, Johnny-5 said: How do you figure on the 435 ch/m? I was thinking I could work it out based on how long a book takes and the character count, but I hadn't actually done the math.
Average speaking speed is said to be around 250cpm. So 1.75 * 250 ≈ 435.
Empirically, I've listened to a lot of audiobooks, and 250cpm is pretty close to what I've measured as the speed, the few times I've done it. E.g. Sapiens (non-fiction) on YouTube is at 285cpm. I've listened to a wuxia novel, 流星蝴蝶剑, at 200cpm. One of the Foundation novels (Asimov sci-fi) I've listened to on Ximalaya is 245cpm.

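Spelled out as a quick sketch, the estimate is just base rate times playback speed. The 250cpm base and the three measured rates are the figures from the post; the function name is made up.

```python
def cpm_at_speed(base_cpm, playback_speed):
    """Characters per minute heard when audio is played at `playback_speed`."""
    return base_cpm * playback_speed


# Narration speeds measured in the post above, in characters per minute.
measured_cpm = {"Sapiens": 285, "流星蝴蝶剑": 200, "Foundation": 245}
average_measured = sum(measured_cpm.values()) / len(measured_cpm)

if __name__ == "__main__":
    print(cpm_at_speed(250, 1.75))     # 437.5, which the post rounds to 435
    print(round(average_measured, 1))  # 243.3, close to the 250cpm rule of thumb
```
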
Moshen Posted February 4, 2023 at 01:22 PM
Quote: Average speaking speed is said to be around 250cpm. So 1.75 * 250 ≈ 435.
Average speaking speed in English is around 180 words/minute. Does that mesh? Are 180 English words equivalent to about 250 Chinese characters?

phills Posted February 4, 2023 at 10:37 PM
On 2/4/2023 at 9:22 PM, Moshen said: Average speaking speed in English is around 180 words/minute. Does that mesh? 180 English words equivalent to about 250 Chinese characters?
Pretty close. I've seen a study that sets the average Chinese word at ~1.5 characters. So 180 * 1.5 = 270 cpm. If English speaking speed is 170wpm, it'd be 170 * 1.5 = 255 cpm.
Quick googling: https://virtualspeech.com/blog/average-speaking-rate-words-per-minute says, from sampling TED talks:
The average speaking rate was 173 words per minute. The speaking rate ranged from 154 to 201 words per minute.
Popular TED Talk speaking rates
How great leaders inspire action (Simon Sinek) – 170 wpm
The power of introverts (Susan Cain) – 176 wpm
Do schools kill creativity? (Sir Ken Robinson) – 165 wpm
Why we do what we do (Tony Robbins) – 201 wpm
The power of vulnerability (Brené Brown) – 154 wpm
So it's very close for everyone other than Tony Robbins!
Edit: That source also gives:
Average speech rates
Presentations: between 100-150 wpm for a comfortable pace
Conversational: between 120-150 wpm
Audiobooks: between 150-160 wpm, which is the upper range that people comfortably hear and vocalize words
Radio hosts and podcasters: between 150-160 wpm
Auctioneers: can speak at about 250 wpm
Commentators: between 250-400 wpm
Kennedy's Inaugural Ask-not-what-your-country-can-do-for-you speech was apparently at 100wpm.

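The conversion above is easy to replay for each of the TED talk rates. A small sketch, assuming the ~1.5 characters-per-word figure from the post holds; the function and variable names are made up for illustration.

```python
# Average characters per Chinese word, per the study mentioned in the post.
CHARS_PER_WORD = 1.5


def wpm_to_cpm(wpm, chars_per_word=CHARS_PER_WORD):
    """Convert English words per minute to an equivalent Chinese cpm."""
    return wpm * chars_per_word


# TED talk speaking rates quoted in the post, in words per minute.
ted_wpm = {
    "Simon Sinek": 170,
    "Susan Cain": 176,
    "Sir Ken Robinson": 165,
    "Tony Robbins": 201,
    "Brené Brown": 154,
}

if __name__ == "__main__":
    for speaker, wpm in ted_wpm.items():
        print(f"{speaker}: {wpm} wpm is roughly {wpm_to_cpm(wpm)} cpm")
```

On these figures only the 201wpm talk lands far from the 250cpm rule of thumb, which matches the remark about Tony Robbins.
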
Moshen Posted February 4, 2023 at 11:25 PM
@phills, Interesting. I had a one-minute TV spot for a couple of years and always planned for 180 words. I don't speak particularly fast, but I was right on target most of the time with that.

Johnny-5 (Author) Posted February 6, 2023 at 04:44 AM
On 2/4/2023 at 4:32 PM, phills said: Average speaking speed is said to be around 250cpm. So 1.75 * 250 ≈ 435. Empirically, I've listened to a lot of audiobooks, and 250cpm is pretty close to what I've measured as the speed, the few times I've done it. E.g. sapiens (non-fiction) on youtube is at 285cpm. I've listened to a wuxia novel 流星蝴蝶剑 at 200cpm. One of the Foundation novels (Asimov sci-fi) I've listened to on ximalaya is 245cpm.
Wow, thanks for the explanation. Working backwards from the 10m characters I get about 383 hours of reading, which pretty much corresponds with what I thought I could achieve.
This whole thing was motivated by the realization that time was the crucial variable I was missing in my Chinese learning. As I recall, Steven Kaufmann studied Chinese for six months, 8 hours a day, as a full-time job, and wouldn't you know it, 25 weeks * 40 hours = 1,000 hours. I also saw an interview with that YouTuber who goes around NYC impressing people with his "perfect Chinese", and what did he say? He said he spent 2-3 years learning Chinese for an hour or more each day. So while I had been learning Chinese for many years, it was sporadic at best and certainly didn't add up to 1,000 hours. I can't really estimate, but maybe 200-400 hours, which is well short of the time you need to be proficient (if you take the FSI at their word, that should be 1,200 hours for Chinese).
At the beginning I calculated that I could probably put in about 400 hours if I used most of my free time for reading, and as I see from the 383 hours calculation, that's not far off (it was probably more hours because I was going slower in the beginning).
Now, for more numbers.... I recall reading somewhere that some method was averaging about 12-15 words learned per hour, so if we use the high end of that, then in 383 hours I should have learned 5,745 words. That corresponds with what CTA says about me having learned 10,000-15,000 words, because I probably started with 5,000 and added somewhere in the neighborhood of 6,000 words.

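The back-of-the-envelope numbers in this post can be replayed in a few lines. The inputs (10m characters, 435cpm, 15 words per hour, a 5,000-word starting point) are the post's own figures; the function name is made up, and the last line simply reruns the same formula with the exact character count reported earlier in the thread.

```python
def reading_hours(total_chars, chars_per_minute):
    """Hours needed to get through `total_chars` at a steady reading rate."""
    return total_chars / chars_per_minute / 60


hours = reading_hours(10_000_000, 435)  # the rounded "10m characters" figure
words_learned = round(hours) * 15       # high end of 12-15 words per hour
estimated_vocab = 5_000 + words_learned # assumed starting point from the post

if __name__ == "__main__":
    print(round(hours))     # 383
    print(words_learned)    # 5745
    print(estimated_vocab)  # 10745
    # Same formula with the exact count of 10,676,852 characters.
    print(round(reading_hours(10_676_852, 435)))  # 409
```
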