
Word statistics for Chinese Books



Posted

Admin Note:  Split off from discussion here

 

On 7/14/2019 at 8:18 PM, Publius said:

Btw, here's some statistics for the other two Chinese novels:

《呼兰河传》 84,715 total characters; 2,283 unique characters; 5,565 unique words

《骆驼祥子》 116,560 total characters; 2,561 unique characters; 7,193 unique words

 

How might I go about obtaining a list of books with this type of information? It's very interesting. 

 

Now I am nearly finished with 骆驼祥子, and I'm trying to decide whether to read something challenging or something a little easier for the next book. I was thinking it might be an enjoyable reward to choose a book that is a little easier and have that warm feeling of being able to read through a few pages without needing to use the dictionary so frequently. Just enjoy reading.

 

I have translations of The Da Vinci Code and also the first volume of Percy Jackson and the Olympians. Both look simpler than 骆驼祥子 by comparison. I also have 狼图腾, which appears to be a little more challenging.

Posted
15 hours ago, david387 said:

How might I go about obtaining a list of books with this type of information? It's very interesting. 

I made a program that produces such information.
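For anyone curious how figures like these might be produced, here is a minimal sketch in Python. It is not the program mentioned above, and the function names are made up for illustration. It assumes that "characters" means code points in the basic CJK Unified Ideographs block plus Extension A, and that the text has already been split into words by some segmenter of your choice (word segmentation is the hard part and is not shown here).

```python
def is_cjk(ch):
    # Assumption: only the basic CJK block and Extension A count as Chinese characters.
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def char_stats(text):
    # Count Chinese characters only, skipping punctuation, digits, Latin letters, and whitespace.
    chars = [ch for ch in text if is_cjk(ch)]
    return {"total_characters": len(chars), "unique_characters": len(set(chars))}


def word_stats(tokens):
    # tokens: a list of words produced by a segmenter (hypothetical input, not computed here).
    # Tokens without any Chinese character (e.g. punctuation) are ignored.
    words = [t for t in tokens if any(is_cjk(ch) for ch in t)]
    return {"total_words": len(words), "unique_words": len(set(words))}


if __name__ == "__main__":
    sample = "我爱读书，读书爱我。"
    print(char_stats(sample))
    print(word_stats(["我", "爱", "读书", "，", "读书", "爱", "我", "。"]))
```

Different tools will disagree on unique word counts because they segment text differently, so numbers are only comparable when they come from the same tool.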

Posted
2 hours ago, imron said:

I made a program that produces such information.

Yes, I have seen it before. Nice work. Is there anywhere online where you or someone else has created a list of books and their statistics?

Posted
8 hours ago, david387 said:

Is there anywhere online where you or someone else has created a list of books and their statistics?

Not that I'm aware of.

Posted
4 minutes ago, imron said:

Not that I'm aware of.

 

Seems like this would make for a popular resource on the internet. With even just a hundred books listed, it would be possible to start categorizing books by difficulty level. Of course, you could also compute some simple metrics, such as the number of pages divided by the number of vocabulary words.
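As a rough illustration of that kind of ratio, here is a small Python sketch. Page counts aren't given in this thread, so it uses total characters as a stand-in for length, which is an assumption on my part. The function names are hypothetical, and the figures are the ones quoted at the top of the thread.

```python
def chars_per_unique_word(total_characters, unique_words):
    # Length divided by vocabulary size: a crude density measure, not a validated difficulty score.
    if unique_words <= 0:
        raise ValueError("unique_words must be positive")
    return total_characters / unique_words


def rank_by_density(books):
    # Returns (title, ratio) pairs, highest ratio first.
    scored = [
        (title, chars_per_unique_word(s["total_characters"], s["unique_words"]))
        for title, s in books.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Statistics as quoted earlier in the thread.
books = {
    "呼兰河传": {"total_characters": 84715, "unique_words": 5565},
    "骆驼祥子": {"total_characters": 116560, "unique_words": 7193},
}

if __name__ == "__main__":
    for title, ratio in rank_by_density(books):
        print(f"{title}: {ratio:.2f} characters per unique word")
```

A higher ratio means new vocabulary turns up more slowly, but longer books tend to score higher simply because words repeat more as a text grows, so a ratio like this is only a first pass.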

Posted
2 hours ago, david387 said:

Seems like this would make for a popular resource on the internet

I've thought of doing something similar to show how learning a small amount of the most frequent words in a text builds understanding.

 

A couple of years back, when ChineseBookClub on reddit was more active, I did some analysis on the previous several months' worth of books, showing how much you'd understand if you'd just been learning the 10 most frequent unknown words every day. The results are far better than if you'd been learning a similar amount of vocab from general word lists, e.g. the HSK (see here for details).
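The idea of learning the most frequent unknown words each day can be sketched roughly as follows in Python. This is not the analysis described above, just a simplified illustration. It assumes the book is already segmented into a list of word tokens, treats "understanding" as the share of tokens that are known words, and uses made-up function names.

```python
from collections import Counter


def coverage(tokens, known):
    # Fraction of running words in the text that are in the known set.
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def simulate_learning(tokens, known, per_day=10, days=30):
    # Each day, learn the next `per_day` most frequent words not known at the start.
    # Returns a list of coverage values: index 0 is before study, index n is after day n.
    known = set(known)
    counts = Counter(t for t in tokens if t not in known)
    ranked = [word for word, _ in counts.most_common()]
    history = [coverage(tokens, known)]
    for day in range(days):
        batch = ranked[day * per_day:(day + 1) * per_day]
        known.update(batch)
        history.append(coverage(tokens, known))
    return history


if __name__ == "__main__":
    toy_text = ["的"] * 5 + ["我"] * 3 + ["书"] * 2
    print(simulate_learning(toy_text, known=set(), per_day=1, days=3))
```

To compare against a general word list such as the HSK, you could feed the list in its own order instead of `ranked` and look at how the two coverage curves differ for the same book.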

Posted
2 hours ago, david387 said:

With even just a hundred books listed, it would be possible to start categorizing books by difficulty level.

 

However, the number of unknown words does not necessarily equate to difficulty level. For me, sentence complexity, grammar, and abstractness of the writing are much more important factors than the number of unknown words. Mind you, it's a good starting point, I suppose.

I think that when reading on a tablet with Pleco, it's far easier to tackle a book with a high number of unknown words, as one can read with minimal breaks. However, one needs to resist the temptation of checking the dictionary before giving yourself a few seconds to recall the word. I've scolded myself a few times for immediately checking a word only to realise it's in HSK3!

 

I know many like to voice the merits of reading at the 90% or 95% level (or whatever the optimal point is), but it's too theoretical to be of much use, in my view. You could well have a vocab of 5000 words and yet face many unknown words in a 2500-word graded reader.

 

Personally, I think the best way is to find a book that you are interested in and have a crack at the first ten or twenty pages; then you will have a good feel as to whether it is for you or not.

On my fourth book now, after 3 failed attempts (either too difficult or simply not interested).

 

My ultimate driver in persisting with reading is "interest". Interest can trump difficulty level (however we wish to define it) by a large margin, imo. Many books are recommended online for Chinese learners, but they may well hold no interest for you personally, hence it's an important consideration.

 

When selecting a book, I now ask myself, "Would I read this if it were in English?"

Posted
1 hour ago, DavyJonesLocker said:

Mind you, it's a good starting point, I suppose.

There has been some recent discussion on what makes a good metric of 'difficulty' in the Chinese Text Analyser thread.

Posted

In answer to your other question:

On 8/18/2019 at 7:44 AM, david387 said:

Now I am nearly finished with 骆驼祥子, and I'm trying to decide whether to read something challenging or something a little easier for the next book. I was thinking it might be an enjoyable reward to choose a book that is a little easier and have that warm feeling of being able to read through a few pages without needing to use the dictionary so frequently. Just enjoy reading.

 

I have translations of The Da Vinci Code and also the first volume of Percy Jackson and the Olympians. Both look simpler than 骆驼祥子 by comparison. I also have 狼图腾, which appears to be a little more challenging.

 

You mentioned in your other book that this is your 5th novel? At this stage, I'd still opt for 'easier to read'. It took me maybe 8-9 books before really being comfortable reading books in Chinese. Keeping things easy allows you to consolidate your other non-vocab reading skills and, as you mentioned, gives you that positive boost that comes from feeling that you are reading Chinese!

 

I'd also opt for original Chinese works over translations. With translated works you'll have a lot more words and transliterations that aren't common in Chinese and that are potentially difficult to look up translations for.

 

Finally, I'd recommend against《狼图腾》at this point in time (you can always come back to it several books down the line). The book starts out well, but loses the plot about halfway through and you might be hard-pressed to maintain interest (I was). That just adds to the difficulty when you are still trying to get into the groove of reading Chinese novels. What are the other books you've read?
