Chinese-Forums

At what known-word % should I start reading a book?



Posted
3 hours ago, ZhangKaiRong said:

What does the Chinese Text Analyser analyse?

It aims to analyze at the word level because that's the basic unit learners should be using.  It also tracks very basic character statistics such as total character count, but it doesn't track known/unknown characters, or provide frequency information at the character level, only the word level.

 

It has a dictionary of words (including polysyllabic ones and chengyu) and matches the text against that.

 

I say 'aims' to work with words because the segmenter is fairly basic and so doesn't always get the word boundaries correct, but 'words' is the basic unit it works with.
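To illustrate what dictionary matching with a basic segmenter can look like, here is a minimal sketch of greedy forward maximum matching, a common simple approach. It is not CTA's actual algorithm, which isn't described in this thread; the function name `segment`, the `max_len` parameter, and the toy dictionary are all made up for the example.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position, take the
    longest dictionary word; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary, invented for illustration only.
toy_dict = {"我", "喜欢", "看", "中文", "小说", "研究", "研究生", "生命", "起源"}

print(segment("我喜欢看中文小说", toy_dict))  # ['我', '喜欢', '看', '中文', '小说']
print(segment("研究生命起源", toy_dict))      # ['研究生', '命', '起源']
```

The second call shows the kind of boundary error a simple matcher makes: the intended reading is 研究 / 生命 / 起源, but the greedy match grabs 研究生 first.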

 

5 hours ago, Dawei3 said:

A fascinating perspective.....

It's one that most people overlook (including myself at the time) until they start getting into reading.

Posted
6 hours ago, Moshen said:

post the ISBN number

Moshen, I still have the book. When I get home, I’ll be glad to check its ISBN. My friend bought it for me in China. It seemed like a legitimate copy because of its quality.

Posted
7 hours ago, Moshen said:

ISBN number?

The ISBN is 978-7-5613-4411-8

 

In the back of the book, it gives jiaoliu@booky.com.cn  and also suggests their website. However, I got the book ~10 years ago. 

 

 

Posted
8 hours ago, imron said:

I say 'aims' to work with words because the segmenter is fairly basic and so doesn't always get the word boundaries correct, but 'words' is the basic unit it works with.

Did you write your own parser or are you using something like jieba?

Posted

I wrote my own parser. CTA predates jieba by a couple of years. The parser is faster but less accurate. I've been working on a more accurate (but still fast) parser, but life has been busy the last couple of years and I haven't been able to put as much time into it as I'd like.

Posted

Sorry, I don't know how to reply with quotes and refer to several forum contributors simultaneously.

 

First, I believe speaking is the most valuable skill in learning foreign languages. It's not only for communication. If one can speak a language, he is confident he knows it, and even reading becomes easier. Nevertheless, the bulk of new words comes predominantly from reading. The problem is that when one just reads texts in a foreign language, looking up new words, he mainly trains his ability to recognize those words in the future. In speaking or writing, he cannot easily recall them; they are not “on the tip of his tongue”. I'm a native Russian speaker and can read almost anything in English without a dictionary, but when it comes to speaking or writing it becomes much more difficult. It's a well-known thing, of course. Therefore it's necessary to learn words not only "from the language" but also "to the language", in order to place them in one's active memory, not only the passive one. However, we need to know how to use the words, so we can't manage without reading.

 

Second, in the case of Chinese, one's active and passive vocabularies are very close to each other, since we are confident we know a Chinese word only if we can write it by hand. Recognition is not enough. So even to read we need to learn how words are written, i.e. we learn them "to the language", not "from the language".

 

These two reasons prompted me to use a technique whose idea I developed on the basis of the Russian-Chinese pidgin language, 中俄混合语, which existed in some areas of Siberia and Northern China from the late 19th century to the first half of the 20th century. It was based on Chinese grammar but mainly Russian vocabulary. Russian is very flexible: almost any word order is possible without changing the meaning, it's easy to construct new words that will be understandable, etc.

 

So, here is what I do. I open a text of reasonable length (a chapter, say) in CTA, where a list of the words I know (meaning I can write them by hand) is already loaded. I see that known words make up 70%, or even 50%, of the text (or even less, it doesn't matter); the share of unique known words is about half that, of course. Then I mark an additional 50 to 60 words as known. They can be the next words by frequency, the words that appear more than 3 times, or words picked by some other criterion.

Then I write in Russian what I call a "gateway superscript" of the Chinese text (not a translation): word by word, I copy the Chinese, replacing the Chinese words with Russian ones, so the sentence structure and punctuation remain Chinese. With Russian this is possible, and the sentence remains easily understandable. If one Mandarin word expresses a notion that takes several words in Russian, those words are joined with the symbol “+”. If Russian requires an additional word (e.g. a preposition), it is put in brackets. Where, on the contrary, the Chinese text has an additional word, empty brackets are put in the Russian. In braces I give alternative variants of the superscript, and in square brackets, comments. I also mark with color or underlining the words that I know and the words that I'm going to make known (those marked in the CTA window).

After that, reading the superscript, I replace the marked words with characters, whereas the unmarked segments (unknown words that I'm not going to learn now) I just rewrite in Russian. I enclose a picture of how it looks in my copybook. I repeat the exercise several times at intervals (hours or days). When all the words are known, I add more words in the same text or move on to another text (the next chapter or an entirely different text). This way, words are learnt in use, within an integral text (which is better than separate examples), and in "layers" according to frequency. I can stop at any proportion of known words and switch to another text.
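For anyone who wants to see the bookkeeping behind this (the known-word share of a text, and which unknown words to pick up next by frequency), here is a rough Python sketch. It is only an illustration and not how CTA works internally; it assumes the text has already been split into words, and the names `coverage`, `next_words`, `limit` and `min_count` are made up for the example.

```python
from collections import Counter


def coverage(tokens, known):
    """Return (share of running words known, share of unique words known)."""
    counts = Counter(tokens)
    if not counts:
        return 0.0, 0.0
    total = sum(counts.values())
    known_running = sum(n for word, n in counts.items() if word in known)
    known_unique = sum(1 for word in counts if word in known)
    return known_running / total, known_unique / len(counts)


def next_words(tokens, known, limit=50, min_count=1):
    """Unknown words sorted by frequency in this text, most frequent first."""
    counts = Counter(word for word in tokens if word not in known)
    return [word for word, n in counts.most_common() if n >= min_count][:limit]


# Tiny pre-segmented sample, invented for illustration only.
tokens = ["我", "喜欢", "看", "书", "我", "喜欢", "读", "书", "我", "看", "电影"]
known = {"我", "喜欢", "看"}

running_share, unique_share = coverage(tokens, known)
print(f"{running_share:.0%} of running words, {unique_share:.0%} of unique words")
print(next_words(tokens, known, limit=1))  # ['书']
```

Setting `min_count=4` would correspond to taking only the words that appear more than 3 times, and `limit` caps how many words go into the next "layer".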

 

I understand my post is not helpful for those who don't know Russian. However, since this way of learning is applicable to any foreign language, not only Chinese, they may consider learning Russian for this purpose.

IMG_0114.jpg

Posted

The post was deleted, as I found a more effective way to learn oral Mandarin and characters. I do not want to guide others along the wrong path.

Posted

@Pall, I took the liberty of adding paragraphs to your text, hope that is OK.

