Chinese-Forums

Total Number of Chinese characters



Posted
How does it work?

The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x-W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits this curve to your test results and returns W as an estimate of the number of characters you know.
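For anyone who wants to play with the idea, here is a minimal sketch of such a fit. The post does not say how the program actually does its fitting, so the method here (a brute-force maximum-likelihood search over a grid of A and W values) is an assumption, and the function names, grids and test results are all made up for illustration.

```python
import math

def p_know(x, A, W):
    """Probability of knowing the character at frequency rank x."""
    z = A * (x - W)
    if z > 700:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def log_likelihood(results, A, W):
    """Log-likelihood of (rank, known) test results under the curve."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for x, known in results:
        p = min(max(p_know(x, A, W), eps), 1.0 - eps)
        total += math.log(p if known else 1.0 - p)
    return total

def fit(results, A_grid, W_grid):
    """Return the (A, W) pair on the grid with the highest likelihood."""
    return max(
        ((A, W) for A in A_grid for W in W_grid),
        key=lambda aw: log_likelihood(results, aw[0], aw[1]),
    )

# Made-up test results: (frequency rank, did the user know the character?)
results = [(50, True), (400, True), (900, True), (1500, True),
           (1800, False), (2100, True), (2600, False), (3500, False),
           (5000, False), (7000, False)]

# Hypothetical search grids; a real program would likely use an optimiser.
A_hat, W_hat = fit(results,
                   A_grid=[0.001, 0.002, 0.004, 0.008, 0.016],
                   W_grid=range(100, 8000, 50))
print(f"estimated characters known: about {W_hat}")
```

A grid search is slow and coarse, but it has no dependencies and cannot diverge, which makes it handy for checking a proper optimiser against.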

The simple answer... well, I don't have a simple explanation. :)

Do take the results with a pinch of salt! I need to test the validity of my model by getting some volunteers to wade through a large number of characters, but I haven't written any code to record that data yet.

Posted
The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x-W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits data to this curve and returns W as an estimate of the number of characters you know.

Wow, fancy statistics!

What is A? Is there a name for this function, so I can look it up?

Posted

Nice program! I don't know how accurate it was, but I was happy with the range that was coming up. :D But please, we want to learn the fancy statistics.

Posted

Wow, didn't expect so much interest in this program. :D

The function is a logistic function (http://en.wikipedia.org/wiki/Logistic_function). I'm just using it because it looks like what I expect, not because there's any model of language learning that suggests this is correct.

graph.gif

In this graph the x axis represents Chinese characters in order of frequency, so x=1 is 的 and x=8000 is a character you will probably never come across. The y axis represents the probability of you knowing each character. The blue blobs show characters that you have been tested on; for those we know for certain whether you know them (y=1) or not (y=0). The pink line is the equation above fitted to the blue blobs. W is the point at which the line crosses y=0.5. It is also a good approximation of the total area under the curve, which is the expected total number of characters you know. The parameter A is just a measure of how steep the curve is.
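The claim that W approximates the area under the curve can be checked numerically. The sketch below sums the probability over every rank from 1 to 8000 and compares the total with W. The parameter values are invented for illustration and are not taken from the program.

```python
import math

def p_know(x, A, W):
    """Probability of knowing the character at frequency rank x."""
    z = A * (x - W)
    if z > 700:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def expected_known(A, W, n_chars=8000):
    """Area under the curve: the sum of p_know over all ranks 1..n_chars."""
    return sum(p_know(x, A, W) for x in range(1, n_chars + 1))

# Hypothetical parameter pairs (A, W): one fairly steep curve with W well
# away from zero, and one shallow curve with W close to zero.
for A, W in [(0.004, 2000), (0.001, 500)]:
    print(f"A={A}, W={W}: area = {expected_known(A, W):.1f}")
```

The area stays close to W as long as A*W is large, that is, when the curve has flattened out near y=1 by the time it reaches x=1. For a shallow curve with a small W, the long tail to the right adds more than the left side loses, and the area ends up noticeably larger than W.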

Posted

Smalldog, why don't you make a new topic introducing your program, either in Textbooks and Resources (if you think it's finished and useful) or Computing (if you want some help with designing / programming it). I think a lot of people would be interested.

Roddy

Posted

Ok Roddy, I've started a new thread here in the computing forum. I want to make some improvements and check my model before 'releasing' it.

Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 ("I'm your big, big dog, you're my bone") doesn't sound so good. 8)

Posted
Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 ("I'm your big, big dog, you're my bone") doesn't sound so good. 8)

骨头就骨头吧。 ("Fine, a bone it is.") :D How's the teaching going? I checked out your cugb forum and took that English test you linked to. :wink: Your test's shorter and therefore better. I predict it'll be a hit.

Posted

I second what Roddy said about it, except that I don't see how anyone could bear to take that test for more than 5 characters. Do I know that character? Yeah....think so! I want to check!

  • 1 year later...
Posted

Wow, Gato, your English is excellent! I hope one day I will be as good in Traditional Chinese as you are in English!

  • 1 year later...
Posted

Gato,

Most of the links in your post are dead. Do you think you can update them for us? I know this thread is old, but it contains useful information.
