smalldog Posted May 12, 2005 at 01:26 AM How does it work? The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x-W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits your answers on the tested characters to this curve and returns W as an estimate of the number of characters you know. The simple answer... well, I don't have a simple explanation. Do take the results with a pinch of salt! I need to test the validity of my model by getting some volunteers to wade through a large number of characters, but I haven't written any code to record that data yet.
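A minimal sketch of how such a fit could be done, assuming maximum likelihood. The post doesn't say which fitting method the program actually uses. Each tested character is a (rank, known) pair. The names `fit_vocabulary` and `p_known` and the `scale`/`ridge` settings are hypothetical. The small ridge penalty on the slope is also an added assumption: it keeps A finite when someone knows every character up to some rank and none after it.

```python
import math
from typing import List, Tuple

# Hypothetical data format: (frequency rank x, whether the learner knows it).
Observation = Tuple[int, bool]


def sigmoid(z: float) -> float:
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_known(x: float, A: float, W: float) -> float:
    """The curve from the post: 1 / (1 + exp(A*(x - W)))."""
    return sigmoid(-A * (x - W))


def log_likelihood(c: float, d: float, data: List[Observation],
                   scale: float, ridge: float) -> float:
    """Bernoulli log-likelihood of P(known) = sigmoid(c + d * x / scale),
    minus a small ridge penalty on the slope d (an assumption)."""
    total = 0.0
    for x, known in data:
        eta = c + d * x / scale
        # log(1 + exp(eta)), computed without overflow
        if eta > 0:
            softplus = eta + math.log1p(math.exp(-eta))
        else:
            softplus = math.log1p(math.exp(eta))
        total += (eta if known else 0.0) - softplus
    return total - 0.5 * ridge * d * d


def fit_vocabulary(data: List[Observation], scale: float = 1000.0,
                   ridge: float = 1e-3, max_iter: int = 100) -> Tuple[float, float]:
    """Maximum-likelihood estimate of (A, W) using Newton's method.

    Internally fits sigmoid(c + d * x / scale); dividing x by `scale` keeps
    the numbers well conditioned. Because 1/(1 + exp(A*(x - W))) equals
    sigmoid(A*W - A*x), we recover A = -d / scale and W = c / A.
    """
    c, d = 0.0, 0.0
    ll = log_likelihood(c, d, data, scale, ridge)
    for _ in range(max_iter):
        # Gradient (g_c, g_d) and information matrix [[i_cc, i_cd], [i_cd, i_dd]].
        g_c, g_d = 0.0, -ridge * d
        i_cc, i_cd, i_dd = 0.0, 0.0, ridge
        for x, known in data:
            t = x / scale
            p = sigmoid(c + d * t)
            r = (1.0 if known else 0.0) - p
            w = p * (1.0 - p)
            g_c += r
            g_d += r * t
            i_cc += w
            i_cd += w * t
            i_dd += w * t * t
        det = i_cc * i_dd - i_cd * i_cd
        if det <= 1e-12:
            break
        # Newton direction: solve the 2x2 system (information) @ step = gradient.
        step_c = (i_dd * g_c - i_cd * g_d) / det
        step_d = (i_cc * g_d - i_cd * g_c) / det
        # Halve the step until the log-likelihood does not decrease.
        frac = 1.0
        while True:
            new_c, new_d = c + frac * step_c, d + frac * step_d
            new_ll = log_likelihood(new_c, new_d, data, scale, ridge)
            if new_ll >= ll or frac < 1e-6:
                break
            frac /= 2.0
        if new_ll < ll:
            break  # no improving step found; keep the current estimate
        done = new_ll - ll < 1e-10
        c, d, ll = new_c, new_d, new_ll
        if done:
            break
    if d >= 0:
        raise ValueError("fitted curve does not fall with rank; W is undefined")
    A = -d / scale
    return A, c / A
```

Call `fit_vocabulary` with the list of (rank, known) answers collected during the test; the second returned value is the estimate W. Newton's method is a reasonable choice here because this log-likelihood is concave in (c, d), so it typically converges in a handful of iterations.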
gato Posted May 12, 2005 at 01:52 AM Quoting smalldog: "The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x-W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits data to this curve and returns W as an estimate of the number of characters you know." Wow, fancy statistics! What is A? Is there a name for this function, so I can look it up?
in_lab Posted May 12, 2005 at 03:28 AM Nice program! I don't know how accurate it was, but I was happy with the range that came up. But please, we want to learn the fancy statistics.
smalldog Posted May 12, 2005 at 05:15 AM Wow, I didn't expect so much interest in this program. The function is a logistic function: http://en.wikipedia.org/wiki/Logistic_function . I'm just using it because its shape matches what I expect, not because there's any model of language learning that suggests it is correct. In this graph, the x axis represents Chinese characters in order of frequency, so x=1 is 的 and x=8000 is a character you will probably never come across. The y axis represents the probability of you knowing each character. The blue blobs show characters that you have been tested on; for those we are certain that either you know them (y=1) or you don't (y=0). The pink line is the equation above fitted to the blue blobs. W is the point at which the line crosses y=0.5, and it is also a good approximation of the total area under the curve, which is the total number of characters you know. The parameter A is just a measure of how steep the curve is: the larger A is, the more sharply the probability drops from 1 to 0 around x=W.
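The claims about W and A can be checked directly from the formula in the first post, assuming A > 0 (the probability falls as rank increases) and treating the rank x as continuous, so that the sum over characters is approximated by an integral. The substitution is u = A(x - W):

```latex
P(x) = \frac{1}{1 + e^{A(x - W)}}, \qquad A > 0

P(W) = \frac{1}{1 + e^{0}} = \frac{1}{2}, \qquad P'(W) = -\frac{A}{4}

\int_0^\infty P(x)\,dx
  = \frac{1}{A}\int_{-AW}^{\infty} \frac{du}{1 + e^{u}}
  = \frac{1}{A}\ln\left(1 + e^{AW}\right)
  = W + \frac{1}{A}\ln\left(1 + e^{-AW}\right)
  \approx W \quad (AW \gg 1)
```

So the curve crosses 1/2 exactly at W, its slope there is -A/4 (which is why A measures steepness), and the area exceeds W only by a correction term that vanishes once A·W is large. Integrating to infinity rather than to the end of the roughly 8000-character list changes little as long as W sits well below the end of the list.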
roddy Posted May 12, 2005 at 05:20 AM Smalldog, why don't you make a new topic introducing your program, either in Textbooks and Resources (if you think it's finished and useful) or Computing (if you want some help with designing/programming it)? I think a lot of people would be interested. Roddy
gato Posted May 12, 2005 at 05:35 AM Cool, is this what's called logit regression? Maybe we should call you bigdog from now on.
smalldog Posted May 12, 2005 at 06:21 AM OK Roddy, I've started a new thread here in the Computing forum. I want to make some improvements and check my model before 'releasing' it. Gato, I've never heard of logit regression before, but it does seem to be similar... I need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 ("I'm your big big dog, you're my bone") doesn't sound so good.
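For reference, the log-odds of smalldog's curve are linear in x: logit(P) = ln(P/(1 - P)) = A·W - A·x. That is exactly the form a logistic ("logit") regression with a single predictor fits, with intercept A·W and slope -A, so the two describe the same family of curves. A minimal Python sketch of the conversion follows; the helper names are hypothetical and not taken from smalldog's program.

```python
import math
from typing import Tuple


def smalldog_curve(x: float, A: float, W: float) -> float:
    """P(know the character at frequency rank x) = 1 / (1 + exp(A*(x - W))).
    Note: math.exp overflows if A*(x - W) exceeds roughly 709."""
    return 1.0 / (1.0 + math.exp(A * (x - W)))


def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def to_logistic_regression(A: float, W: float) -> Tuple[float, float]:
    """(intercept, slope) such that logit(P(x)) = intercept + slope * x."""
    return A * W, -A


def from_logistic_regression(intercept: float, slope: float) -> Tuple[float, float]:
    """Recover (A, W) from a fitted one-variable logistic regression."""
    A = -slope
    return A, intercept / A
```

Going the other way means that any standard logistic-regression routine fitted on (rank, known) pairs yields A = -slope and W = intercept / A.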
gato Posted May 12, 2005 at 06:47 AM Quoting smalldog: "Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 ("I'm your big big dog, you're my bone") doesn't sound so good." 骨头就骨头吧。 (Bone it is, then.) How's the teaching going? I checked out your cugb forum and took that English test you linked to. Your test's shorter and therefore better. I predict it'll be a hit.
woodcutter Posted May 18, 2005 at 03:34 AM I second what Roddy said, except that I don't see how anyone could bear to take that test for more than 5 characters. Do I know that character? Yeah... I think so! I want to check!
錢 勇 龍 Posted August 6, 2006 at 02:11 PM Wow Gato, your English is excellent! I hope one day I will be as good in Traditional Chinese as you are in English!
Sgt_Strider Posted March 24, 2008 at 12:14 AM Gato, most of the links in your post are dead. Do you think you could update them for us? I know this thread is old, but it contains useful information.
amego Posted March 24, 2008 at 05:03 PM In the literal sense, the Zhonghua Zihai records a staggering 85,568 single characters, although even this fails to list all known characters. http://en.wikipedia.org/wiki/Chinese_character