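fenlan Posted August 5, 2005 at 04:29 PM I found on http://fhpi.yingkou.net.cn/bbs/1951/61.htm and other pages on the same BBS a listing of all 6763 characters in the GB set, organised by frequency! They are in the attached spreadsheet. This list came with the following text (translated from Chinese):
Chinese character frequency table: processed from Tsinghua University statistics reposted by Mr. ChenShuyuan
I have now processed the Tsinghua University statistics reposted by Mr. ChenShuyuan and publish them below, for reference only.
Characters used: 6763 (GB national standard character set). Total number of characters in the sample texts: 86,405,823.
A chart drawn from the data in the table above can reveal something; anyone interested might give it a try.
It is not known whether characters outside the GB character set were encountered during counting, whether specialised texts of various kinds were included, and so on.
Characters with greater word-forming ability have higher frequencies; characters with less have lower frequencies.
Lists of commonly used characters have been published several times in the past and have played a positive role. My estimate is that it is best to limit common characters, second-tier common characters and a small number of not-rare characters to around 4000-5000. Beyond that range, the number of unfamiliar characters rises markedly.
I suggest that authors, compilers, editors and others who use a rare character in their work annotate its pronunciation with the Hanyu Pinyin scheme and, where necessary, explain it, so that readers do not have to spend time looking it up in dictionaries.
If word-frequency statistics also exist, please let me know. Chinese has about 120,000 words.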
gato Posted August 5, 2005 at 04:33 PM Wow, a whole discussion forum devoted to pinyin. That puts our own little pinyin thread to shame.
nipponman Posted August 5, 2005 at 04:44 PM Indeed it does. I want to read all the threads to see some native speakers' opinions on the topic.
fenlan (thread author) Posted August 5, 2005 at 05:03 PM Nipponman, did you say you can't get Excel on your computer? If you send me your email address in a PM, I'll send you a text file (461KB) with the 6763 characters by frequency. Let me know if you need it.
nipponman Posted August 5, 2005 at 06:39 PM You've got a PM. :D
Song You Shen Posted August 5, 2005 at 08:54 PM In the Excel sheet, besides 汉字 (Chinese character), there are four other headings: <<出现次数>> <<累计字数>> <<万分比>> <<累计万分比>>. What do these mean? Thanks. Youshen
fenlan (thread author) Posted August 5, 2005 at 09:03 PM
1. <<出现次数>> (occurrence count): the number of times the individual character occurs in the database of roughly 86.4 million characters.
2. <<累计字数>> (cumulative character count): the running total of occurrences of this character plus all the characters above it in the list.
3. <<万分比>> (per ten thousand): how often the character occurs per 10,000 characters of text, i.e. its share of the whole corpus in parts per ten thousand.
4. <<累计万分比>> (cumulative per ten thousand): the combined share, per 10,000 characters, of this character plus all the characters above it in the list.
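For anyone who wants to rebuild or check these four columns from raw counts, here is a minimal sketch in Python. It assumes the columns are computed exactly as described above (running sums of occurrence counts, with shares scaled to parts per 10,000); the function names `frequency_columns` and `characters_for_coverage` are hypothetical, and the spreadsheet itself may round its figures differently.

```python
def frequency_columns(counts):
    """Rebuild the four spreadsheet columns from raw counts.

    counts: iterable of (character, occurrence_count) pairs.
    Returns rows of (character, 出现次数, 累计字数, 万分比, 累计万分比),
    ordered from most to least frequent.
    """
    # Sort by descending count, as in the frequency list (ties keep input order).
    ordered = sorted(counts, key=lambda pair: pair[1], reverse=True)
    total = sum(n for _, n in ordered)
    rows = []
    running = 0
    for ch, n in ordered:
        running += n  # occurrences of this character and all higher-ranked ones
        share = 10000 * n / total  # parts per ten thousand
        cumulative_share = 10000 * running / total
        rows.append((ch, n, running, share, cumulative_share))
    return rows


def characters_for_coverage(rows, per_ten_thousand):
    """Smallest number of top-ranked characters whose cumulative share
    reaches the target (e.g. 9900 means 99%); None if it is never reached."""
    for rank, row in enumerate(rows, start=1):
        if row[4] >= per_ten_thousand:
            return rank
    return None


# Toy counts, invented for illustration only (not the Tsinghua data).
rows = frequency_columns([("一", 30), ("的", 50), ("是", 20)])
for row in rows:
    print(row)
# ('的', 50, 50, 5000.0, 5000.0)
# ('一', 30, 80, 3000.0, 8000.0)
# ('是', 20, 100, 2000.0, 10000.0)
print(characters_for_coverage(rows, 7500))  # 2
```

With the real table loaded as (character, count) pairs, a call such as `characters_for_coverage(rows, 9900)` would give the number of characters needed to cover 99% of the corpus, which is one way to look at the 4000-5000 character estimate in the first post.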