Xi'Er Dun Posted September 23, 2007 at 01:15 AM
I have a question about Unicode's Unihan Database. It concerns the number of Chinese-Japanese-Korean-(Vietnamese) [CJK(V)] characters (i.e. Hanzi, Kanji, Hanja, Chữ Nôm/Hán Tự/Chữ Hán) included in the Unihan Database and the Unicode character set(s). How many CJK characters (i.e. Hanzi, Traditional 繁體漢字 and Simplified 简体汉字; Kanji, Kyuujitai/Shinjitai 舊字體/新字体 漢字; Hanja 漢字; Chữ Nôm 字喃/Hán Tự 漢字/Chữ Hán 字漢) are included in the general Unihan Database, and how many are there in the Unihan extended database(s)? Does the Unihan Database include all characters that are entries in the Morohashi Dai-Kan-Wa-Jiten 大漢和辭典, Cihai Zidian 辭海字典, Kangxi Zidian 康熙字典, Hanyu-Da-Zidian 漢語大字典, etc.? Is there a Unicode Unihan expert on this forum? If so, could they please answer my questions? 謝謝您 如何も有り難う御座い升 (Thank you very much) 希爾頓從
imron Posted September 23, 2007 at 03:46 AM
I'm not sure what you mean by the extended database. Unihan is just one big collection of Chinese characters. Perhaps you are confusing this with the Basic Multilingual Plane (BMP) and the Supplementary Multilingual Plane (SMP)? Altogether though, the Unihan database contains information for 71226 unique code points (and obviously, some code points have more information than others). Regarding the dictionaries you mention, I'm not sure whether Unihan contains all the entries from those dictionaries; however, it does list information about the dictionaries it uses. If you visit this page, it lists dictionary index information for all the dictionaries used to compile/cross-check the database. Following the link to a given dictionary will tell you how many dictionary indices exist in the Unihan database for that dictionary. There will therefore be at least that many characters from that dictionary in the database, and possibly more (I say "at least" because the information is listed as provisional and so might not be complete). E.g. there are dictionary indices for 55812 characters from the 汉语大词典 Hanyu Da Cidian, 70205 from the Kangxi, etc. Suffice to say, regardless of whether it contains all the characters in those dictionaries, Unihan is almost certainly the most comprehensive database of CJK characters.
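For anyone who wants to reproduce counts like these, here is a rough Python sketch. It assumes the tab-separated text layout used by the Unihan data files (code point, field name, value, with `#` comment lines) and uses `kKangXi` and `kHanYu` as example dictionary-index fields. The function name `summarize_unihan` and the sample rows are made up for illustration; the values are placeholders, not real Unihan data.

```python
def summarize_unihan(lines):
    """Count unique code points overall and per field.

    Assumes each data line looks like: U+XXXX<TAB>kFieldName<TAB>value
    """
    code_points = set()
    per_field = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a well-formed data line
        code_point, field, _value = parts
        code_points.add(code_point)
        per_field.setdefault(field, set()).add(code_point)
    counts = {field: len(cps) for field, cps in per_field.items()}
    return len(code_points), counts


# Illustrative sample only: the index values are placeholders.
SAMPLE = """\
# sample rows in the assumed Unihan text layout
U+4E00\tkKangXi\t0000.000
U+4E00\tkHanYu\t00000.000
U+4E01\tkKangXi\t0000.000
U+3400\tkHanYu\t00000.000
"""

total, by_field = summarize_unihan(SAMPLE.splitlines())
print(total, by_field)
```

To run it on real data, pass an open file from a downloaded Unihan release instead of `SAMPLE.splitlines()`. The per-field count is then the "at least that many characters" figure for each dictionary.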
trevelyan Posted September 23, 2007 at 11:48 AM
I'm under the impression that it doesn't contain all of the entries in the 康熙字典, although the missing entries are mostly variants.