Mark Yong Posted July 29, 2008 at 03:33 AM
I currently use Windows XP's Chinese (Taiwan) setting for inputting Chinese characters (either by pinyin or using the sketch pad for characters whose pinyin I do not know), and have a couple of questions:
1. Does anyone know how many Chinese characters (particularly Traditional Chinese) are encoded in the standard Windows XP set of fonts? Last I read, it was in the region of 12,000.
2. Does anyone know if it is possible to download any Windows XP-compatible packages that include characters beyond the standard set of roughly 12,000 (particularly obscure dialect characters)? Or does Microsoft provide updates for this?
imron Posted July 29, 2008 at 05:49 AM
The number of characters is not limited by XP; it depends on the individual fonts you have on your system. Arial Unicode MS has almost 39,000 characters, and should come with Windows. There are other fonts that contain more. See here for a list.
Mark Yong Posted July 30, 2008 at 01:28 AM
Hi, imron,
Thanks for the information.
Question: this is how I normally test whether a character is available in my fonts. Whenever I am unable to input a character via pinyin, I go to the IME pad in the Chinese (Taiwan) setting and hand-write it as accurately as I can. If the character does not appear in the list, I assume it is not part of the character set. Is this the correct/best way to exhaustively look up a character?
E.g. I tested this using the "4-dragons" character (康熙字典 classifies it under the 龍 radical, and defines it as 龍行也). This character does not appear, which is surprising if Arial Unicode MS has 39,000 characters (unless I was unlucky enough to choose the odd one that is not included!).
In general, how do I view the full list of characters available in my font(s)? (Sorry, I am not very IT-savvy, so please bear with my rather rudimentary questions!)
liuzhou Posted July 30, 2008 at 04:04 AM
One way to see all the characters in a font is to open MS Word, select Insert - Symbol, then choose the font. All the characters are displayed. I'm sure someone will come along with something better, though.
imron wrote: "Arial Unicode MS has almost 39,000 characters"
Is that 39,000 Chinese characters, though?
Mark Yong Posted July 30, 2008 at 05:45 AM
liuzhou wrote: "One way to see all the characters in a font is to open MS Word, select Insert - Symbol, then choose Font. All the characters are displayed."
Thanks for the tip. I just tried it with several Chinese fonts on my PC, i.e. PMingLiU and TW-Kai. Of course, there is no practical way to count the Chinese characters in the list! Also, the characters do not appear to be listed in any sensible order (for TW-Kai, they appear to be listed by radical, but not consistently), which means finding and selecting a character from the list is not practical.
So, unless I (1) know the pinyin for the character, (2) know the Cangjie/Wubi/etc. code for it, or (3) write it out accurately using my mouse/stylus, there is really no way I will be able to locate and select a particular (obscure) character that I want.
Actually, there is one way: if I go to the IME pad, select 部 (radical) as the input, and then search through the whole list of characters categorised under that radical by residual stroke count, I should be able to locate the character, assuming I get the Kangxi radical and residual stroke count correct.
Question: would the number of characters differ between different fonts? If the answer is 'yes', then my next question would be: which character set is being represented in the IME list in Windows XP? Say XP came with 39,000 characters and I now download a new font with 45,000 characters: would that update the IME list on my PC with the additional characters that were not originally there? How does it work?
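On the counting question: a font's coverage can be tallied programmatically rather than by eye. Below is a minimal Python sketch, assuming you already have the list of code points a font maps (obtained from some font tool). The function name, the block table and the sample data are made up for illustration, and the table only lists a few of the Unicode blocks that hold Chinese characters, so any totals are approximate.

```python
# A few Unicode blocks that hold Chinese characters (illustrative, not exhaustive).
CJK_BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "CJK Extension A": (0x3400, 0x4DBF),
    "CJK Compatibility Ideographs": (0xF900, 0xFAFF),
    "CJK Extension B": (0x20000, 0x2A6DF),
}


def count_cjk(code_points):
    """Count how many of the given code points fall in each listed CJK block."""
    counts = {name: 0 for name in CJK_BLOCKS}
    for cp in code_points:
        for name, (first, last) in CJK_BLOCKS.items():
            if first <= cp <= last:
                counts[name] += 1
                break
    return counts


# Hypothetical coverage: two Latin letters, 龍, 龘 and the 4-dragons character.
sample = [0x41, 0x42, ord("龍"), ord("龘"), 0x2A6A5]
for block, total in count_cjk(sample).items():
    print(block, total)
```

If the fontTools library is available, something like `TTFont(path).getBestCmap()` should supply the real code points for a font file; that step is left out here so the sketch runs with the standard library alone.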
ipsi() Posted July 30, 2008 at 05:51 AM
It could just indicate that the handwriting pad doesn't know to look for that character. Pleco recognized what I'm assuming is the character you mean (three dragon characters, one on top of two others, as though in a pyramid?), but I'm not sure if any of the free fonts I have at home will.
imron Posted July 30, 2008 at 06:06 AM
Probably the best place to go would be the Unihan database. First select the number of strokes in the radical. In the case of 龍 it's 16. Choose that one, and from the resulting page, select the dragon radical. On this page, be sure to tick the "use utf-8" checkbox before hitting submit. Ticking it makes the next page use the fonts installed on your computer; otherwise, it will use images.
This will then show a list of all characters that use that radical (or you can limit it by the number of strokes). If you see a question mark, it means that your fonts don't support that character. You can click on the individual characters to view a more detailed version. You can also click on the question marks to see an image of the character they actually represent. You can also copy and paste that character into another program such as Word.
Anyway, it seems
ipsi() Posted July 30, 2008 at 06:11 AM
Ah, right, I just get the Unicode code (3 bytes) for it. Seems I don't have the font installed for it. Seems even Pleco doesn't know about that one.
Mark Yong Posted July 30, 2008 at 06:17 AM
imron wrote: "If you can see it too (instead of just a large ?), then it means that it is perhaps an issue with your IME, rather than with your fonts."
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=2A6A5&useutf8=false
I clicked on the character to go to the relevant page. In the column "The Unicode Standard", I see the character (displayed as an image). However, in the "Your Browser" column, I get a large ? where the character should be. Guess that means it is an issue with my IME. Is there any way to increase the number of characters the IME recognises?
imron wrote: "The IME is separate from the font system. If the IME doesn't support the character, then there is no way to type it, even if you have a font with that character available. In these situations, your best bet (even though it's significantly slower) will be to copy/paste it from the Unihan page I linked to above."
That means I will be copying-and-pasting the character as an image, right?
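The lookup link above follows a simple pattern: the character's code point in hexadecimal, plus a `useutf8` flag. As a rough Python sketch, assuming the pattern holds for other characters, a link can be built for any character you are able to paste. The function name is made up, and the `useutf8=true` value is a guess based on the checkbox described earlier; only the `false` form appears in this thread.

```python
from urllib.parse import urlencode

# Base address taken from the link quoted in the thread.
UNIHAN_LOOKUP = "http://www.unicode.org/cgi-bin/GetUnihanData.pl"


def unihan_url(char, use_utf8=False):
    """Build a Unihan lookup URL for a single character (hypothetical helper)."""
    params = {
        # Code point as upper-case hex without a prefix, e.g. 2A6A5.
        "codepoint": format(ord(char), "X"),
        # Assumption: the checkbox corresponds to useutf8=true.
        "useutf8": "true" if use_utf8 else "false",
    }
    return UNIHAN_LOOKUP + "?" + urlencode(params)


# The 4-dragons character, written as an escape so no special font is needed.
print(unihan_url("\U0002A6A5"))
```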
ipsi() Posted July 30, 2008 at 06:20 AM
No, the '?' means that you need a new font. And you'll still be copy/pasting, but you won't be copy/pasting the image. There is a fairly large difference. There'll be fonts out there that will display it. I'm a little surprised the default Windows ones don't. (Not so surprised about the default Linux ones.)
Mark Yong Posted July 30, 2008 at 06:29 AM
Hi, ipsi(),
Okay, so if I understand you correctly, I now need to find, download and install a new font that includes this character if I am to view it correctly on the screen instead of seeing a large '?' in its place. Any recommended places where I can download a good set of fonts?
Hi, imron,
If I understand you correctly, downloading the necessary font will allow me to display the character, but this does not mean I will be able to input it, because it is not supported by my IME? If so, is there any way I can enable my IME to support new characters not already in its database?
On a separate note: I have heard that it is possible to input a character just by keying in the Unihan code / Big-5 code / etc. How does that work? Where does one key in the code to generate the character (if, let's say, I knew the code for the character)?
ipsi() Posted July 30, 2008 at 06:46 AM
Mark,
While I'm afraid I can't answer your questions, you at least understand correctly. It can be possible to input characters directly via their UTF-8 (16, etc.) code, but I'm not sure how to do so on Windows. I think it can be done in Word with something like [alt]+ , or something. Not entirely sure, sorry.
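On keying in a code: whatever the program, the underlying step is converting a hexadecimal code into the character it stands for. A small Python sketch of that conversion follows, with made-up helper names. The Big-5 helper relies on Python's built-in `big5` codec, which covers standard Big-5 only and not vendor extensions. In Word, as far as I know, typing the hex code and pressing Alt+X performs the same Unicode conversion, but check that against your own version.

```python
def char_from_code(code):
    """Turn a Unicode hex code such as '2A6A5', '02A6A5' or 'U+2A6A5' into the character."""
    text = code.strip().upper()
    if text.startswith("U+"):
        text = text[2:]
    return chr(int(text, 16))


def code_from_char(char):
    """Return the usual U+XXXX notation for a single character."""
    return "U+{:04X}".format(ord(char))


def char_from_big5(hex_code):
    """Decode a two-byte Big-5 code given as hex, e.g. 'A440'."""
    return bytes.fromhex(hex_code).decode("big5")


four_dragons = char_from_code("02A6A5")
print(code_from_char(four_dragons))   # same code point, written as U+2A6A5
print(code_from_char("龍"))
print(char_from_big5("A440"))
```

The resulting string can be pasted anywhere; whether it shows as the character or as a box still depends on the installed fonts.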
imron Posted July 30, 2008 at 07:52 AM
As ipsi() mentioned, the large ? means you just don't have the font. Actually, you could copy and paste that large question mark into a post or a Word document, and anyone who did have the correct font would see the character and not the question mark. Also, when talking about copying and pasting, that was what I meant: copy the text, not the image.
Most IMEs have a way to add characters to a dictionary. I just tested it now with the Google IME. I can add the 4-dragon
imron Posted July 30, 2008 at 03:15 PM
Checking now on my Mac at home, it would seem it doesn't have any fonts that support this character either.
Lugubert Posted July 30, 2008 at 05:03 PM
If you don't need to share the document electronically with others but are satisfied with storing and printing it, the four dragons and lots of interesting non-standard character variants are included in the Mojikyo fonts. I suppose it would even be possible to make a PDF file from your document, but I haven't tried that yet.
youpii Posted July 30, 2008 at 06:39 PM
To get more info about a font file, install the MS Font properties extension: http://www.microsoft.com/typography/truetypeproperty21.mspx
You can also install the trial version of FontLab Studio or AsiaFont Studio (saving is crippled but reading and browsing is fine): http://www.fontlab.com/
Last, if you really want to edit Asian fonts, you can install FontForge, but that's not easy: http://fontforge.sourceforge.net/
Very few fonts will have your 4-dragons character because it's a 3/4 bytes character and most fonts only cover a part of the 2 bytes range. On my computer, only Simsun Founder Extended (sursong.ttf) has it; normal Simsun does not.
If you want to try some new fonts, type 字体 in Google.
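The byte counts mentioned in this thread can be checked directly. The Python sketch below (helper name invented for illustration) compares 龘 (three dragons, U+9F98) with the 4-dragons character U+2A6A5. The first sits in the Basic Multilingual Plane, the range reachable with a single 16-bit unit; the second lies beyond it, so it needs four bytes in UTF-8 and a surrogate pair, also four bytes, in UTF-16.

```python
THREE_DRAGONS = "\u9F98"       # 龘, inside the Basic Multilingual Plane
FOUR_DRAGONS = "\U0002A6A5"    # the 4-dragons character, in CJK Extension B


def describe(char):
    """Report the code point and encoded sizes of a single character."""
    utf8 = char.encode("utf-8")
    utf16 = char.encode("utf-16-be")   # big-endian, no byte-order mark
    return {
        "code": "U+{:04X}".format(ord(char)),
        "utf8_bytes": len(utf8),
        "utf16_bytes": len(utf16),
        "utf16_hex": utf16.hex(),
        "in_bmp": ord(char) <= 0xFFFF,
    }


for ch in (THREE_DRAGONS, FOUR_DRAGONS):
    print(describe(ch))
```

A font or program that only handles the 16-bit range has no slot for the second character, which is consistent with so few fonts in this thread containing it.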
Mark Yong Posted September 12, 2009 at 07:05 AM
Reviving a thread that I started, but did not quite get down to the answer I needed...
Using the same "4-dragons" example again: I looked it up on the Unihan website as suggested by imron above. It displays the link to the character as a graphic, but when I click the link, where the character should be displayed there is just a box with the Unicode code for it, i.e. 02A6A5. Now, when I copy-and-paste that 'box' into Google and search on it, I do get results for the "4-dragons" (the first link is to zh.wiktionary's definition page for it).
I suppose this means that my Windows XP does not have this character in the character set, so it cannot be displayed. So, back to my original question: what can I do to get such characters into my OS's character database, such that I can:
1. View such characters correctly in my browser
2. Generate/type them out
I realise I am phrasing the question in rather non-tech, layman's terms. (BTW, I am using the "4-dragons" character just as an example - I have other far more useful characters, mostly dialect ones, that I want to generate but cannot.)
imron Posted September 12, 2009 at 07:27 AM
What web browser are you using? This is purely a font issue. As long as you have a Unicode font that contains the character, it should display if you select that font (and in fact even if you don't select that font, because the OS should perform font substitution for you).
Mark Yong Posted September 12, 2009 at 08:40 AM
I am running Mozilla Firefox 3.5.2.
Okay, let's say I do have the Unicode font that includes a particular character. Does it mean that if, let's say, I:
1. Open up MS Word
2. Go to Insert > Symbol
3. Select the font (SimSun, PMingLiU, TW-Kai, etc.)
4. Search for the character I want (for simplicity, let's use the "4-dragons" again, so I scroll down to all the characters with the dragon radical and search for it)
... I should then be able to locate it, if that character is in my character set?
Using the "4-dragons" case, I searched through all three fonts I listed above. They had 3 dragons 龘, but not 4. So, what you mean is, I do not have a Unicode font that contains the "4-dragons" character (amongst others). Where can I download and install one, then?
Okay, to put things in context: I am trying to generate the character for the Hokkien word bue, which is a fusion of 勿 and 會. I have seen it printed in books before, but have no idea how to generate it in Windows XP, so I wonder how the printers did it. Strangely, I have never seen it displayed on any website either.
imron Posted September 12, 2009 at 09:03 AM
Ok, it turns out I have only one font on my system that contains the four-dragons character. The font is called 宋体-方正超大字符集 and the filename is sursong.ttf. A Google search for sursong.ttf should provide clues on how to obtain it.
Regarding the other character, how are 勿 and 會 combined (left/right, top/bottom, etc.)? I did a quick search on the Unihan page and couldn't find the appropriate character (though there are quite a few others with 勿 as the radical). It's also quite possible that the publishers of books containing it created their own specialised font to print it.