Inconsistencies between fonts

March 13, 2009 at 04:54 AM

I'm just curious to know why this particular character (among others) is displayed differently in different fonts:

Is either of the two accepted as the standard?

March 13, 2009 at 08:37 AM

The character you have highlighted is Unicode code point 689d. The computer stores the character internally as 689d. When you view the character on the screen, you are effectively asking the computer to render (draw) 689d according to the font's design. The code points of all symbols can be viewed at www.unicode.org.
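As a rough illustration of the difference between the stored number and the drawn shape, here is a minimal sketch in Python (the language and the helper names `code_point` and `from_code_point` are my own assumptions, not anything from this thread). It only converts between a character and its code point; how the result looks on screen still depends on the font.

```python
def code_point(ch: str) -> str:
    """Return the code point of a single character as lowercase hex."""
    return format(ord(ch), "04x")


def from_code_point(hex_value: str) -> str:
    """Return the character stored at the given hexadecimal code point."""
    return chr(int(hex_value, 16))


print(code_point("直"))          # 76f4
print(code_point("置"))          # 7f6e
print(from_code_point("689d"))   # drawn with whatever font your terminal uses
```

The same number comes back whichever font is installed, which is the point: the font changes the drawing, not the stored value.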

MS Windows provides a character viewing tool accessible via start->program files->accessories->system tools->character map.

The Chinese characters start at 4e00, and the main block (CJK Unified Ideographs) is allocated through 9fff.
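A small sketch of a range check, assuming the main CJK Unified Ideographs block is allocated as 4e00 to 9fff in the Unicode code charts (not every position in it is necessarily assigned). The constant and function names are hypothetical, chosen just for this example; `unicodedata` is part of the Python standard library.

```python
import unicodedata

# Assumed boundaries of the main CJK Unified Ideographs block.
CJK_FIRST = 0x4E00
CJK_LAST = 0x9FFF


def in_main_cjk_block(ch: str) -> bool:
    """True if the character's code point lies inside the assumed block."""
    return CJK_FIRST <= ord(ch) <= CJK_LAST


for ch in "直置A":
    print(ch, format(ord(ch), "04x"), in_main_cjk_block(ch), unicodedata.name(ch))
```

This only covers the main block; ideographs in extension blocks would need extra ranges.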

A really interesting code point is 7f6e, for the character 置, which looks really different in Arial Unicode MS from the other fonts. When writing 7f6e by hand, one would never make it look like the Arial Unicode MS rendering.

The rendering on the left of your posting looks like the American Microsoft Arial Unicode MS font. The one on the right looks like Simsun. There are many variations in how characters are drawn. IMHO Simsun is a good font for Chinese, as you can view both traditional and simplified characters and it is accurate.

If the reader can recognise the character then it is OK. So I guess both of the renderings or fonts in your posting are acceptable.

But beware: a few of the Arial Unicode MS renderings of Chinese characters are incorrect.

Edited March 14, 2009 at 10:00 AM by mph

March 13, 2009 at 10:10 AM

Thanks, yeah, I noticed the Arial ones had a few mistakes, like 直 being rendered in what I presume is its Japanese variant (vertical line down the side).

March 13, 2009 at 12:30 PM

Displaying the Japanese version of zhí is not Arial Unicode MS's mistake. Arial Unicode contains three variants of 直, one each for Japanese, Chinese and Korean. It's up to the word processor / IME to pick the right variant to render.

Unfortunately most word processors can't handle variant glyphs and simply display the default one. Arial Unicode's default for 直 is the Japanese version, so that's the one which usually gets selected.

If you use InDesign, for example, you can select which variant of Arial Unicode's 直 (or 置) should be rendered.

March 14, 2009 at 09:54 AM

"Displaying the Japanese version of zhí is not Arial Unicode MS's mistake."

The Unicode standard has a picture of what each symbol should look like on the www.unicode.org site. According to the Unicode standard, the symbol for 76f4 should look like 直. This is the worldwide accepted standard and is independent of any font or Japanese/Chinese/simplified variant. The Arial Unicode MS rendering of 76f4 does not look like 直, so it does not conform to the Unicode standard. It conforms to the Microsoft standard, I guess.

Another problem is that when one uses the Microsoft Chinese Pinyin Input Method to input zhi, the user chooses the 直 symbol in the input method interface and ends up with some other symbol on the screen that does not look like 直. The symbol one ends up with is the Japanese variant, for which I believe there is no separate code point.

So the Chinese IME provided by MS is not consistent with itself.

March 14, 2009 at 11:23 AM

Sorry mph, but that information is wrong.

The shapes of glyphs shown on www.unicode.org are "non-normative". They are only examples, not strictly part of the standard. It says so in the standard (from The Unicode Standard, Version 5.1):

CJK Unified Ideographs. The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts.

More importantly, the Unicode standard does not assign separate codepoints for each CJK shape variation (with some exceptions). That's because Unicode is a standard for "abstract characters", not glyphs. Each codepoint must potentially be shared between the character's Simplified, Traditional, Japanese and Korean versions.

For zhí meaning "straight; vertical; frank; directly; straightly; upright", there is only one codepoint (76F4). To distinguish between different glyphs, Unicode provides a mechanism called "variation selectors". The choice of which glyph version should be "default" and which others should be "variants" is out of the scope of the standard (and subject to international politics).

So from Unicode's perspective, the Japanese version of zhí is a perfectly valid default representation for codepoint 76F4. It's up to the word processor / IME / OS / end-user to select the right variant. Unfortunately, since the variation selector mechanism was not part of the original Unicode Standard, many systems today are not able to use it and always render the default glyph.

That's not Arial Unicode's fault, however. This font dutifully encodes all three versions of zhí as glyph variants, as required by the standard.
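To make the selector mechanism a bit more concrete, here is a minimal Python sketch (the language and the helper name `with_selector` are assumptions of mine). It appends VARIATION SELECTOR-17 (U+E0100) to codepoint 76F4. Whether this particular pair is a registered variation sequence, and whether any given font or application honours it, is something I am not asserting; the sketch only shows that the base codepoint stays the same and the selector travels alongside it.

```python
import unicodedata

BASE = "\u76f4"        # 直, the single shared codepoint
VS17 = "\U000E0100"    # VARIATION SELECTOR-17 (hypothetical choice for this example)


def with_selector(base: str, selector: str) -> str:
    """Return the base character followed by a variation selector."""
    return base + selector


seq = with_selector(BASE, VS17)
print([f"U+{ord(c):04X}" for c in seq])   # ['U+76F4', 'U+E0100']
print(unicodedata.name(VS17))             # VARIATION SELECTOR-17
print(len(BASE), len(seq))                # 1 2
```

A system that does not understand selectors will typically just draw the default glyph for 76F4, which matches the behaviour described above.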

For more information, see also Han unification (Unihan).

March 14, 2009 at 02:15 PM

I think the pictures in the OP are incomplete. They should look like this ->

March 14, 2009 at 05:24 PM

March 15, 2009 at 10:45 AM

I am sorry but I cannot agree with you. We should be academics. This could go on forever.

The 76f4 example in the standard has a picture of the Chinese character, not the Kanji variant. The rendering of 76f4 by Arial Unicode MS is in my opinion beyond the limits of considerable variation quoted in the standard.

The Arial MS rendering is not recognisable by many, and that is in my opinion a mistake.

The standard has separate codepoints for the Japanese and Chinese variants, and in all other cases the Arial MS rendering looks like the example in the standard.

Some Jp and Cn examples

歩 6b69 and 步 6b65

様 69d8 and 樣 6a23

歯 6b6f and 齒 9f52

氷 6c37 and 冰 51b0

乗 4e57 and 乘 4e58

In each case the Arial MS rendering looks like the example in the standard.

Why use the Jp variant in the Cn codepoint only for 76f4?
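For anyone who wants to check the codepoints in the Jp/Cn list above, here is a small Python sketch (the language, the `PAIRS` table layout and the `mismatches` helper are my own assumptions). It only confirms that each pair really occupies two different codepoints; it says nothing about how any font draws them.

```python
# (Jp character, Jp codepoint, Cn character, Cn codepoint), copied from the list above.
PAIRS = [
    ("歩", "6b69", "步", "6b65"),
    ("様", "69d8", "樣", "6a23"),
    ("歯", "6b6f", "齒", "9f52"),
    ("氷", "6c37", "冰", "51b0"),
    ("乗", "4e57", "乘", "4e58"),
]


def mismatches(pairs):
    """Return the characters whose actual codepoint differs from the listed one."""
    bad = []
    for jp, jp_hex, cn, cn_hex in pairs:
        if ord(jp) != int(jp_hex, 16):
            bad.append(jp)
        if ord(cn) != int(cn_hex, 16):
            bad.append(cn)
    return bad


print(mismatches(PAIRS))                               # [] when every listed codepoint matches
print(all(jp != cn for jp, _, cn, _ in PAIRS))         # True: each pair is two distinct characters
```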

Simsun also has some renderings that in my opinion don't look like the example.

災熒 and 熒 should have a complete 火 underneath.

My Chinese teachers made us write them 100 times for homework for such serious indiscretions. God only knows what the punishment for writing a Jp variant would have been half a century ago, but it would have been severe. Maybe the 枷 would have made a comeback.

March 15, 2009 at 11:04 AM

Again, read: http://en.wikipedia.org/wiki/Han_unification

It already answers all the questions you have about why some variants have their own codepoints while others do not, etc.

Arial Unicode MS is a font. Fonts don't render anything. It's up to word processors, browsers, operating systems, etc., to pick the correct rendering.

I've already explained several times that Arial Unicode MS in fact contains all the CJK variants of zhí.

See here: http://www.bgaertner.gmxhome.de/OTVDetails.htm

Scroll to the bottom and look at the last picture. You can see for yourself that Arial Unicode MS encodes the Chinese variant of zhí correctly. If your word processor or browser chooses the wrong variant, it's not the font's problem.


Posters, in thread order: ChristopherB, mph, ChristopherB, peekay, mph, peekay, skylee, imron, mph, peekay
