Characters Break up - Vector Fonts?


Posted

Hello to all,

I wonder if anyone has come across a similar problem and has a solution.

I am developing an application for practicing Chinese characters and would like to draw the various components of each character in different colors. For example, for the character 好, I would like to have 'woman' in, say, green and 'child' in red.

I have already stored the sub-components of each character in an Excel sheet. So, in the row that contains 好, I have two more columns, one for woman and one for child. This part is already taken care of.

The attachment shows a good example of what I am trying to do, for the word Love, taken from a Chinese Bible.

The problem is that I don't know how to reconstruct the 好 character starting from the two sub-components. Yes, it is of the left-right type, but I cannot just place them side by side, as for most characters that would destroy the look of the character itself.

Someone has suggested that I need to use vector fonts. I think some of my installed fonts are vector fonts, but I don't know what to do with them. I also fear that the vector font breakdown goes deeper than just separating woman and child, and draws the character almost stroke by stroke, which I don't want. Maybe that's not the right approach.

Can anyone suggest how to go about it?

Thanks

2792_thumb.attach

Posted

You're correct that vector fonts by themselves are unlikely to solve the problem. Some vector fonts are indeed built up out of reusable components, which makes them a lot smaller: the font only needs to store one copy of the outline for 女 etc. and then reference that outline in each character that uses it. But it's very tricky to actually delve down and access those components individually (you have to play with a lot of arcane TrueType coding weirdness), and a lot of the components aren't chosen in a linguistically useful way: there might be three or four different versions of 女 depending on the size and where they go in the character. There's also the problem of components being overlapped or merged and impossible to separate.

One possibility - though it's kind of a legal gray area - would be to find a program that exports every character in a font to a friendly outline format (SVG is probably best) and then manually go in and tag the different parts of each outline with the component they belong to. You'd store those SVG outlines in some sort of database; to display a character, your software would load its outline from the database and render the part tagged with one component in one color and the part tagged with another component in another color. You might have to manually split some components that overlap or merge, but that would at least be a way for you to extract those outlines and manage / render them separately. The tagging would be a lot of work, so it's probably only something you'd want to do for a thousand or so common characters at the most - maybe pick all of the HSK Level A ones, say.
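To make the tag-then-render idea concrete, here's a minimal Python sketch. Everything in it is an assumption for illustration: the `TAGGED_OUTLINES` dictionary stands in for the database, the `render_svg` helper is a made-up name, and the path data is placeholder geometry (two rectangles), not a real glyph outline exported from any font.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical stand-in for the outline database: one SVG path string per
# tagged component. The paths are placeholder rectangles, NOT real outlines.
TAGGED_OUTLINES = {
    "好": [
        {"component": "女", "path": "M 5 10 H 45 V 90 H 5 Z"},
        {"component": "子", "path": "M 55 10 H 95 V 90 H 55 Z"},
    ],
}


def render_svg(char, colors, default="black", size=100):
    """Build an SVG document that fills each tagged component with its color."""
    lines = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
    ]
    for part in TAGGED_OUTLINES[char]:
        # Components without an assigned color fall back to the default fill.
        fill = colors.get(part["component"], default)
        lines.append(
            f'  <path d={quoteattr(part["path"])} fill={quoteattr(fill)}/>'
        )
    lines.append("</svg>")
    return "\n".join(lines)


svg = render_svg("好", {"女": "green", "子": "red"})
```

The resulting string can be written to a `.svg` file and opened in a browser. With real exported outlines, one component may need several paths (or several subpaths in one `d` attribute), so the record format would probably have to allow a list of paths per component.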

As I said, though, this is legally problematic because those fonts are copyrighted by the font author - the legal copyrightability of outline fonts rests mainly on the fact that TrueType / PostScript font files contain something resembling computer code, so exporting them to simple vector images may mean they're no longer copyrighted, but it would still be a bad idea to build a product on this without permission from the font author.

There are some open-source fonts out there you might be able to do this with, though, depending on how you're planning to distribute your application - the Arphic TrueType fonts are licensed according to something sort of GPL-like (Google around and you'll find lots of links to them), though they don't seem to use recycled components, and there's also Google's Droid Sans Fallback font, which I think is Apache-licensed and does have reusable components but isn't all that attractive and uses kind of a flat / linear style rather than something calligraphic like in your sample image. The Apache license would be much better than GPL if you want to keep your software proprietary, since IIRC it doesn't require you to distribute your modifications / derivative works and hence you could keep your laboriously-tagged outlines to yourself.

One other alternative would be to find someone else who already has component-separated character outline data and simply license it; search around for stroke order teaching software and the like and see where they got their data from. Or, again depending on how you plan to distribute your software, you could look for an open-source Chinese product with component-separated character images / outlines and use that.

As far as making use of the breakdown data you already have to assemble characters, that might be doable but you'd have to lay out the bounding boxes of the different parts of the character by hand - render a 女 squeezed in a rectangular box on the left side and a 子 on the right. And of course you'd still need to get the 女/子 outlines from somewhere. The bounding boxes would be fairly predictable for characters of the same shape, though, so at least for easily-divisible characters like 好 you'd just say that 女 had your standard left-half-of-a-character bounds and 子 your standard right-half ones, but it'd be considerably hairier in less orderly characters like 幾, and the results wouldn't necessarily look that good either (strokes might look weirdly narrow / wide in some places if they were scaled).
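The standard-bounds idea could look roughly like the Python sketch below. The `LAYOUTS` table, the 450/550 split and the function names are all assumptions made up for illustration, and each component outline is assumed to be a list of (x, y) points drawn on a full 1000-unit character square.

```python
EM = 1000  # assumed size of the character square, in font units

# Hypothetical standard bounds per character shape, as (x0, y0, x1, y1) in
# font units. The 450/550 split is a guess, not a measured value.
LAYOUTS = {
    "left-right": [(0, 0, 450, 1000), (450, 0, 1000, 1000)],
    "top-bottom": [(0, 500, 1000, 1000), (0, 0, 1000, 500)],
}


def fit_to_box(points, box, em=EM):
    """Scale and shift points drawn on the full em square into the box."""
    x0, y0, x1, y1 = box
    return [
        (x0 + x * (x1 - x0) / em, y0 + y * (y1 - y0) / em)
        for x, y in points
    ]


def compose(layout, component_outlines, em=EM):
    """Place each component outline into its standard box for the layout."""
    boxes = LAYOUTS[layout]
    return [
        fit_to_box(points, box, em)
        for points, box in zip(component_outlines, boxes)
    ]


# Two corner points of a full-square outline, used for both components.
square = [(0, 0), (1000, 1000)]
placed = compose("left-right", [square, square])
```

Because the horizontal and vertical scale factors differ (0.45 versus 1.0 for the left box here), stroke widths get distorted, which is exactly the narrow / wide stroke problem mentioned above. Characters like 幾 would need hand-tuned boxes rather than a table entry.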

Posted

mikelove,

Thanks a lot for your detailed answer. It looks like you know the problem very well.

Manually tagging each character is way too complicated and would take too long. I will try to find existing component-separated character outline data and ask the owners about licensing it.

One source seems to be http://www.globechinese.com, but I cannot find any contact details for them.

I will keep searching.

Thanks
