HanziCraft

November 9, 2014 at 06:59 PM

I did a quick search and was surprised not to get any hits on this.

While looking for ways to look up the components of characters, I stumbled on http://www.hanzicraft.com . I've thrown in a few characters that I know, and so far it seems to be doing well. Does anybody have any opinions on the site?

And if it's not reliable or there are better options out there, feel free to let me know. I find that it's a lot easier to learn characters if I spend the time to break them down, and I have yet to find a good way of doing that for cases where I don't already know all the components, meanings and pronunciation where applicable.

November 9, 2014 at 08:05 PM

I threw in 很 and got very little. I would have thought a word like this would have quite a few medium frequency entries but it has only one.

As I am still learning, I cannot be sure of its accuracy. Not that I doubt it, but I can't comment on it.

November 9, 2014 at 08:22 PM

That's the main thing that I'm wondering about. I threw some really easy ones at it and it got those ones right, but I'm not sure how well it does with more complicated characters and ones that use less common radicals. I'm not to the point with my writing that I know a lot of characters with really obscure radicals.

November 9, 2014 at 10:52 PM

It's about as good (or bad) as anything else out there right now, to be honest. Which is to say it's not very good. It breaks characters up solely based on the surface-level structure and gets sound components wrong (for instance, the "pronunciation clue" for 黑 is that 灬 has the same initial, which is pretty far from the mark).

You're better off using zhongwen.com, but even that gets a lot of stuff wrong (including 黑, but in a different way).

Oh, even worse: "Pronunciation clue for 堂 (tang2): The component 土 is pronounced as 'tu3'. It has the same pinyin initial." But 土 is not the phonetic component, 尚 is. That ought to be obvious, so that's a pretty big error in my book. At least zhongwen.com gets that right.

So again, it's about like anything else out there right now, including zhongwen.com: put together by someone who doesn't really know much about Chinese characters. An enthusiast, not a specialist.

November 9, 2014 at 11:23 PM

Hmm as I suspected. Good description "An enthusiast, not a specialist."

November 10, 2014 at 12:14 AM

@OneEye, thanks, like I said I've only tried a few relatively easy ones. I'll take a look at that link. I don't need it to be perfect and I doubt that perfection is even possible, but I do hope to find something that works relatively consistently. This sort of thing more or less comes with the territory when it comes to Chinese.

November 10, 2014 at 12:58 AM

Not very useful for anyone. Misleading even.

November 10, 2014 at 03:15 AM

I've been using this site for a while now, so it's a bit disappointing to hear that it's so unreliable. Most of the time I just use it to find other high frequency uses of whatever character I'm learning, but it sounds like zhongwen.com might be a slightly safer bet from now on. Thanks for the heads up.

November 10, 2014 at 04:02 AM

Mr. John, that's more or less why I asked here. One of the problems with resources like this is that by the time you know if it's a good resource, you may already have developed bad habits and beliefs.

The characters I typed into it mostly looked good, but I hadn't been able to think of any really hard characters to deconstruct, and most programs ought to be able to handle things like 我 without too much trouble. Some of the less common characters might be broken up incorrectly and not have any sort of exception in the database to force the correct behavior.

I think the unfortunate thing is that creating a resource that would cover basically all the characters is a difficult task and, as OneEye pointed out, needs something more than just enthusiasts for it to be authoritative.

November 10, 2014 at 06:48 AM

I just use it to find other high frequency uses of whatever character I'm learning

I find http://dictionary.writtenchinese.com/ useful for that. Entries are generally in order of frequency.

I also use http://www.archchinese.com/chinese_english_dictionary.html, which gives even more rubbish component breakdowns, but makes it easy to see other characters that share whatever component you're interested in.

November 10, 2014 at 10:48 PM

Is http://www.chineseetymology.org more reliable?

November 11, 2014 at 12:01 AM

Thanks for starting the thread Hedwards. You may have inadvertently saved me a lot of time.

While we're on the topic, for the purpose of breaking down characters (and possibly their etymology), is there a particular dictionary available through Pleco that you guys would recommend? I know there are free resources available online, but I'd prefer to pay if the quality of the product is better.

November 11, 2014 at 12:45 AM

I am sure there must be a good dictionary for the breakdown and etymology of characters for Pleco, but I am not familiar with all that are on offer. I have found Wenlin is pretty good for that sort of thing.

It is not free, but I believe there is a demo. Try here: http://www.wenlin.com/

It is for Windows, and I think there is a version for Mac.

November 11, 2014 at 01:09 AM

Thanks for the suggestion Shelley.

I had a quick look at the trial version and it does seem to be what I'm after. Basically, I'm trying to avoid making arbitrary connections between the components just for the sake of memorisation. I know that it's not always possible but I figure that if there is an existing story available for a given character it's preferable to learn the character that way.

I don't think I explained that very well haha...

November 11, 2014 at 01:22 AM

Is http://www.chineseetymology.org more reliable?

Perhaps a bit better than HanziCraft, but not much. I talked about that a bit here.

He's essentially just scanned a bunch of images from three character form compendia (《甲骨文編》，《金文編》，《六書通》) and added the Shuowen's explanation. He's done a bit more than that, to be fair, but it's really clear that he doesn't have any training. He gets some things right that other sites don't, but there's a lot there that could lead you astray, too.

Interestingly, I just finished answering a question on Quora where the person was led astray by the "etymology" on that site.

Mr John, your best bet (that is, until we release our dictionary) is to use zhongwen.com. Pleco can break characters into parts based on the surface-level structure of the character, but it doesn't have anything with accurate character etymologies right now. Wenlin can be helpful too, but it suffers from the same deficiencies as everything else out there, unfortunately.

November 11, 2014 at 07:46 AM

Hi guys,

creator of HanziCraft here! Super happy to see the discussion going! Super glad to see some experts pick apart the site. I'd like to offer some background on the site and perhaps some of the inaccuracies.

For my Masters in Computer Assisted Language Learning, I focused on Spaced Repetition Systems and the influence of radicals while learning them. Here's my thesis if anyone wants to read it: http://scholar.sun.ac.za/bitstream/handle/10019.1/80325/delarouviere_chinese_2012.pdf?sequence=1 . My experiment was a bit of a dismal failure when it came to the spaced repetition systems, but I did end up doing some extensive research into Chinese orthography and learning Chinese characters among foreign/second language learners.

For my research, though, I needed to decompose a lot of characters quickly. So I started work on an open source Node library: https://github.com/nieldlr/Hanzi

This library now forms the backbone of HanziCraft. I use a whole bunch of databases and some of my own ones I created from other databases.

There are definitely some inaccuracies around. The site is based on data I can extrapolate to all characters, but the integrity is really hard to ensure because, as you know, there are a lot of Chinese characters. It's a bit tricky for one person to confirm the accuracy.

However, the example of 堂 is a super good one that I need to look into! Thanks OneEye! I see that we're missing a glyph in the data, which it seems I could fill in. This might've thrown off the phonetic clues section.

A note on the phonetic clues section: I've taken a phonetic vs semantic radical agnostic approach to providing these clues. I've tried for a long time to find a database on character & radical classification. For example, a character like 洋 is defined as a perfect phono-semantic compound: 羊 is on the right and is pronounced exactly the same as the character, so it has high phonetic regularity. The semantic radical 氵 is also highly regular, since it contributes a lot to the meaning of the character: water -> ocean.

Although phono-semantic compounds form most of the Chinese characters (81% with differing degrees of regularity), you also have other kinds of characters, such as compound ideograms, which combine two "meanings" to form a single meaning. Take 神, for example. Now the interesting thing is, even though the traditional classification is a compound ideogram, there is definitely a pronunciation clue in there: the component 申 is pronounced shen1. According to traditional radical classification, no dictionary would provide that clue. A learner could infer it themselves, so I thought I'd write a system which would ignore any of the preconceived classifications and try to help the learner the most. The information provided in the character, regardless of radical/component/classification, is a potential way for the learner to better learn the character and form a deeper orthographic knowledge of Chinese characters.

So I simply look for clues, and that is what might've happened with 堂 and 土 there. Especially considering that we're missing the 尚 in there! Definitely keen to look into that one. We might have two clues in that character!
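A minimal sketch of what a classification-agnostic clue finder could look like, to make the 堂 case concrete. This is not HanziCraft's actual code or data: the tables `PINYIN` and `COMPONENTS`, the function names, and the clue labels are all hypothetical and entered by hand for illustration, and 氵 and 礻 are given the readings of their full forms 水 and 示.

```python
# Hypothetical toy data, entered by hand for illustration only.
# Pinyin uses tone numbers; 氵 and 礻 carry the readings of 水 and 示.
PINYIN = {
    "洋": "yang2", "羊": "yang2", "氵": "shui3",
    "神": "shen2", "申": "shen1", "礻": "shi4",
    "堂": "tang2", "土": "tu3", "尚": "shang4",
}

# Hypothetical decompositions (not HanziCraft's database).
COMPONENTS = {
    "洋": ["氵", "羊"],
    "神": ["礻", "申"],
    "堂": ["尚", "土"],
}

# Two-letter initials come first so "sh" is tried before "s".
# Treating y and w as initials is a simplifying assumption.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(pinyin):
    """Split a numbered syllable such as 'tang2' into (initial, final, tone)."""
    tone = pinyin[-1]
    body = pinyin[:-1]
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone


def phonetic_clues(char):
    """Compare every component with the character, ignoring phonetic/semantic labels."""
    c_initial, c_final, c_tone = split_syllable(PINYIN[char])
    clues = []
    for comp in COMPONENTS[char]:
        initial, final, tone = split_syllable(PINYIN[comp])
        if (initial, final, tone) == (c_initial, c_final, c_tone):
            clues.append((comp, "exact match"))
        elif (initial, final) == (c_initial, c_final):
            clues.append((comp, "same syllable, different tone"))
        elif final == c_final:
            clues.append((comp, "same final"))
        elif initial and initial == c_initial:
            clues.append((comp, "same initial"))
    return clues


for ch in COMPONENTS:
    print(ch, phonetic_clues(ch))
```

With this toy data, 堂 yields two clues (尚 shares the final, 土 shares the initial), which also shows the weakness raised earlier in the thread: a purely surface comparison cannot tell which of the two is the real sound component.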

Another approach to finding phonetic clues, if we can't find the classification, might be to assume that the right part of the character is the phonetic component/radical. However, I remember from my research that only 72% of phono-semantic compounds are left (semantic) to right (phonetic). They can also be right to left or inclusion (inside each other), among other forms. That's the unfortunate but exciting challenge of Chinese characters: they have this tendency towards a certain norm, but as with all linguistic influences, natural usage doesn't always move towards consistency.
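The right-hand-side heuristic is easy to sketch, and so are its failure modes. The records below are hypothetical hand-entered examples (layout, components in reading order), not data from any real decomposition database, and `guess_phonetic` is an illustrative name.

```python
# Hypothetical records: character -> (layout, components in reading order).
CHARS = {
    "洋": ("left-right", ["氵", "羊"]),   # phonetic 羊 is on the right
    "放": ("left-right", ["方", "攵"]),   # phonetic 方 is on the left
    "堂": ("top-bottom", ["尚", "土"]),   # not a left-right character at all
}


def guess_phonetic(char):
    """Guess the phonetic component by assuming it sits on the right.

    Returns None when the character is not laid out left to right,
    because the heuristic has nothing to say in that case.
    """
    layout, components = CHARS[char]
    if layout == "left-right":
        return components[-1]
    return None


for ch in CHARS:
    print(ch, guess_phonetic(ch))
```

The guess is right for 洋, wrong for 放 (it picks 攵 instead of 方), and undefined for 堂, which is the kind of inconsistency described above.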

The effects of radicals & components have incredibly interesting relationships to how you read & learn Chinese characters. Here's a blog post I wrote two years ago on some of these measures: http://confusedlaowai.com/2012/05/5-ways-chinese-radicals-sub-consciously-trolling/

The regularity measure is the one that I'm interested in for the phonetic clues. I've ignored all previous classifications and purely looked at what is available in the character that could potentially help the learner with the pronunciation. You get different strengths too. I ran that function on a whole database of characters and found a whole bunch of phonetic sets: http://www.hanzicraft.com/lists/phonetic-sets
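As a rough illustration of building phonetic sets and scoring them, here is a minimal sketch. It assumes a tiny hand-entered sample and a deliberately simplified score (the share of characters whose toneless syllable matches the component's), which is not necessarily the regularity measure used in the thesis or on the site. `SAMPLE`, `COMPONENT_PINYIN`, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-entered sample: character -> (pinyin, components).
SAMPLE = {
    "洋": ("yang2", ["氵", "羊"]),
    "样": ("yang4", ["木", "羊"]),
    "氧": ("yang3", ["气", "羊"]),
    "详": ("xiang2", ["讠", "羊"]),
    "祥": ("xiang2", ["礻", "羊"]),
    "神": ("shen2", ["礻", "申"]),
    "伸": ("shen1", ["亻", "申"]),
    "坤": ("kun1", ["土", "申"]),
}

# Components whose sets we want to inspect, with their own readings.
COMPONENT_PINYIN = {"羊": "yang2", "申": "shen1"}


def toneless(pinyin):
    """Drop the trailing tone number: 'yang2' -> 'yang'."""
    return pinyin.rstrip("12345")


def phonetic_sets(sample, component_pinyin):
    """Group characters by each tracked component they contain."""
    sets = defaultdict(list)
    for char, (_, components) in sample.items():
        for comp in components:
            if comp in component_pinyin:
                sets[comp].append(char)
    return dict(sets)


def regularity(component, members, sample, component_pinyin):
    """Share of members whose toneless syllable equals the component's."""
    target = toneless(component_pinyin[component])
    hits = sum(1 for ch in members if toneless(sample[ch][0]) == target)
    return hits / len(members)


sets = phonetic_sets(SAMPLE, COMPONENT_PINYIN)
for comp, members in sets.items():
    score = regularity(comp, members, SAMPLE, COMPONENT_PINYIN)
    print(comp, members, round(score, 2))
```

A score based only on modern Mandarin readings treats 详 and 祥 as misses for 羊 even though they share its final, so a finer-grained score (initial, final, and tone counted separately) would be a natural refinement.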

I'm looking into exploring a lot more of these measures, especially combinality which I find very interesting too, while also trying to ensure accuracy & adding more features! My goal is to provide as much useful information on Chinese characters as possible, even information which might not be ordinarily found in standard dictionaries.

Thanks so much for highlighting some of the inaccuracies. I'm keen to dig into those and see what I can find. It's a bit hard to find databases that are open & ready for use. I got really lucky with Gavin Grover's decomposition db for example. But it's a goal I'm moving towards!

I'd love to answer any questions you guys have! Any ideas or feedback are super appreciated too.

November 11, 2014 at 01:04 PM

Interesting stuff. I hope I didn't come across harshly above. The truth is, it's a pretty arcane field, and the only people that really get this stuff right are specialists in the field (though we're trying to bring it out of the ivory tower and make it accessible to everybody).

I like your idea about not paying attention to traditional categorizations of characters. I actually wrote about that recently here. Most scholars don't use 六書 anymore, so you're on the right track. However, it is still absolutely the case that components are either phonetic or semantic (if they aren't corrupted), so I'd recommend not throwing the baby out with the bath water.

You ought to ditch the notion of radicals when talking about etymology, though (here's why).

To fully understand how sound components work, you really need to do some heavy duty analysis of Old Chinese reconstructed pronunciation. You can't simply rely on the degree of similarity in modern Mandarin (which is the least conservative of all Sinitic languages anyway). We're actually working on a series of simple formulas that can be used to explain how sound components work. Preliminary testing indicates that students learning characters using these formulas will have a much more accurate understanding of how sound works in characters than the average native speaker (meaning: much more able to predict the possible pronunciations of unfamiliar characters, and much more likely to be able to tell which component in any given character is the sound component). But at any rate, you have to look at the Old Chinese pronunciation and track how it has changed over time.

Same deal with character meanings and meaning components. You have to look at the original meaning, not the modern meaning. The only meaning directly related to the form of a character is its original meaning. You also have to determine whether the meaning component is expressing meaning via meaning or via form. It's nearly always the latter. See the radicals post linked above for more info. We also have another post coming up (tomorrow, hopefully) which will go into the distinction a bit more.

As with all things to do with Chinese characters, you're not going to find the whole story by looking at the surface structure. You have to look into the deep structure: the ancient forms, meanings, and pronunciations. I wrote a post a while back on resources for learning about character etymology, if you're interested. You seem keen to learn about this stuff, and you're on the right track in a lot of ways, so maybe that post will be useful for you. And feel free to get in touch any time.

November 12, 2014 at 12:25 AM

It was cool to hear the angle you're coming at this from, Confused Laowai. Based on your explanation, I think I'll keep using HanziCraft in conjunction with the other tools out there, while keeping in mind that none of them are perfect.

I read through two of the articles you linked OneEye. Being a bit of a novice, I don't think I understood perfectly, but I take the general point you made about difficulties relating to how to classify characters and their components and the different roles that components can play. I'll definitely be thinking a lot more carefully about your "functional components" approach from now on.

Any idea when we might expect the finished product? I know there are no silver bullets with regard to learning hanzi, but a more logical approach certainly wouldn't hurt.

November 12, 2014 at 12:45 AM

@OneEye, I look forward to seeing that.

November 12, 2014 at 02:31 AM

Well, the dictionary itself will take a while to do. We're too early in the game right now to give a date, but it will likely be at least another year.

We've been brainstorming about other useful things we could put out there in the meantime, though. I personally like the idea of a MOOC on how to learn characters, with a thorough explanation of the "Outlier System" of understanding characters and practical ways of applying it while waiting for the dictionary to be released, but I don't know if we'll end up doing that or not (anyone interested?). We've also talked about posters with the most common meaning and sound components, a workbook, stuff like that. None of it is set in stone, but the point is that we're trying to think of ways to release useful stuff in the interim between now and when we finish the dictionary.

If there's something on the blog that you don't understand, just ask! Either post a comment on the post itself or in the Outlier thread here.


hedwards

Shelley

hedwards

OneEye

Shelley

hedwards

Hofmann

Mr John

hedwards

li3wei1

耳耳语语

Mr John

Shelley

Mr John

OneEye

Confused Laowai

OneEye

Mr John

hedwards

OneEye
