
New Chinese Learning Tool: Hanzi Data



Posted

Hi all,

I'm quite a new learner of Mandarin, but in my spare time I've developed a tool that I've found really helpful in my development. It's called Hanzi Data (www.hanzidata.com) and it is a Chinese-English dictionary that:

  • Gives you all the different pinyin pronunciations of the character;
  • Gives you the (often multiple) definitions that are associated with each pronunciation;
  • Decomposes the character into its components -- this is particularly good for learning the meaning of the character (plus it's really interesting nonetheless); and
  • Gives you a list of words of which the character is a constituent, and the relative frequencies of those words -- I find that this is handy to give you a context of how the word is used and how it relates to synonyms/related words.

Again, you can find the tool at www.hanzidata.com. I'd be really appreciative if people could give me their opinion of the tool: Is it helpful? Is it intuitive? What should be changed? What improvements could I make?

Going forward I'd like to include more content under the Hanzi Data name: things like YouTube lessons, more resources, etc. If anybody would like to contribute, please let me know.

Happy studying!

Dan

Posted

I really like the look of this: it's not full of loads of confusing things, and I like the colour scheme, which again is not hectic. The info looks good, clear and useful.

One question: how do you input a character? Can you input pinyin? If not, I think you should be able to.

Overall it looks like a good start for this kind of thing, keep up the good work :)

Posted

@Shelley: this user interface is so slick, you don't see the input boxes. Click on the large character in the center or on the small one in the upper right and insert your own character there. It's really a search form.

Posted

@flow, thanks. I had worked out where to enter a character (not the little one though), but after thinking about it, I worked out how to actually input a character: with MS IME, duh!

I just got carried away with how nice it looked and how well it worked, I didn't stop to think:)

I also tried it on my tablet and it worked, but it obviously isn't optimised for mobile devices. It would be good if it was, as it would be really good to have this available on mobile devices.

I also read the blog and that answers your question about where the data comes from.

Posted

I like it. Agree that it'd be nice to make it mobile compatible - the frequency/word data doesn't appear on an iPhone.

Posted

Thanks for the feedback, everybody. I'm definitely going to start building some mobile sites for Hanzi Data; I agree that it would be handy to have on my iPhone (speaking of which, is anybody experienced with building apps?). I'm also going to integrate a pinyin input (I've actually made this already but didn't include it for various reasons) and, potentially, a handwriting input for touchscreen devices.

More importantly, I am looking to build up the website's content by hosting some Chinese language blogs. I really think it would be mutually beneficial to have more content on one website: it potentially increases not only web traffic but also 'time on site'. If anybody is interested in moving their blog over or writing one, please let me know (my Mandarin isn't good enough to write one). We could start by having an informal conversation about what it would entail. I think I could bring a lot to the table.

Cheers,

Daniel

  • 2 weeks later...
Posted

Cool site - like everyone else I really like the clean interface.

Two comments: Firstly, your text box lets you type in more than one character and then it says "Character not found" (e.g. http://www.hanzidata.com/character.php?character=%E7%9F%A5%E9%81%93 ). You might want to limit the box to a single character, or show a clearer error when the lookup fails so that the user knows why.

Secondly, the 5th/neutral tone has a number while the rest use tone marks (e.g. http://www.hanzidata.com/character.php?character=%E7%9D%80 ). Is that intentional?
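
For anyone curious how both points could be handled, here is a minimal Python sketch. It is not the site's actual code (the site appears to run on PHP, and its internals aren't shown in this thread). The function names `validate_query` and `numbered_to_marked`, the error messages, and the choice to accept only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) are all assumptions made for illustration.

```python
# Tone-marked forms of each pinyin vowel, indexed by tone 1-4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def validate_query(text: str) -> str:
    """Return the single character to look up, or raise ValueError with a reason."""
    query = text.strip()
    if not query:
        raise ValueError("Please enter a character.")
    if len(query) > 1:
        raise ValueError(f"Please enter one character at a time (got {len(query)}).")
    # Assumption: only the basic CJK Unified Ideographs block is accepted.
    if not ("\u4e00" <= query <= "\u9fff"):
        raise ValueError("That does not look like a Chinese character.")
    return query


def numbered_to_marked(syllable: str) -> str:
    """Convert numbered pinyin such as 'zhao2' to tone-marked pinyin ('zháo').

    Tone 5 (or 0) is the neutral tone and is written with no mark at all.
    """
    syllable = syllable.lower().replace("u:", "ü").replace("v", "ü")
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone number to convert
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone: drop the number, add no mark
    # Standard placement: 'a' or 'e' if present, else the 'o' of 'ou',
    # else the last vowel in the syllable.
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        idx = max(base.rfind(v) for v in "iouü")
    if idx < 0:
        return base  # no vowel found (e.g. 'ng'); leave unmarked
    return base[:idx] + TONE_MARKS[base[idx]][tone - 1] + base[idx + 1:]


if __name__ == "__main__":
    print(validate_query("知"))
    for s in ("zhe5", "zhao2", "zhuo2"):
        print(s, "->", numbered_to_marked(s))
```

With this approach a two-character query such as 知道 gets a specific message instead of "Character not found", and a neutral-tone reading such as zhe5 is shown as plain "zhe" alongside the marked readings.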

Posted

Thanks kaysik. I've fixed the two errors that you mentioned. It's great to have this feedback so I can polish it up.

My next question is: What tools would you like next? Something like a lookup in which you input characters and it shows you all the words that include those characters? A character to pinyin translation tool? Something else? 8) 8)

Posted

I don't know if a character to pinyin tool would be much use, as you have to know the pinyin to enter a character. I think one of the most useful things would be to be able to input handwritten characters (using the mouse or pen and tablet), or cut and paste, and then find out the pinyin and all its meanings, radicals etc.

A tool that shows all the words that include a particular character would be good, as would Chinese sayings, common phrases and names of famous places or people.

This all sounds like a huge request, but it will keep you occupied :)

I am no good at programming, but if there is anything I can do (check data, be a beta tester, or something else you can think of), I am prepared to offer my time, on the understanding that this will be just one of many things I am doing and cannot be a full-time thing or take priority.

Posted

Thank you Shelley, I would love your help beta testing a new tool that I have my eye on. This one will be super helpful and I'm looking forward to the challenge of coding it. Unfortunately, I can't access your profile because I have too few posts. Maybe you could email me at hello@hanzidata.com so we can get in touch? I am looking for other beta testers too, so if anybody else would like to help out, please email that address.

Keep the ideas coming too. Also, if anybody has any ideas about getting blogs to review the tools or other ways to drive traffic to Hanzi Data I am all ears! :)

Posted

Dan, that's some neat tool you've put together. I've just sent a mass message to all my classmates on QQ to recommend it. Thanks for your efforts! :P

  • 3 weeks later...
Posted

Following my previous post about the Hanzi Data Character Dictionary and Analysis Tool, I'd like to introduce the Hanzi Data Character Intonation Tool.

This new tool provides an additional step between learning pinyin and learning Chinese characters: reading Chinese characters with tone marks on them. You can practice your tone memory by switching the tones on or off, then reading the Chinese characters.

The Character Intonation Tool is linked to the Dictionary and Analysis Tool. It uses data from the Dictionary and Analysis Tool to determine which is the most appropriate tone mark to put on a character in each context. Of course, it's not perfect, but if you do find a mistake you can click on the character which will take you to the Dictionary and Analysis Tool, then click +1 Most Common. That will tell the Intonation Tool the appropriate tone.
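
To make the "+1 Most Common" idea concrete, here is a rough Python sketch of how such votes could be tallied. It is only an illustration, not the code behind the tool: the names `votes`, `add_vote` and `most_common_reading` are made up for this example, and the real site may well store and weight votes differently.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: character -> Counter of votes per reading.
votes = defaultdict(Counter)


def add_vote(char: str, pinyin: str) -> None:
    """Record one '+1 Most Common' click for a reading of a character."""
    votes[char][pinyin] += 1


def most_common_reading(char: str):
    """Return the reading with the most votes, or None if nobody has voted."""
    counts = votes.get(char)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    add_vote("着", "zhe")
    add_vote("着", "zhe")
    add_vote("着", "zháo")
    print(most_common_reading("着"))
```

A tally like this picks one reading per character regardless of the surrounding text, so it can only ever be as good as "the most frequent reading is usually right".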

This is essentially a pre-release. I'd love to hear what forum users think of the tool, as well as potential ways to improve it.

Thanks again,

Dan

Update -- Following some great feedback from this forum we've made some updates to the tool. It is much, much more accurate now. Feel free to try it out! :)

Posted
Doesn't observe tone sandhi.

No, it doesn't. I tried to keep it consistent with other resources. I know that my textbooks don't observe tone sandhi when they write sentences in pinyin.

It's something I could definitely add though!

Posted

Suggestions: make it work with traditional characters, make it possible to select (and copy/paste) the results, remove those annoying spaces between the characters (or whatever they are, I didn't check the source code) so I can use Perapera with it.

Otherwise pretty good.

Posted
I know that my textbooks don't observe tone sandhi when they write sentences in pinyin.

That's because it is incorrect to observe tone sandhi when writing pinyin.

Posted

Also, if you're doing Chinese text -> tone conversions you're about to discover a whole world of hurt related to getting the correct tones.

e.g. in the sample text:

那 should be nà not nǎ

人数 should be rén shù not rén shǔ

家长 should be jiāzhǎng but you have it as jiācháng.

And that's just the first few I spotted in the first couple of sentences.

Making it accurate is no small task and making it completely accurate across all valid inputs is basically impossible because you need to be able to parse and understand the meaning of the sentence. Simply allowing users to say which pronunciation is most common is not enough, because the correct choice usually depends on the surrounding characters, rather than the frequency of use (btw if you can solve this problem correctly, you stand to make millions from Google and any other company doing automated/machine translation).

So you then have to start making tradeoffs between accuracy and other things (ease of implementation, parsing speed, etc), but without a reasonable degree of accuracy the value of such a tool is diminished.
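
One common tradeoff of this kind is to look up whole words before falling back to single characters. Here is a minimal Python sketch of that idea using greedy longest-match lookup. The two tiny dictionaries are toy data invented for this example (the per-character defaults are deliberately the "wrong in context" readings mentioned above), and `annotate` is a hypothetical name, not anything from Hanzi Data.

```python
# Toy data for illustration only. A real tool would load a full word list.
WORDS = {
    "人数": ["rén", "shù"],
    "家长": ["jiā", "zhǎng"],
}
# Fallback reading used when a character is not part of a known word.
CHAR_DEFAULT = {"人": "rén", "数": "shǔ", "家": "jiā", "长": "cháng", "那": "nà"}

MAX_WORD_LEN = max(len(word) for word in WORDS)


def annotate(text: str):
    """Return (character, pinyin) pairs, preferring the longest known word."""
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate word first, down to two characters.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + length]
            if chunk in WORDS:
                result.extend(zip(chunk, WORDS[chunk]))
                i += length
                break
        else:
            # No word matched here: fall back to the per-character default.
            ch = text[i]
            result.append((ch, CHAR_DEFAULT.get(ch, "")))
            i += 1
    return result


if __name__ == "__main__":
    print(annotate("家长"))  # word match gives zhǎng
    print(annotate("长"))    # single character falls back to cháng
```

This is cheap to implement and fast to run, but it still fails whenever the right reading depends on meaning rather than on an adjacent character, which is exactly the hard part described above.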

As it stands currently, it looks well presented and is a good first effort, however I would say that the accuracy needs to improve before it can be considered a useful tool. Mostly because it's the sort of tool that is useful to beginners who haven't yet got a good grasp of the tones, and this group is the group least likely to be able to spot mistakes.

Posted

Oh yes, I see what you're saying there. I thought that it was a problem with uncommon pinyin being called over common pinyin. I have actually developed a method that is much more accurate but it takes a very long time to process (as in, come back in an hour slow). Hmm, I'll have to try and reduce the load time.
