GaoJinJie Posted April 28, 2013 at 12:25 PM Hi all, I'm quite a new learner of Mandarin, but in my spare time I've developed a tool that I've found really helpful in my development. It's called Hanzi Data (www.hanzidata.com) and it is a Chinese-English dictionary that: gives you all the different pinyin pronunciations of the character; gives you the (often multiple) definitions that are associated with each pronunciation; decomposes the character into its components -- this is particularly good for learning the meaning of the character (plus it's really interesting in its own right); and gives you a list of words of which the character is a constituent, and the relative frequencies of those words -- I find that this is handy to give you a context of how the character is used and how it relates to synonyms/related words. Again, you can find the tool at www.hanzidata.com. I'd be really appreciative if people could give me their opinion of the tool: Is it helpful? Is it intuitive? What should be changed? What improvements could I make? Going forward I'd like to include more content under the Hanzi Data name: things like YouTube lessons, more resources, etc. If anybody would like to contribute, please let me know. Happy studying! Dan
flow Posted April 28, 2013 at 01:45 PM congrats, looks very interesting! where does your structural data come from?
Shelley Posted April 28, 2013 at 01:50 PM I really like the look of this. It's not full of loads of confusing things, and I like the colour scheme, which again is not hectic. The info looks good, clear and useful. One question: how do you input a character? Can you input pinyin? If not, I think you should be able to. Overall it looks like a good start for this kind of thing, keep up the good work.
Frapunchino Posted April 28, 2013 at 03:39 PM thanks for sharing, looks wonderfully uncluttered and focused
flow Posted April 28, 2013 at 03:43 PM @Shelley: this user interface is so slick, you don't see the input boxes. click on the large character in the center or into the small one in the upper right and insert your own character there. it's really a search form.
Shelley Posted April 28, 2013 at 04:55 PM @flow, thanks. I had worked out where to enter a character (not the little one though), but after thinking about it, I worked out how to actually input a character: with MS IME, duh! I just got carried away with how nice it looked and how well it worked, I didn't stop to think :) I also tried it on my tablet and it worked, but it obviously isn't optimised for mobile devices. It would be good if it were, as it would be really good to have this available on mobile devices. I also read the blog, and that answers your question about where the data comes from.
icebear Posted April 29, 2013 at 03:55 AM I like it. Agree that it'd be nice to make it mobile compatible - the frequency/word data doesn't appear on an iPhone.
GaoJinJie (Author) Posted April 30, 2013 at 11:40 AM Thanks for the feedback, everybody. I'm definitely going to start building some mobile sites for Hanzi Data; I agree that it would be handy to have on my iPhone (speaking of which, anybody experienced with building apps?). I'm also going to integrate a pinyin input (I've actually made this already but didn't include it for various reasons) and, potentially, a handwriting input for touchscreen devices. More importantly, I am looking to build the content of the website by hosting some Chinese language blogs. I really think it would be mutually beneficial to have more content on one website: it potentially increases not only web traffic but also 'time on site'. If anybody is interested in moving their blog over or writing one, please let me know (my Mandarin isn't good enough to write one). We could start by having an informal conversation about what it would entail. I think I could bring a lot to the table. Cheers, Daniel
New Members kaysik Posted May 11, 2013 at 03:16 PM Cool site - like everyone else I really like the clean interface. Two comments: Firstly, your text box lets you type in more than one character and then it says "Character not found" (eg: http://www.hanzidata.com/character.php?character=%E7%9F%A5%E9%81%93 ). Might want to limit it to only allow 1 character in the box, or show a better error when it fails so that the user knows why. Secondly, the 5th/neutral tone has a number while the rest use tone marks (eg: http://www.hanzidata.com/character.php?character=%E7%9D%80 ). Is that intentional?
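A minimal sketch of the kind of check kaysik suggests, assuming the `character` query value arrives percent-encoded as in the URLs above. The site itself is served from `character.php`, so this Python version is only an illustration; `is_han` and `validate_query` are hypothetical names, and the code point ranges cover only the main CJK blocks.

```python
from typing import Tuple
from urllib.parse import unquote


def is_han(ch: str) -> bool:
    """Rough check: True if ch falls in the main CJK ideograph blocks."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= code <= 0x4DBF     # Extension A
        or 0x20000 <= code <= 0x2A6DF   # Extension B
    )


def validate_query(raw: str) -> Tuple[bool, str]:
    """Return (ok, payload): the character on success, an error message otherwise."""
    text = unquote(raw).strip()
    if not text:
        return False, "Please enter a character."
    if len(text) > 1:
        # Tell the user why the lookup failed instead of "Character not found".
        return False, f"Please enter one character at a time (got {len(text)})."
    if not is_han(text):
        return False, f"'{text}' is not a Chinese character."
    return True, text
```

With the two URLs from the post, `validate_query("%E7%9F%A5%E9%81%93")` is rejected because it decodes to two characters, while `validate_query("%E7%9D%80")` is accepted as a single character.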
GaoJinJie (Author) Posted May 12, 2013 at 12:28 AM Thanks kaysik. I've fixed the two errors that you mentioned. It's great to have this feedback so I can polish it up. My next question is: What tools would you like next? Something like a lookup in which you input characters and it shows you all the words that include those characters? A character to pinyin translation tool? Something else? 8)
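The first idea, a lookup that returns every word containing the characters you type, could be sketched roughly as an index from character to words. This is only an illustration under assumptions: the word list is toy data, and `build_index` and `words_containing` are hypothetical names, not part of Hanzi Data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(words: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each character to the set of words that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        for ch in set(word):
            index[ch].add(word)
    return index


def words_containing(index: Dict[str, Set[str]], chars: str) -> List[str]:
    """Words that contain every character in `chars`, sorted for stable output."""
    if not chars:
        return []
    candidate_sets = [index.get(ch, set()) for ch in chars]
    return sorted(set.intersection(*candidate_sets))


# Toy word list, for illustration only.
SAMPLE_WORDS = ["知道", "知识", "道路", "家长", "长城"]
INDEX = build_index(SAMPLE_WORDS)
```

Typing one character returns every word that uses it, and typing several narrows the result to words that use all of them. A real version would presumably also sort by word frequency, which the dictionary already tracks.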
Shelley Posted May 12, 2013 at 08:09 PM I don't know if a character to pinyin tool would be much use, as you have to know the pinyin to enter a character. I think one of the most useful things would be to be able to input handwritten characters (using the mouse or pen and tablet), or cut and paste, and then find out the pinyin and all its meanings, radicals, etc. A tool that shows all the words that include a particular character would be good, but Chinese sayings, common phrases and names of famous places or people would be good too. This all sounds like a huge request, but it will keep you occupied. I am no good at programming, but if there is anything I can do, such as checking data, being a beta tester, or something else you can think of, I am prepared to offer my time, on the understanding that this will be just one of many things I am doing and cannot be a full-time thing or take priority.
GaoJinJie (Author) Posted May 13, 2013 at 11:08 AM Thank you Shelley, I would love your help beta testing a new tool that I have my eye on. This one will be super helpful and I'm looking forward to the challenge of coding it. Unfortunately, I can't access your profile because I have too few posts. Maybe you could email me at hello@hanzidata.com so we can get in touch? I am looking for other beta testers too, so if anybody else would like to help out, please email that address. Keep the ideas coming too. Also, if anybody has any ideas about getting blogs to review the tools or other ways to drive traffic to Hanzi Data, I am all ears!
Manuel Posted May 15, 2013 at 05:42 AM Dan, that's some neat tool you've put together. I've just sent a mass message to all my classmates on QQ to recommend it. Thanks for your efforts!
GaoJinJie (Author) Posted June 3, 2013 at 02:54 AM Following my previous post about the Hanzi Data Character Dictionary and Analysis Tool, I'd like to introduce the Hanzi Data Character Intonation Tool. This new tool provides an additional step between learning pinyin and learning Chinese characters: learning Chinese characters with tone marks on them. You can practice your tone memory by switching the tones on or off, then reading the Chinese characters. The Character Intonation Tool is linked to the Dictionary and Analysis Tool. It uses data from the Dictionary and Analysis Tool to determine which tone mark is the most appropriate to put on a character in each context. Of course, it's not perfect, but if you do find a mistake you can click on the character, which will take you to the Dictionary and Analysis Tool, then click +1 Most Common. That will tell the Intonation Tool the appropriate tone. This is essentially a pre-release. I'd love to hear what forum users think of the tool, as well as potential ways to improve it. Thanks again, Dan Update -- Following some great feedback from this forum we've made some updates to the tool. It is much, much more accurate now. Feel free to try it out!
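The "+1 Most Common" mechanism described above might look something like the following vote tally. The post does not say how the site stores or weighs votes, so this is a sketch under assumptions: `ReadingVotes` and its methods are hypothetical names, and every vote counts equally.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


class ReadingVotes:
    """Tally user votes for which pinyin reading of a character is most common."""

    def __init__(self) -> None:
        # character -> Counter of pinyin reading -> number of votes
        self._votes: DefaultDict[str, Counter] = defaultdict(Counter)

    def vote(self, character: str, pinyin: str) -> None:
        """Record one '+1 Most Common' click for a reading."""
        self._votes[character][pinyin] += 1

    def most_common(self, character: str) -> Optional[str]:
        """Return the reading with the most votes, or None if nobody has voted."""
        counts = self._votes.get(character)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

A tally like this yields a single reading per character regardless of the surrounding text, so it cannot distinguish between contexts in which the same character is read differently.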
drencrom Posted June 3, 2013 at 04:19 AM Doesn't observe tone sandhi.
GaoJinJie (Author) Posted June 3, 2013 at 04:30 AM "Doesn't observe tone sandhi." No, it doesn't. I tried to keep it consistent with other resources. I know that my textbooks don't observe tone sandhi when they write sentences in pinyin. It's something I could definitely add though!
Koxinga Posted June 3, 2013 at 12:06 PM Suggestions: make it work with traditional characters, make it possible to select (and copy/paste) the results, remove those annoying spaces between the characters (or whatever they are, I didn't check the source code) so I can use Perapera with it. Otherwise pretty good.
imron Posted June 3, 2013 at 01:05 PM "I know that my textbooks don't observe tone sandhi when they write sentences in pinyin." That's because it is incorrect to observe tone sandhi when writing pinyin.
imron Posted June 3, 2013 at 01:29 PM Also, if you're doing Chinese text -> tone conversions, you're about to discover a whole world of hurt related to getting the correct tones. E.g. in the sample text: 那 should be nà, not nǎ; 人数 should be rén shù, not rén shǔ; 家长 should be jiāzhǎng, but you have it as jiācháng. And that's just the first few I spotted in the first couple of sentences. Making it accurate is no small task, and making it completely accurate across all valid inputs is basically impossible, because you need to be able to parse and understand the meaning of the sentence. Simply allowing users to say which pronunciation is most common is not enough, because the correct choice usually depends on the surrounding characters rather than the frequency of use (btw, if you can solve this problem correctly, you stand to make millions from Google and any other company doing automated/machine translation). So you then have to start making tradeoffs between accuracy and other things (ease of implementation, parsing speed, etc), but without a reasonable degree of accuracy the value of such a tool is diminished. As it stands currently, it looks well presented and is a good first effort; however, I would say that the accuracy needs to improve before it can be considered a useful tool, mostly because it's the sort of tool that is useful to beginners who haven't yet got a good grasp of the tones, and this group is the group least likely to be able to spot mistakes.
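One common way to take surrounding characters into account is to look up whole words first and fall back to a per-character default only when no word matches. The sketch below illustrates that idea with greedy longest-match lookup; it is not the method Hanzi Data uses. The two tables are toy data built around the examples in the post, and `annotate` is a hypothetical name.

```python
from typing import Dict, List

# Toy data for illustration only; a real tool would load a full word dictionary.
WORD_READINGS: Dict[str, List[str]] = {
    "家长": ["jiā", "zhǎng"],
    "人数": ["rén", "shù"],
    "长城": ["cháng", "chéng"],
}
# Assumed per-character fallbacks, e.g. whichever reading users voted most common.
DEFAULT_READINGS: Dict[str, str] = {
    "那": "nà", "人": "rén", "数": "shǔ", "家": "jiā", "长": "cháng",
}


def annotate(text: str, max_word_len: int = 4) -> List[str]:
    """Return one pinyin syllable per character, preferring whole-word readings."""
    result: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate word starting at position i first.
        for length in range(min(max_word_len, len(text) - i), 1, -1):
            word = text[i:i + length]
            if word in WORD_READINGS:
                result.extend(WORD_READINGS[word])
                i += length
                break
        else:
            # No multi-character word matched: use the per-character default,
            # or pass the character through unchanged if it is unknown.
            ch = text[i]
            result.append(DEFAULT_READINGS.get(ch, ch))
            i += 1
    return result
```

Here 长 comes out as zhǎng inside 家长 but as cháng on its own, which a frequency-only table cannot do. Greedy matching still goes wrong when a sentence can be segmented in more than one way, which is the part that requires understanding the sentence.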
GaoJinJie (Author) Posted June 3, 2013 at 01:51 PM Oh yes, I see what you're saying there. I thought that it was a problem with uncommon pinyin being chosen over common pinyin. I have actually developed a method that is much more accurate, but it takes a very long time to process (as in, come-back-in-an-hour slow). Hmm, I'll have to try and reduce the load time.