Chinese-Forums

Automatically Extract Radical from Character



Posted

Hi all

After reading a few previous posts, it seems there is some value in learning, or at least recognizing, the radical in a character.

Rather than manually going through each character in my word list and checking PLECO or something to find the corresponding radical, is there any way I can do this automatically using Excel (an Excel VBA function) or some online facility?

So, for purposes of clarity, I wonder whether I can easily extract (行 / xíng / walk …) out of (街 / jiē / street).

Thanks

Incidentally - 2 learning questions

1. Is it common for radicals to be separated like this, i.e. in my example above 行 seems to be split apart, with a component slotted into the middle of the radical?

2. The character for 'four', 四 (sì), appears to have the radical 囗 (wéi) and the component 儿 (ér). However, both 囗 and 儿 are also radicals. How would I know which is the actual radical and which part is the component?

Posted

Just bouncing ideas about, but maybe I could download a dictionary and use Excel's VLOOKUP() to extract the radical (or measure word, for that matter) for each character.

I downloaded CC-CEDICT but it doesn't list the radicals. It's useful for extracting measure words, though.

Does anybody know of a dictionary I can download that contains the radical of each character?
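On the measure-word side, here is a rough Python sketch of how the classifiers could be pulled out of CC-CEDICT-style lines instead of using VLOOKUP(). It assumes the usual entry layout (traditional, simplified, pinyin in square brackets, then glosses between slashes, with measure words in a gloss starting with `CL:`). The function name `measure_words` and the sample lines are made up for illustration, not taken from the actual file.

```python
import re

# One entry per line: Traditional Simplified [pinyin] /gloss 1/gloss 2/.../
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")

def measure_words(lines):
    """Map each simplified headword to the measure words in its CL: glosses."""
    result = {}
    for line in lines:
        if line.startswith("#"):          # skip comment/header lines
            continue
        match = ENTRY.match(line)
        if not match:
            continue
        _trad, simp, _pinyin, glosses = match.groups()
        for gloss in glosses.split("/"):
            if gloss.startswith("CL:"):
                # "CL:本[ben3],冊|册[ce4]" -> ["本[ben3]", "冊|册[ce4]"]
                result.setdefault(simp, []).extend(gloss[3:].split(","))
    return result

# Illustrative sample lines written in the CC-CEDICT layout
sample = [
    "# sample header line",
    "書 书 [shu1] /book/letter/document/CL:本[ben3],冊|册[ce4],部[bu4]/",
    "街 街 [jie1] /street/CL:條|条[tiao2]/",
]
print(measure_words(sample))
```

With the real dictionary you would pass an open file (UTF-8) instead of `sample`. Radicals are not in this file, so they would have to come from another source.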

Posted

IMHO your 2 questions just about sum up the reasons that doing this automatically is going to be difficult.

Also, doing it manually will make it stick in your brain, if you are doing this to learn the radicals and which part of the character is the radical.

As mikelove has shown the information is out there.

Posted

21lk93n.jpg

You might get a copy of the formerly shareware, now freeware CQuickTrans.

http://www.coolest.com

It should do the trick for you.

All you have to do is enter some Chinese text into the input area. Hit return and it'll return the Unihan data for all the characters in the Hanzi info section with the CEDICT information in the Dictionary section.

It's no longer being developed and it is rather long in the tooth.

It uses an early edition of the CEDICT. A very early edition. But this might be a blessing for a beginner like you. No bloat, now that they're trying to include every Chinese word in the newer editions. You could always download a newer CEDICT if you want (I think it's now called CC-CEDICT) and format it for CQuickTrans. Their dictionaries are just text files. Once you run the program it'll automatically convert the text file into a dictionary file.

There are also a few discrepancies between Unihan and some dictionaries as to which is the actual radical for a few characters, but, too minor to worry about.

Also, as you can see in the image above they've got the "standing man" radical, I think that's what they call it, instead of the regular 人 radical and sometimes it'll be the simplified version of some radicals.

The Unihan they use is also an old version. Again not a biggie.

And it doesn't handle the Unicode extensions. Unless you're researching rare character variants you wouldn't miss it.

It's the only Chinese related software I ever paid for.

Kobo.

Posted
From above:

There are also a few discrepancies between Unihan and some dictionaries as to which is the actual radical for a few characters, but, too minor to worry about.

A "for instance".

If I were to look up the character 初...

The Guoyu Cidian put out by the Ministry of Education on Taiwan has it under the clothes, 衣(衤), radical.

http://dict.revised....x=100&imgFont=1

But if I look for the character at the Lin Yutang Chinese-English Dictionary of Modern Usage site housed at the Chinese University of Hong Kong, the character is under the 刀(刂)

http://mario.arts.cu...ical/rad18.html

The Unihan, which gets its radical data from the Kangxi Zidian, I think, has it under the knife radical as well.

2e4kfvs.jpg

Note how they have the knife radical in the image from the CQuickTrans result.

111nn2e.jpg

The Xiandai Hanyu Cidian has 衣(衤) as radical.

The Kangxi Zidian has 刀(刂) as radical, as does the ZDIC web site, which probably also gets its radical information from Kangxi via Unihan.

What does this all mean?

It means Kobo wants to pull his hair out by the roots. Argggh!!!! :)

Kobo.
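For anyone who would rather script the lookup than paste characters into a program, below is a minimal Python sketch that reads radical numbers from Unihan-style lines. It assumes the `kRSUnicode` field (which, as far as I know, lives in `Unihan_IRGSources.txt` in recent Unihan releases) with values such as `18.5`, meaning radical 18 plus 5 further strokes. The function names and the three sample lines are illustrative. Because this follows Unihan, 初 comes out under 刀, not 衣, exactly the discrepancy described above.

```python
import re
import unicodedata

# kRSUnicode values look like "18.5"; apostrophes after the radical number
# mark simplified or variant forms of the radical.
RS_VALUE = re.compile(r"^(\d+)'*\.(-?\d+)$")

def load_radicals(lines):
    """Build {character: Kangxi radical number} from Unihan-format lines."""
    table = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[1] != "kRSUnicode":
            continue
        code_point, _field, value = fields
        values = value.split()            # several values may be listed
        match = RS_VALUE.match(values[0]) if values else None
        if match:
            table[chr(int(code_point[2:], 16))] = int(match.group(1))
    return table

def radical_char(number):
    """Return the ordinary character for Kangxi radical 1..214."""
    kangxi = chr(0x2F00 + number - 1)     # Kangxi Radicals block starts at U+2F00
    return unicodedata.normalize("NFKC", kangxi)

def radical_of(char, table):
    """Return the radical character for `char`, or None if it is not listed."""
    number = table.get(char)
    return radical_char(number) if number else None

# Illustrative sample lines in the Unihan tab-separated layout
sample = [
    "U+521D\tkRSUnicode\t18.5",
    "U+8857\tkRSUnicode\t144.6",
    "U+56DB\tkRSUnicode\t31.2",
]
table = load_radicals(sample)
for ch in "初街四":
    print(ch, radical_of(ch, table))
```

The result could be saved as a two-column text file and pulled into a spreadsheet or flashcard deck. It only reports the radical Unihan assigns, so dictionaries that file a character elsewhere will disagree.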

Posted

I'm not sure if it's really necessary to remember which character is which radical in the modern world. Not to mention that the radicals are often illogical, e.g. the radical for 哿 is 口.

You either look up a character via pinyin/bopomofo (this is what modern Chinese dictionaries use) or just draw it on your smartphone's screen if you don't know the pronunciation (this is actually what the Chinese do when they come across a rare character, e.g. in a botanical garden).

Posted

淨土極樂: I just want to see the radical for a character, rather than actually learn the radical. I was thinking that in ANKI, when choosing a C->E flash card, it might be nice to see the radical in the answer section of the flash card. The impetus for this is to try to improve my ability to learn HANZI characters and words, as I am struggling at the best of times. Perhaps this might aid my recognition of the character. Perhaps it does nothing to help my recognition! Not sure, to be honest.

Shelley: I know what you mean, but I find that when I manually do repetitive tasks like this I have a tendency to switch off and my mind wanders, so the time spent versus the learning gained is inefficient for me. I have 1000 characters, so 1000 rounds of CTRL+C / CTRL+V will drive me crazy :shock:

Mike: thanks for the link. Unfortunately I can't seem to open the database without my computer freezing up, or it is just too big to import into Excel.

Kobo: Thanks for the link. Will have a look and let you know. As to your comments about the age and size of the database, I think it's not a problem for me at my level :lol:

Posted

I don't think there's any value in a systematic effort to learn the radical for each character. Learn the components as you come across them, which happens quite naturally if you're curious. For example, you come across 中心 and looking it up find that 心 is heart. Later you see the character 想 and you notice 心 is in there, and then a bit later you see 相 and then you look up 木 + 目 to see what they mean, and so on and so forth. Combine that with an awareness of the rules of thumb for radical placement and you'll be able to have a pretty reliable guess on what the radical is any time you see a new character.

Adding radicals to flashcards seems like overkill to me. It also implies you have characters on your flashcards, rather than words. I'm not sure how many people would recommend that.

Posted

I have the list of the most common radicals printed out and in my folder where I also do exercises. I didn't slavishly learn it, of course, but I read it once, and most of them I just "happen" to learn subconsciously because I see them all the time.

If you don't know the radicals and components, how do you discuss characters? Or remember them? For example, when we talk on the phone and my friend asks, "which Chen do you mean?", I say, "the one written ear and east", and she knows exactly what I mean.

Or, a while ago, someone posted photographs of the descriptions of different silk types here. What are you supposed to do in such a case, when there is no electronic material available and no pinyin? I grabbed my dictionary and looked up the characters by radical. If you are familiar with that, it is just as fast as anything else.

Of course there is always the odd character where the radical is the dot or stroke you'd least expect. But that's the exception to the rule, from my experience.

Posted
"which Chen do you mean?", I say, "the one written ear and east"

Except that's not 'ear' but 阜.

But then again, if you try to explain the character like this, very few people will understand. See, learning the real radicals is indeed not very useful in the modern world.

Posted

On the contrary, it is so useful that you instantly knew what I meant, even though I made a mistake! :wink:

Posted

Thanks guys,

Yes, perhaps you're right, Roddy! Might be overkill, as you say.

I do seem to have a big problem recognising individual characters in a word, which probably shows that I haven't learned them to a good enough recognition level. For example, I finished the HSK3 word list a few months ago (i.e. no more new cards in ANKI) and I have a recognition rate of about 90%.

However! I created a separate ANKI deck with the individual hanzi that make up the ~600 HSK3 words. I think there are 617 individual characters. I was surprised and somewhat disappointed that I could only recognise about 30% of the individual cards (if that!).

Just now I was doing a review and, for the HSK4 word 垃圾桶 (lā jī tǒng), I failed yet again to recognise the word, although if I heard lā jī tǒng I would know instantly what it means. Hence my point: if I could recognise the individual characters and say the pronunciation in my head, I would know the word instantly.

Thus, I am pondering whether there is merit in learning the individual characters, perhaps the top 500 from Da Jun's list or Patrick Zein's list.

I suppose by definition there must be a reasonably good correlation between character frequency and word frequency, but I never checked.

Any thoughts?

Posted
I suppose by definition there must be a reasonably good correlation between character frequency and word frequency, but I never checked.

That's actually an interesting question. I just noticed with certain characters, so many combination words pop up, and you don't have to "learn" them, because the combination just makes sense. So I guess yes, there should be merit in learning the most frequent individual characters.

I'm amazed though that you created those 600-something cards. How long have you been sitting at your computer to do that? Or is there a shortcut? I like Anki, but I find whenever you want to get creative with it, it is so time-consuming.

Posted
I'm amazed though that you created those 600-something cards.

I've made thousands of Anki cards over the years. The minute or two that each one takes was part of the learning process.

I have a copy of the desktop-computer dictionary Wenlin. That shows me, for any character, the radical and other components, the words (most common first) that use that character; other characters that use that character; often an example sentence too.

Pasting relevant info from Wenlin into Anki is very simple.

Some people don't like the learning-characters-on-their-own approach and prefer words only. I found a combination of the two really helped me. But I agree that people should be dissuaded from assuming that if they can learn, say, 3000 characters then they will have made real progress in learning the language.

Posted
I've made thousands of Anki cards over the years. The minute or two that each one takes was part of the learning process.

Yeah, you are right of course. It's just that I have such a backlog of Anki cards that I need to create, and lacking something like your programme I do it via nciku or any other online dictionary, so each takes way more time than it should.

This Wenlin sounds neat, actually! I don't need a dictionary right now, but it is definitely something to consider.

Posted
I'm amazed though that you created those 600-something cards. How long have you been sitting at your computer to do that? Or is there a shortcut? I like Anki, but I find whenever you want to get creative with it, it is so time-consuming

Haha no.

I only ever create/update ANKI decks from Excel spreadsheets. I never use ANKI to create a new card.

Basically I have 3 decks

HSK_Words: I just download the full HSK6 word list into Excel and create extra columns that contain my own tags and example sentences (in Chinese and pinyin) from another spreadsheet. Initially I suspend cards with tags HSK2, HSK3 … HSK6, so as I proceed up the levels, I simply unsuspend some cards. E.g. recently I have been going through HSK4, so I have just unsuspended the cards with the tag HSK4.

HSK_Chars: From the HSK4 list (which encompasses HSK 1-3) I have a simple Excel VBA macro that separates the individual characters and removes any duplicates (i.e. creates a unique list). Also, for each of these single characters I have a list of 10 sample words (i.e. Hanzi/English/Pinyin) taken from my HSK deck and/or my English → Chinese deck. My E→C deck is much bigger (about 1200 words), so I think it's good to see the Hanzi, as I will eventually be coming across them anyway.

To separate the HSK characters, a macro is not really needed. You can just create columns in Excel: the function “=mid(A1,1,1)”, where cell A1 is an HSK word, will give you the first character of that word, “=mid(A1,2,1)” will give you the second character, and so on. I don't think there is anything longer than 3 characters? Arrange all these characters in a single column, sort (ascending or descending) and use “=if(A1=A2,1,0)” to flag any duplicates. Then simply remove the duplicates in one go (not individually), and you're done!

(Takes me longer to write the reply than actually do it haha)
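For those without Excel, the same split-and-deduplicate step can be sketched in a few lines of Python. This is only an illustration of the idea, not the macro itself. The function name `unique_characters` and the sample word list are made up. It works for words of any length and keeps the characters in order of first appearance, with no sorting.

```python
def unique_characters(words):
    """Return each distinct character once, in order of first appearance."""
    # dict.fromkeys drops repeats while keeping insertion order
    return list(dict.fromkeys(char for word in words for char in word))

# Illustrative word list
words = ["垃圾桶", "中心", "想", "中国"]
print(unique_characters(words))
```

The resulting list can be written out one character per line and imported as its own deck.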

To look up an HSK word in my sample list of sentences (via VBA) is trickier, because HANZI sentences have no spaces, so it is difficult to do a proper search.
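One rough way around the no-spaces problem is a plain substring test, sketched here in Python. The function name `sentences_containing` and the sentences are invented for the example. Note the limitation: a substring match can also hit the same characters where they straddle two different words, so this is a first pass, not a proper word search.

```python
def sentences_containing(word, sentences, limit=10):
    """Return up to `limit` sentences that contain `word` as a substring."""
    return [s for s in sentences if word in s][:limit]

# Illustrative sentences, made up for this example
sentences = [
    "我把垃圾扔进垃圾桶了。",
    "这条街很长。",
    "请把垃圾桶放在门口。",
]
print(sentences_containing("垃圾桶", sentences))
```

A real word segmenter would be needed to rule out false matches, which is beyond this sketch.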

I really recommend getting used to fields in ANKI. It's a bit tricky at the start but well worth the investment.

My 3rd deck is every single sentence from the Basic Chinese Grammar book, organised by chapter. It's very helpful and the most interesting part of my study, I think. The mist starts to clear. The sheer process of writing them out as I studied the chapters helped enormously, and makes grammar revision a lot easier.
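The spreadsheet-to-deck step described above can also be done without Excel. The Python sketch below writes rows as tab-separated text, which Anki's text import can read, with each column then mapped to a field (or to tags) in the import dialog. The column order, the function name `write_anki_rows` and the tag names are assumptions chosen for illustration, not the layout actually used in these decks.

```python
import csv
import io

def write_anki_rows(rows, handle):
    """Write (hanzi, pinyin, english, tags) rows as tab-separated lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for hanzi, pinyin, english, tags in rows:
        # tags go into one column, separated by spaces
        writer.writerow([hanzi, pinyin, english, " ".join(tags)])

# Illustrative rows; the tag names are hypothetical
rows = [
    ("垃圾桶", "lā jī tǒng", "rubbish bin", ["HSK4"]),
    ("街", "jiē", "street", ["my_tag", "radical_行"]),
]
buffer = io.StringIO()
write_anki_rows(rows, buffer)
print(buffer.getvalue())
```

For a real export, pass a file opened with `open(path, "w", encoding="utf-8", newline="")` instead of the in-memory buffer.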

Posted

Thanks!

Takes me longer to write the reply than actually do it
I bet!!! Feeling a bit guilty right now :mrgreen: I know the "fields" function, and already tweaked my existing cards with that. But it just never occurred to me to first put all the data I need into excel. Brilliant. When I get to it, I will make a kotou* to you for every card!

(* surely doing so in thought counts too)

Posted

No problem, I get a lot more help out than I put into these forums :mrgreen: . I do find LibreOffice on Linux better than Excel for importing / exporting.

Posted
Ruben von Zwack wrote:

If you don't know the radicals and components, how do you discuss characters? Or remember them? For example, when we talk on the phone and my friend asks, "which Chen do you mean?", I say, "the one written ear and east", and she knows exactly what I mean.

淨土極樂 wrote:

Except that's not 'ear' but 阜.

But then again, if you try to explain the character like this, very few people will understand. See, learning the real radicals is indeed not very useful in the modern world.

Actually Ruben isn't far off the mark.

When Chinese people describe a character they would say 左耳旁 for 阝 on the left side rather than 阜.

And for the one on the right it would be 右耳旁 for 阝 on the right side, rather than 邑.

I kind of alluded to it above when I wrote

Kobo wrote:

Also, as you can see in the image above they've got the "standing man" radical, I think that's what they call it, instead of the regular 人 radical and sometimes it'll be the simplified version of some radicals.

I tried to find a list on the Internet for these "alternative names" for the radicals, but, this is the best I've got so far.

2dh8j68.jpg

They have 單(单)立人 for the "standing man" radical. I thought it was called something else, but, then I might have misremembered or I was thinking of how they say it in Cantonese. Or something.

I got 左耳旁 and 右耳旁 from a book I have, but, that list I got off the Internet doesn't even have them. Instead they've got 左耳刀 and 右耳刀.

The lists usually match up for the most part, but, as in everything involving Chinese (or any other language, for that matter) there are always discrepancies. :)

If someone's up to gather a more complete list...do it and post it here so Kobo won't have to. :)

Kobo.
