Chinese-Forums

Software for separating characters into groups of words?


Posted

I'm looking for a source of Chinese articles, but more than that, the words need to be separated so I can translate them (a giant flaw of Chinese, in my opinion).

For example:

我有大苹果 is how it's normally written.

我 有 大 苹果 is how I need the source material written for easy translation.

I haven't had any luck finding this, so I'm now looking towards this:

Machine translators can somehow find the character combinations that make up a word in Chinese, so that they can translate an article.

Is there any kind of program that can separate the words with spaces, making it easy to see them?

Posted

OK, the closest I've found so far is DimSum from Mandarin Tools. It highlights the words and provides English and makeshift pinyin translations.

If anyone knows of a better tool that's not going to cost an arm and a leg, let me know.

Posted

Try NJStar's study list function (under the Tools menu). It's close to what you want. The demo version is fully functional, though it has a reduced dictionary.

http://www.njstar.com/njstar/chinese/

New Language Study Functions

Version 5.01 has introduced a "study list" for vocabulary study. A "Word Annotation" function is also introduced in this new release. It searches NJStar dictionary and annotates Pinyin spellings and English meanings at the end of each paragraph.

Posted

Yeah, that is a good tool, it does almost what I want it to.

The annotation is a bit confusing. I noticed it when browsing over this sentence:

开普敦是南非人口排名第三的城市. When I got to 南非 it said "South Africa", but on the very next character it said "not the right person". So two words are sharing one character, which can make it confusing to figure out which one is the right one (I'm just going to assume the first one is always correct).

Thanks for the tip.

And, using this and the Chinese Wikipedia, I *should* be able to do what I want to do.
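
The overlap described in this post (南非 versus the word starting at 非) can be illustrated with a minimal sketch of greedy forward maximum matching, one simple dictionary-based way to segment text. The word list below is a hypothetical toy example, and nothing here is meant to describe how NJStar, DimSum, or Adsotrans actually work. Taking the longest word at each position, left to right, happens to match the "first one is correct" guess for this sentence, but it is only a heuristic and can pick the wrong split elsewhere.

```python
def forward_max_match(text, words, max_len=4):
    """Greedy left-to-right segmentation: at each position, take the longest known word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy word list, for illustration only (not a real dictionary).
WORDS = {"开普敦", "南非", "非人", "人口", "排名", "第三", "城市"}

sentence = "开普敦是南非人口排名第三的城市"
segmented = " ".join(forward_max_match(sentence, WORDS))
print(segmented)  # 开普敦 是 南非 人口 排名 第三 的 城市
```

Because 南非 is matched first, the scan resumes at 人 and finds 人口, so the overlapping 非人 reading is never considered.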

Posted

Yes, it is a problem for native English speakers, but I believe it is okay. I can read the sentence without any problems. Practice makes perfect. ;)

Posted

I'd check out either the default adsotate options (use Firefox so you get the nice highlight feature in addition to the popups) or use the vocabulary list option: http://www.adsotrans.com/new.html. If there isn't a suitable output option, we can always add one.

If you're studying the language for translation you should probably get used to reading un-segmented Chinese.... :(

Posted

Yeah, I don't think it would be wise to learn to rely on word-separated Chinese - it's something you just won't have in real life. The standard Adsotrans annotation, though, will let you see the word boundaries when you mouse over - so it's there when you need it, but you can avoid relying on it.

Posted
"Yeah, I don't think it would be wise to learn to rely on word-separated Chinese - it's something you just won't have in real life."
That's an incredibly true point, but I'm still far from good, so I really need tools like http://www.adsotrans.com/, DimSum, NJ Star, and Google to figure out what the words are and what they mean.

I'm actually planning on taking articles from Wikipedia, finding/translating the words, learning them, then reading the article over and over. I figure that's real life Chinese, so I'll learn that.

P.S. trevelyan:

So far, the word 中亚细亚 hasn't turned up in any of the tools I've been using to check words. My girlfriend said it's a region about where Asia Minor is, and I got the same idea myself. Might be a new addition for you.

Posted

中 亚细亚 = Central Asia

亚细亚 (ya xi ya) is the transliteration for Asia and is sometimes used instead of 亚洲 to refer to Asia.

Posted

I think you'll be better off learning grammar patterns from a textbook as well. Once you start coming across Chinese names where the parents have thoughtfully given their child a name that's also a phrase, that's where things get really interesting. Grammar patterns may be boring, but they do help.

Posted

Thanks Doug. "亚细亚" was already in the database when I checked (did someone just add it?), but I've added 中亚细亚 as "Central Asia". If you ever run into this sort of thing and feel confident about the translation, you can always add words using the "Quick Add" feature. Sexy interactive stuff... ;)

Posted
"中 亚细亚 = Central Asia

亚细亚 (ya xi ya) is the transliteration for Asia and is sometimes used instead of 亚洲 to refer to Asia."

So I suppose I should update my flash card from Asia Minor to Central Asia. ;)

"Thanks Doug. "亚细亚" was already in the database when I checked (did someone just add it?), but I've added 中亚细亚 as "Central Asia". If you ever run into this sort of thing and feel confident about the translation, you can always add words using the "Quick Add" feature. Sexy interactive stuff... ;)"

I didn't add it, but I'll help out with words that aren't listed.

"I think you'll be better off learning grammar patterns from a text book as well."

True, but I'm going to try an approach that was inspired by the linguist: just learn to read and speak it, and make sense of the grammar as I go and get corrected.

Posted
"Is there any kind of program that can separate the words with spaces, making it easy to see them?"

The full version of Wenlin will do this.

Basically, what you need to do is make a Pinyin version but stop before it gets all the way to Pinyin.

When you have a page of Hanzi text in Wenlin, go to

EDIT --> MAKE TRANSFORMED COPY --> PINYIN TRANSCRIPTION

You will then be asked whether you want to "segment first". You should click on "yes". Wenlin puts pipes ( | ) rather than spaces between words; but those are easy to do a search-and-replace on.
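
For anyone who would rather script that search-and-replace, here is a minimal sketch in Python. It assumes the segmented text has already been copied out of Wenlin as plain text with a pipe between words; the exact output format (for example, whether there are spaces around the pipes) is an assumption here, and `pipes_to_spaces` is just a hypothetical helper name.

```python
import re


def pipes_to_spaces(segmented):
    """Replace each pipe, plus any whitespace around it, with a single space."""
    return re.sub(r"\s*\|\s*", " ", segmented).strip()


# Hypothetical sample of pipe-segmented text, using the example from the first post.
print(pipes_to_spaces("我|有|大|苹果"))  # 我 有 大 苹果
```

The regular expression also swallows stray spaces next to a pipe, so the result has exactly one space between words either way.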
