hlk123 Posted February 28, 2006 at 07:33 PM

Hello,

Do you have some kind of user manual for the advanced interface of Adsotrans? Thank you.
trevelyan Posted March 3, 2006 at 06:52 AM

Unless you're dealing with traditional characters, the only setting you will need to change is "Style", which is set to Adsotate by default. The other options are:

Pinyin --> convert the text to pinyin
Toneover --> display the tone mark above the character
Pintone --> display the pinyin and tone mark above the character
Vocab List --> output a CSV list of all the vocabulary in the text
Sounds --> add clickable links to sound files with the pronunciation of each character
Echo Chinese --> output just the Chinese (useful for converting between simplified and traditional forms)

We can make this a thread to answer any specific questions people have, though.
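The pinyin styles above mostly differ in how tones are written. As a rough illustration (not Adsotrans's own code), the sketch below converts numbered pinyin such as "ni3 hao3" into tone-marked pinyin such as "nǐ hǎo", using the standard placement rule: a or e takes the mark, the o in "ou" takes it, and otherwise the last vowel does. The function names are hypothetical, and the sketch assumes lowercase input with ü typed as "v" or "u:".

```python
import re

# Precomposed vowels for tones 1-4, indexed by tone - 1.
TONED = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}

# One syllable: lowercase letters (plus ü / "u:") followed by a tone digit 1-5.
SYLLABLE = re.compile(r"([a-zü:]+)([1-5])")


def mark_syllable(body: str, tone: int) -> str:
    """Put the tone mark for `tone` on the correct vowel of one syllable."""
    body = body.replace("u:", "ü").replace("v", "ü")  # common ASCII spellings of ü
    if tone == 5:  # neutral tone: no mark
        return body
    vowels = [i for i, ch in enumerate(body) if ch in TONED]
    if not vowels:  # e.g. a bare "m" or "ng": nothing to mark
        return body
    if "a" in body:
        idx = body.index("a")
    elif "e" in body:
        idx = body.index("e")
    elif "ou" in body:
        idx = body.index("o")
    else:
        idx = vowels[-1]  # otherwise the last vowel takes the mark
    return body[:idx] + TONED[body[idx]][tone - 1] + body[idx + 1:]


def numbered_to_marked(text: str) -> str:
    """Convert every numbered syllable in `text`, leaving other text alone."""
    return SYLLABLE.sub(lambda m: mark_syllable(m.group(1), int(m.group(2))), text)


print(numbered_to_marked("ni3 hao3, zhong1 guo2, lv4 se4"))
```

Anything that is not a letter run followed by a tone digit (punctuation, spaces, Chinese characters) passes through unchanged.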
hlk123 (thread author) Posted March 3, 2006 at 09:16 AM

Thank you. I also use the Numeric Pinyin option.

What are the Conjugate and Grammar buttons for?

How about a "literal translation" option? (For literal translation, see Charles Li's Mandarin Chinese and Yip's Essential Grammar.) I think a correct literal translation is much more likely to be achieved than a fully idiomatic English translation. Example:

不是我的文法不好,是文法太难了 (roughly: "It's not that my grammar is bad; grammar is just too hard.")
Adsotrans: translate is my grammar is not good , are grammar too had difficulty
Literal translation: Not BE I of grammar not good, BE grammar too difficult
trevelyan Posted March 5, 2006 at 03:08 PM

Conjugate --> turns verb conjugation on and off.
Grammar --> runs the text through a grammar parser. This generally improves the translation but increases processing time.

If you turn off both Conjugate and Grammar, you will get something closer to a literal translation.
snarfer Posted March 29, 2006 at 10:37 AM

How much text can Adsotrans accept in the input field? There seems to be some sort of limit. I'm trying to generate a vocabulary list for a document several pages long. What do you think the most efficient way to do that would be?
trevelyan Posted March 29, 2006 at 04:10 PM

(1) If you'd like to forward me the document, I can process it for you. Alternatively, (2) put the text online somewhere and feed it into the engine as a remote webpage. Be sure to provide the full URL ("http://...") and the engine will recognize it and process it as a remote webpage. There is still a limit on text length, but it is quite high. Example using Baidu:

http://www.adsotrans.com/new/traditional.pl?study=on&url=http%3A%2F%2Fwww.baidu.com&service=adsotate&conjugation=on&grammar=on&encoding=GB2312&encoding_out=GB2312&quality=high
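For anyone scripting option (2), the request is an ordinary GET link, so it can be assembled with Python's `urllib.parse.urlencode`, which percent-escapes the remote page address the same way the example does. The base path and parameter names below are copied from the Baidu example; the function name `adso_url` is hypothetical, and whether the server accepts other parameter values (or still responds at all) is an assumption.

```python
from urllib.parse import urlencode

# Script path copied from the Baidu example link above.
BASE = "http://www.adsotrans.com/new/traditional.pl"


def adso_url(page_url: str, encoding: str = "GB2312", encoding_out: str = "GB2312") -> str:
    """Build a 'process this remote page' link using the example's parameters."""
    params = {
        "study": "on",
        "url": page_url,  # urlencode escapes ':' and '/' as %3A and %2F
        "service": "adsotate",
        "conjugation": "on",
        "grammar": "on",
        "encoding": encoding,  # should match the remote page's actual encoding
        "encoding_out": encoding_out,
        "quality": "high",
    }
    return BASE + "?" + urlencode(params)


print(adso_url("http://gb.chinabroadcast.cn/chinese_radio/index.htm"))
```

Called with "http://www.baidu.com", the function reproduces the example link exactly, since dictionaries keep insertion order and `urlencode` escapes the same characters.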
snarfer Posted April 3, 2006 at 09:42 AM

I've been trying to use this on the Chinese Radio site:

http://gb.chinabroadcast.cn/chinese_radio/index.htm
http://gb.chinabroadcast.cn/1321/2006/04/03/542@975260.htm

Unfortunately, so far the results have been either no response or gibberish. I would like to generate vocabulary lists from Chinese Radio broadcasts. I haven't yet tried cutting and pasting all the relevant content into a text file and posting it. Am I doing something wrong?
trevelyan Posted April 4, 2006 at 07:03 AM

snarfer wrote: "Am I doing something wrong?"

Did you remember to set the encoding to GB2312?
hughitt1 Posted April 9, 2006 at 01:22 PM

trevelyan wrote: "Did you remember to set the encoding to GB2312?"

Just out of curiosity... I noticed that this site actually declares its encoding in the page source (which is great), but a lot of sites don't. Do you have any idea how the encoding can be detected in that case, other than by trial and error? I've noticed that when I write HTML containing Chinese, the text keeps whatever encoding it was typed in (or copied and pasted from). Later, I sometimes have trouble working out which encoding a piece of text was saved in. Any thoughts?

Keith
trevelyan Posted April 9, 2006 at 11:37 PM

Hey Keith,

I *think* you can check the encoding of a page in most browsers by clicking "View --> Encoding", although I'm using a Chinese version of IE at the moment, so I'm not sure whether that works on English operating systems, if you're using one. Most mainland webpages use GB2312 and most international webpages use Unicode, so just knowing where a website is hosted is usually enough. In a jam, check whether the website shows an ICP license from the MII at the bottom of the page. If it does, the content will almost certainly be in GB2312.

The easiest thing is probably just to ask the software to guess the encoding by selecting the "Guess" option on the advanced page. If the software gets it wrong, send me a note with the link and I'll try to improve the encoding recognition algorithm. What we have should be pretty good at differentiating between GB2312 and Unicode, though. The tough part is occasionally figuring out whether a text is in simplified or traditional (complex) characters.
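A common do-it-yourself version of this guessing can be sketched as follows: honour a charset declared in the page, otherwise try a strict UTF-8 decode, and fall back to GB2312. This is a swapped-in illustration, not the algorithm behind the "Guess" option; the name `guess_encoding`, the 4 KB scan window, and the fallback order are all assumptions. It works reasonably well because longer runs of GB2312-encoded Chinese usually fail a strict UTF-8 decode.

```python
import re

# Matches charset=... as found in <meta> tags, e.g. charset="gb2312".
CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def guess_encoding(raw: bytes) -> str:
    """Heuristic guess: declared charset, then strict UTF-8, then GB2312."""
    declared = CHARSET_RE.search(raw[:4096])  # only scan the start of the page
    if declared:
        return declared.group(1).decode("ascii").lower()
    for candidate in ("utf-8", "gb2312"):
        try:
            raw.decode(candidate)  # strict mode: raises on any invalid byte sequence
            return candidate
        except UnicodeDecodeError:
            continue
    return "unknown"


print(guess_encoding("中文".encode("gb2312")))
```

Pure-ASCII text with no declaration is reported as "utf-8", which is harmless since ASCII is a subset of both encodings.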
msittig Posted May 10, 2006 at 02:58 AM

Not that it really makes a big difference, but is there a reason that the input default is Unicode and the output default is GB2312?
trevelyan Posted May 10, 2006 at 06:56 AM

Input is UTF-8 because it supports both traditional (complex) and simplified characters, and some browsers hand the text to the server in the encoding of the page. There's no real reason to keep GB2312 as the output default if it is causing problems. Is there any reason to switch? There were a couple of punctuation marks that didn't translate well, but I tried to take care of those.
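One practical consequence of that choice: GB2312 covers the simplified character set, so a traditional character such as 國 cannot be represented in it at all, while UTF-8 can represent both 国 and 國. A minimal check using Python's built-in codecs (the helper name `encodable` is hypothetical):

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)  # strict mode raises on unrepresentable characters
        return True
    except UnicodeEncodeError:
        return False


for word in ("国", "國"):  # simplified and traditional forms of "country"
    print(word, {enc: encodable(word, enc) for enc in ("utf-8", "gb2312")})
```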