hlk123 Posted November 2, 2006 at 02:45 PM

Hello,

I cannot read Chinese characters. I've made two experiments. With the help of Wenlin-3 I tried to "understand" the Chinese characters of "Dong Cunrui Was My Son". At the end I DIDN'T laugh; I was confused. Then I tried again with the Adsotrans tooltip. At the end... I laughed. Why? The Adsotrans tooltip has understood the "context": it translates with regard to the surrounding text. Wenlin-3, on the other hand, sees "blindly" only the character (with its neighbour) and tells me all the possible meanings of the character!

Will I find "such help" in all the Chinese texts on the Adsotrans website? Even in NewsinChinese?

Thank you.
trevelyan Posted November 2, 2006 at 03:51 PM

Tom Bishop has done a wonderful job with Wenlin... and this isn't actually a fair comparison at all. The texts on http://textbook.adsotrans.com have been manually reviewed and corrected, so the annotations *should* be better than anything a machine will provide.

Although Adso attempts to disambiguate contextually between parts of speech (POS) and between multiple definitions when it analyses texts and suggests definitions automatically, the accuracy of the software is limited by the size and sophistication of the backend database, as well as by the algorithms in the software. We are working to improve both the database and the algorithms over time. But there isn't a commercial business model supporting the project, so we simply can't afford to license the ABC dictionary (or any commercial linguistic database, for that matter).

If you find the software useful, you can help by adsotating texts and adding words the database is missing. That, and letting people know we exist, helps build the community around the software.

Wenlin is a dictionary reference tool, not a contextual text-analysis engine. The ABC dictionary it licenses is only available under a commercial license, and it is quite expensive. It is more flexible than the Adso dictionary from a student's perspective, especially if you are reading older texts and are looking for a variety of potential definitions for various words. Adso is better with contemporary news documents, although most of this is simply a matter of various people having taken the time to add words from contemporary texts as they read them. Even so, we need to do a lot of editing to bring the expert texts up to a readable quality when we edit them for the textbook site.
hlk123 Posted November 4, 2006 at 08:19 AM

Hello,

Thanks for the general explanation and for your modesty. I bought the Wenlin software for USD 200 or 300.

There are more difficulties with Wenlin: personal names and word boundaries. Although I do not understand, e.g., Italian, I can see the personal names and the word boundaries in an Italian text. Chinese text consists (mostly) of a string of characters. What advantage do I get from J. DeFrancis's dictionary behind Wenlin if Wenlin marks the wrong word boundary in the first place? I have to put a "-" before the next character! Is the algorithm to "see" a personal name and/or a word boundary in Chinese still too complicated for computers nowadays???

Thnx.
atitarev Posted November 4, 2006 at 10:25 AM

I think Chinese word boundaries are arbitrary when romanising a Chinese character text, since Chinese texts don't have spaces between words; they only make things easier for beginners. For a Chinese reader, zaijian or zai jian makes no difference. Some romanisation schemes put a space after each syllable (compare modern Vietnamese). Knowledge of the words of the language tells you that "zhong guo" is "China": one word in English, but its components tell you that it is the "Middle Kingdom", two meaningful components. Wenlin's romanisation engine is great, but its word boundaries are determined by a foreign-language preference, IMHO.

You will only have problems finding boundaries where you don't understand the grammar or the words. In such cases Wenlin offers multiple options. Just in case you don't know what I am talking about: Edit -> Make transformed copy -> Pinyin transcription (it will offer to segment the words first).

When you use the mouse-over dictionary in Wenlin, it also shows you multiple options (expand the status bar to see more lines): single-character words, or multi-character words if any appear. The multi-character words are all made of separate single-character words, which is why there's no need to segment them.
hlk123 Posted November 5, 2006 at 12:08 PM

Hello,

Thank you for your explanation. Take this part of a sentence from "Dong Cunrui Was My Son" as an example: ... 全会场的人都要疯了. The first 3 characters caused Wenlin (or ME!) some "problems":

1. "ambiguities need fixing" in "segment Hanzi";
2. "After making any needed corrections to the segmentation, you may repeat the command to make a pinyin copy" in "Pinyin transcription";
3. with "Instant Lookup" (2.2, page 70) one (I!!) has to decide which information is right!

I hope future Chinese electronic dictionaries will solve such a small problem... with an additional algorithm. That's why reading/understanding a Chinese newspaper headline online is so difficult... (i.e. impossible!)
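The ambiguity in 全会场 can be made concrete with a toy dictionary. Both 全会 ("plenary session") and 会场 ("meeting hall, venue") are words, so the first three characters split either as 全会/场 or as 全/会场 ("the whole hall"), and only the second fits the sentence. The sketch below uses forward and backward maximum matching, a common textbook segmentation baseline; the dictionary and function names are assumptions for illustration, and nothing here is known to be how Wenlin or Adso actually segments.

```python
# Toy dictionary (an assumption for illustration, not Wenlin's or Adso's data).
DICTIONARY = {"全", "全会", "会", "会场", "场", "的", "人", "都", "要", "疯", "了"}
MAX_WORD_LEN = 2
SENTENCE = "全会场的人都要疯了"

def forward_max_match(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Greedy left-to-right: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:  # unknown single chars pass through
                words.append(candidate)
                i += size
                break
    return words

def backward_max_match(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Greedy right-to-left: take the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.insert(0, candidate)
                j -= size
                break
    return words

def all_segmentations(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Every way to split text into dictionary words."""
    if not text:
        return [[]]
    results = []
    for size in range(1, min(max_len, len(text)) + 1):
        head = text[:size]
        if head in dictionary:
            for rest in all_segmentations(text[size:], dictionary, max_len):
                results.append([head] + rest)
    return results

print("/".join(forward_max_match(SENTENCE)))   # 全会/场/的/人/都/要/疯/了
print("/".join(backward_max_match(SENTENCE)))  # 全/会场/的/人/都/要/疯/了
print(all_segmentations("全会场"))  # [['全', '会', '场'], ['全', '会场'], ['全会', '场']]
```

Forward matching produces the wrong 全会/场 split and backward matching the right 全/会场 split. When two greedy passes disagree, that is one cheap signal that a span is ambiguous and needs a smarter tie-breaker (grammar, context, or a human).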
gato Posted November 5, 2006 at 01:55 PM

Quoting hlk123: "I cannot read Chinese characters."

It sounds like you are getting ahead of yourself. You should get a beginner's book and go on from there.
hlk123 Posted November 6, 2006 at 07:58 AM

... I thought this was the DISCUSSION place for Adsotrans. And I am interested in these things:

# Built on Artificial Intelligence
Adso is a machine translation system, not a simple dictionary-lookup tool. The software attempts to pick the most sensible gloss for any entry based on its grammatical and ontological content. It understands the difference between the words "Al Qaida" and "base" (both 基地), for instance. It differentiates between definitions based on their part of speech as well as their ontological content. The software also catches subtle nuances in language usage, such as those that occur with duoyinci (多音词, characters with more than one reading).

# User Extensible
Adso is a translation system that improves as it is used. Algorithms in the software expand our knowledge of how the Chinese language works even as it processes texts, so simply using the software helps our efforts in part. Missing content can also be added to the backend database on the fly: the software will recognize the word henceforth. Special rules can also be defined to help the software navigate the ambiguities of Chinese text. These changes take effect immediately, although they are reviewed promptly to prevent errors from persisting.

# Easily-Extensible Rules System
Special rules can be created to help the software determine how to parse and translate Chinese text down to the individual word level. Changes can be made without recompiling the program or touching the source code.

# Automatic Name/Date/Loanword Recognition
Adso attempts to automatically identify many dates and times, as well as personal and geographic names. The software also searches for foreign loanwords and provides an appropriate explanation for these words rather than an inaccurate and broken gloss.
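The "special rules" idea can be pictured as context rules kept as data rather than code. In this sketch, 基地 defaults to "base", but a rule rewrites it to "Al-Qaeda" when the next word is 组织 ("organization"). The rule syntax, the tiny glossary, and the function names are hypothetical and do not describe Adso's actual rule format; the point is only that adding a line of rule text changes the output without recompiling or editing the program.

```python
# Hypothetical default glosses for a few already-segmented words.
DEFAULT_GLOSS = {"基地": "base", "组织": "organization", "军事": "military"}

# Hypothetical rule text in a made-up "word next-word => gloss" format.
# Editing this text changes behaviour without changing any code.
RULES_TEXT = """
# word next-word => gloss
基地 组织 => Al-Qaeda
"""

def parse_rules(text):
    """Parse 'word next-word => gloss' lines into a {(word, next_word): gloss} dict."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        lhs, target = line.split("=>")
        word, next_word = lhs.split()
        rules[(word, next_word)] = target.strip()
    return rules

def gloss_words(words, rules):
    """Gloss each word, letting a matching context rule override the default."""
    glosses = []
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else None
        default = DEFAULT_GLOSS.get(word, word)
        glosses.append(rules.get((word, next_word), default))
    return glosses

RULES = parse_rules(RULES_TEXT)
print(gloss_words(["军事", "基地"], RULES))  # ['military', 'base']
print(gloss_words(["基地", "组织"], RULES))  # ['Al-Qaeda', 'organization']
```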
trevelyan Posted November 6, 2006 at 12:08 PM

I'm actually delighted to talk about algorithms and approaches to text analysis and segmentation. I guess what made me a bit hesitant to weigh in here is that I really like Wenlin. Adso has different design goals, so it has advantages in a few places. But it also has disadvantages.

As far as I understand it, Tom is trying to build Wenlin as an interface to a commercial dictionary. He is also putting tremendous effort into developing an excellent character dictionary of his own. But the software errs on the side of caution. When there is an ambiguous character in simplified-to-traditional (complex) conversion, for instance, Wenlin flags it for manual conversion. Adso will guess by default, which means that in some situations it can provide a lot more guidance, but it also risks being misleading.

I'm not sure Tom Bishop could integrate into Wenlin some of the more advanced text-analysis features present in Adso, since that would require tapping into rich POS data about words and characters during segmentation, and the ABC dictionary may not be well suited to providing that data anyway. But I get the impression Tom is trying to avoid introducing errors and has a bias towards reading Chinese words as combinations of characters. This means that the closest thing to Wenlin is Plecodict: the focus of the software is on displaying licensed content in various forms rather than on making informed guesses about texts based on that content.

Adso is different for a number of reasons. One of the big ones is that we are forced to deal with much messier data because we have to generate our own content. This probably makes us less afraid of making mistakes, as long as we can correct them over time.
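The difference between flagging and guessing in simplified-to-traditional conversion can be sketched in a few lines. 发 maps to both 發 ("send, emit") and 髮 ("hair"), and 后 to both 後 ("after") and 后 ("queen"). The conservative function marks every one-to-many character for a human to resolve; the guessing function checks a small list of two-character words and otherwise takes the first listed option. The tables are a tiny hypothetical subset, and neither function describes Wenlin's or Adso's actual code.

```python
# Hypothetical one-to-many table (a tiny illustrative subset); first option = default guess.
SIMPLIFIED_TO_TRADITIONAL = {
    "发": ["發", "髮"],  # 發 'send, emit'; 髮 'hair'
    "后": ["後", "后"],  # 後 'after'; 后 'queen'
    "头": ["頭"],
}

# Hypothetical word-level overrides consulted before falling back to the default guess.
WORD_OVERRIDES = {"头发": "頭髮", "理发": "理髮", "王后": "王后"}

def convert_conservative(text):
    """Flag every ambiguous character instead of choosing; return text and flagged indices."""
    out, flags = [], []
    for i, ch in enumerate(text):
        options = SIMPLIFIED_TO_TRADITIONAL.get(ch, [ch])
        if len(options) > 1:
            out.append("[" + "/".join(options) + "]")
            flags.append(i)
        else:
            out.append(options[0])
    return "".join(out), flags

def convert_guessing(text):
    """Use two-character word overrides if they match, otherwise guess the first option."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in WORD_OVERRIDES:
            out.append(WORD_OVERRIDES[pair])
            i += 2
        else:
            ch = text[i]
            out.append(SIMPLIFIED_TO_TRADITIONAL.get(ch, [ch])[0])
            i += 1
    return "".join(out)

print(convert_conservative("以后"))  # ('以[後/后]', [1])
print(convert_guessing("以后"))      # 以後 (correct guess)
print(convert_guessing("头发"))      # 頭髮 (word override)
print(convert_guessing("皇后"))      # 皇後 (wrong: should be 皇后)
```

The guess is right for 以后 but wrong for 皇后 until someone adds that word to the override list, which is the trade-off described above: more guidance, but more risk of misleading the reader.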
I'm also more inclined to the view that Chinese text isn't reducible to characters, and so a conservative system is going to make "errors" by not being adventurous enough in segmentation, etc. Another major difference is that we want to provide a single targeted definition for each word, rather than a list of all possibilities. So Adso is closer to a translation system than a dictionary tool. The algorithms are much more sensitive to things like POS tagging.

But it isn't exactly a translation tool either. To put this in perspective with reference to other programs, one of the reasons that many translation packages like Systran manage to produce more fluent translations is that they drop information from the sentences they are translating. Another example is that, because they are catering to non-speakers, they are forced to break up words they don't recognize. Even if Google is confident that 肖斯塔科维奇 (Shostakovich) is a personal name, it will not provide the Chinese characters in its translation. So it sometimes ends up with a broken translation, because broken English is still preferable to Chinese for complete non-speakers. Adso can throw up its hands and say "I have no clue what this means, but I'm pretty sure it's a name". But that's because almost everyone who uses it already has a grasp of the language and will understand.
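The "I'm pretty sure it's a name" fallback can be illustrated with a simple heuristic: after matching known words, treat any leftover run of three or more characters drawn entirely from characters common in transliterations (斯, 塔, 科, 维, 奇, and so on) as a probable foreign name, and keep it in characters instead of glossing it piece by piece. The character set, the threshold of three, and the toy word list are assumptions made for this sketch, not Adso's actual name-recognition rules.

```python
# Hypothetical toy word list and transliteration-character set.
KNOWN_WORDS = {"作曲家", "的", "音乐"}
TRANSLITERATION_CHARS = set("肖斯塔科维奇克尔夫基莫里亚特布拉格德森")
MAX_WORD_LEN = 3

def label_unknown(run, min_len=3):
    """Label a run of unmatched characters, keeping the characters intact."""
    if len(run) >= min_len and all(ch in TRANSLITERATION_CHARS for ch in run):
        return (run, "NAME?")
    return (run, "UNKNOWN")

def annotate(text):
    """Forward maximum matching that collects unmatched characters into runs."""
    tokens, unknown, i = [], "", 0
    while i < len(text):
        size = 0
        for n in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            if text[i:i + n] in KNOWN_WORDS:
                size = n
                break
        if size:
            if unknown:  # flush the pending unknown run before the known word
                tokens.append(label_unknown(unknown))
                unknown = ""
            tokens.append((text[i:i + size], "WORD"))
            i += size
        else:
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append(label_unknown(unknown))
    return tokens

print(annotate("作曲家肖斯塔科维奇的音乐"))
# [('作曲家', 'WORD'), ('肖斯塔科维奇', 'NAME?'), ('的', 'WORD'), ('音乐', 'WORD')]
```

Keeping the run as characters with a "NAME?" label suits readers who know some Chinese; a system aimed at non-speakers would instead have to transliterate or gloss it.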
hlk123 Posted November 7, 2006 at 09:58 AM

About Wenlin/T. Bishop: I have been buying Wenlin since version 1.0. At that time it did not include DeFrancis' dictionary. I bought every major update and I will BLINDLY buy the next major updates!

You wrote on 26 September 2006, 10:28 AM: "The software analyses the grammatical context when deciding which definition / entry to suggest. It TRIES to pick the single BEST definition for each word RATHER THAN SIMPLY THROWING EVERYTHING at the user." I like this sentence without understanding the details. Now... you weren't talking about Wenlin, were you?

Well... I will buy your product (the desktop version) from version 1.0 onward. Prerequisite: Adsotrans translates "correctly" (see roddy's "Comparison of Free Online Chinese to English translation engines"). In the future, Adsotrans could be a front-end to Wenlin, which is a front-end to DeFrancis' dictionary. Yeah!