Chinese-Forums

Basic Python module for adso


Posted

Stemming from the discussion in this thread, here is a basic Python module that performs web-based queries against the adsotrans website and returns the results as a list of tuples.

There are 3 files:

adso.py - the main module

adsotatepage.py - class that handles processing of the adsotrans webpage

test.py - simple test harness

If anyone is interested, it probably wouldn't be too hard to add a translatepage or a pinyinpage class that would process and return the results from a translate or pinyin query.

To use the module, import it, and create an object of the Adso class.

I decided to write an Adso class rather than just having functions in the module, so that all the different adso options (conjugation, grammar, encoding, encoding_out, numeric_pinyin and quality) can easily be preserved across multiple calls. These values are set in the constructor, and are simply strings that correspond to the values passed to the adso url.

Default values are:

conjugation='on'

grammar='on'

encoding='UTF-8S'

encoding_out='UTF-8S'

numeric_pinyin='off'

quality='high'
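
For readers who want a picture of how the options might be kept across calls, here is a minimal sketch. It is not the code from adso.py. The class name AdsoOptions, the base URL, and the `text` parameter name are assumptions made for illustration; only the option names and defaults come from the list above.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the real adsotrans address is not given in this post.
BASE_URL = "http://example.invalid/adso"


class AdsoOptions:
    """Stores the option strings once so that every query reuses them."""

    def __init__(self, conjugation="on", grammar="on", encoding="UTF-8S",
                 encoding_out="UTF-8S", numeric_pinyin="off", quality="high"):
        self.params = {
            "conjugation": conjugation,
            "grammar": grammar,
            "encoding": encoding,
            "encoding_out": encoding_out,
            "numeric_pinyin": numeric_pinyin,
            "quality": quality,
        }

    def query_url(self, text):
        # Copy the stored options and add the text for this particular call.
        # The parameter name "text" is an assumption.
        params = dict(self.params, text=text)
        return BASE_URL + "?" + urlencode(params)
```

Creating AdsoOptions(numeric_pinyin='on') once and calling query_url() repeatedly keeps the same settings for every request, which is the point of using a class here.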

To use, simply import the module, create an Adso object, and call the adsotate member function with the text that you want.

from adso import Adso

adso = Adso()

result = adso.adsotate( '你好世界' )

result will be a list of tuples containing the values (chinese, pinyin, translation), with one tuple per segment of text, in the same order as the segments appear in the original text. For example, the call above produces:

[ ( '你好', 'nǐhǎo', 'hello' ), ( '世界', 'shìjiè', 'world' ) ]
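
A short sketch of how such a result list could be consumed. The list is hard-coded from the example above so the snippet runs without a network connection; the helper names join_pinyin and gloss_lines are made up for illustration.

```python
# The result shown in the post, hard-coded so the example runs offline.
result = [("你好", "nǐhǎo", "hello"), ("世界", "shìjiè", "world")]


def join_pinyin(segments):
    """Join the pinyin field of each (chinese, pinyin, translation) tuple."""
    return " ".join(pinyin for _chinese, pinyin, _translation in segments)


def gloss_lines(segments):
    """Format each segment as 'chinese (pinyin): translation'."""
    return ["%s (%s): %s" % segment for segment in segments]


print(join_pinyin(result))
for line in gloss_lines(result):
    print(line)
```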

Note: the text you pass in should use the encoding you specified when creating the Adso object (UTF-8 by default).

Anyway, it's all pretty basic at the moment: it generates a query to the main adsotrans webpage and then parses the resulting HTML file, nothing more advanced. There's also very little in the way of error checking, so you'll get exceptions if, for example, you can't connect to the internet. It was done more as a proof of concept than anything else. Is this the sort of thing you had in mind, Kudra?
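
Since the module does little error checking, a caller may want to guard the call. The sketch below assumes the network failure surfaces as an OSError subclass (in Python 3, urllib's URLError is one); which exceptions adso.py actually raises is not stated in the post. OfflineAdso is a stand-in object so the example runs without the real module.

```python
from urllib.error import URLError


def adsotate_or_empty(adso, text):
    """Call adso.adsotate(text), returning [] if the network call fails."""
    try:
        return adso.adsotate(text)
    except OSError as exc:  # URLError and socket errors are OSError subclasses
        print("adsotate failed:", exc)
        return []


class OfflineAdso:
    """Stand-in that fails the way an unreachable server might."""

    def adsotate(self, text):
        raise URLError("no network")


print(adsotate_or_empty(OfflineAdso(), "你好世界"))
```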

BTW, speaking of errors, I don't know if this is of interest to you, Trevelyan, but the Python HTMLParser says the output generated by Adso has malformed start tags at various places in the HTML. The w3.org validator reports errors in the same lines/columns, but it seems to be because it's treating the adso.zip

Posted

Haven't played with it yet, but from all appearances, in the words of Will Smith in Men in Black I, "Now that's what I'm talking about!"

thanks.

Posted

Hi Imron,

Awesome work. Would you mind if I ported something like this over to Java? I'd love to be able to use it in the ZDT and I'm sure others would use it as well.

Chris

Posted

Looks good. Let me know if any changes are necessary on this end to help out. It would be possible to create a script that just spat out the information delimited in a more convenient way for parsing/processing if that would help or be faster.

Posted

@trevelyan -- that would be convenient. In my experience of parsing Yahoo pages, it is always a pain when they change the HTML format. If you essentially provide an API, neither you nor we Python (or other language) programmers will have to worry if you change stuff around in the HTML.

Posted

@bogleg - go for it, it's not even 100 lines of code, so I can't imagine it'd take too long. Though you might want to wait until trevelyan can produce a page with a more streamlined output.

@trevelyan - yeah, a more suitable format would be nice, and would certainly be more future-proof. Maybe just a simple XML file along the lines of:

你好

nǐhǎo

hello

(or less verbosely

:) )

You could of course add any other extra info that was relevant/useful (part of speech, simplified/traditional conversion etc). All of it (including the 3 fields listed above) could be toggled by parameters.

This format would also lend itself nicely to the other styles of queries (translation/pinyin), which would simply just have one segment containing the entire body of text with the appropriate pinyin/translation.
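
To show how little client code such a format would need, here is a sketch that parses one possible XML layout. The element names (adso, segment, chinese, pinyin, translation) are assumptions for illustration only; the actual markup of the example above did not survive, and no such format is confirmed by the site.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: element names are illustrative, not a real Adso format.
SAMPLE = """<adso>
  <segment>
    <chinese>你好</chinese>
    <pinyin>nǐhǎo</pinyin>
    <translation>hello</translation>
  </segment>
  <segment>
    <chinese>世界</chinese>
    <pinyin>shìjiè</pinyin>
    <translation>world</translation>
  </segment>
</adso>"""


def parse_segments(xml_text):
    """Return one (chinese, pinyin, translation) tuple per <segment>."""
    root = ET.fromstring(xml_text)
    return [
        (
            seg.findtext("chinese", ""),
            seg.findtext("pinyin", ""),
            seg.findtext("translation", ""),
        )
        for seg in root.iter("segment")
    ]


print(parse_segments(SAMPLE))
```

A translate or pinyin query would then come back as a single segment, and the same function would handle it unchanged.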

  • 3 months later...
Posted

Ok. First file here takes in GB2312. The second takes in UTF8. Because of the need to support both simplified and traditional, both files return content in UTF8.

http://www.adsotate.com/adso/api-gb2312.pl?text=TEXT

http://www.adsotate.com/adso/api-utf8.pl?text=TEXT

There's no guarantee these files will stay online here, so if you set up anything using them, send me an email so I can notify you if they move.
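
A sketch of building a request URL for either script. The two URLs are the ones given above; percent-encoding the raw bytes of the text in the matching encoding is an assumption about what the scripts expect, and the function name build_api_url is made up. The snippet only builds the URL and does not fetch anything.

```python
from urllib.parse import quote

# Endpoints as posted above; they may have moved or gone offline.
API_URLS = {
    "gb2312": "http://www.adsotate.com/adso/api-gb2312.pl",
    "utf-8": "http://www.adsotate.com/adso/api-utf8.pl",
}


def build_api_url(text, encoding="utf-8"):
    """Encode text in the script's input encoding and percent-escape it."""
    raw = text.encode(encoding)  # bytes in GB2312 or UTF-8
    return API_URLS[encoding] + "?text=" + quote(raw)


print(build_api_url("你好", "utf-8"))
print(build_api_url("你好", "gb2312"))
```

Whichever script is used, the response should be decoded as UTF-8, since both are described as returning UTF-8.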

  • 9 months later...
Posted

Is there an API currently? The links above are non-functional. Been looking at restarting a couple of old dead projects and easy Adsoing would be handy.

  • 3 weeks later...
Posted

I'm heading down to Australia at the end of this week and will be back in China on January 10th. I'll be getting a new server then and will look into setting up a revised API. If you need anything in particular before that, just email me, Roddy.

Posted

No rush on my behalf. A reliable Adsotrans API would be a pretty cool thing to have though, and I'm sure it would get used.
