Information Technology Vocabulary and Developing Chinese Language Web Sites with PHP

October 1, 2007 at 12:03 AM

I just posted several pages on the topic of vocabulary for information technology at chinesenotes.com.

On each page you can click on any word to see a summary of its meaning and use, including MP3 recordings. To support this I constructed a vocabulary database, which also allows users to search conveniently. I also wrote an article on developing Chinese language web sites with PHP:

Processing Chinese Text with PHP

I hope that someone gets something from these. If anyone has any suggestions please reply to this post.

October 1, 2007 at 02:57 AM

Looks interesting. I'm interested in finding some sort of resource which contains as many computer terms as possible, that I could possibly turn into a dictionary for Pleco, and use as a way to start learning how to do all the stuff I can in English on a computer, but in Chinese.

Found a couple of other websites, but I'm not sure how up-to-date they are, and they're at home at the moment. I'll post them when I get home.

October 1, 2007 at 11:28 AM

What are the problems handling Chinese with PHP? GB2312 works fine. UTF-8 is backwards compatible with ASCII, and so doesn't break PHP. The only real issue is string manipulation functions breaking on UTF-8 text. But you can get around that by converting the text to hex before manipulating it, and converting back afterwards.
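To make the string manipulation issue concrete, here is a minimal sketch. It assumes the mbstring extension is enabled and that the file is saved as UTF-8; the variable names are only illustrative. It shows the byte-oriented functions miscounting and cutting a character in half, the hex round-trip mentioned above, and the mb_* functions as one alternative to working on hex.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$text = "中文网站"; // four Chinese characters, 3 bytes each in UTF-8

// strlen() counts bytes; mb_strlen() counts characters.
$byteLength = strlen($text);              // 12
$charLength = mb_strlen($text, 'UTF-8');  // 4

// substr() works on byte offsets, so it can split a character.
$brokenPrefix = substr($text, 0, 4);             // all of "中" plus one stray byte
$cleanPrefix  = mb_substr($text, 0, 2, 'UTF-8'); // "中文"

// The broken prefix is no longer valid UTF-8.
$brokenIsValid = mb_check_encoding($brokenPrefix, 'UTF-8'); // false

// The hex round-trip: every byte becomes two hex digits, and hex2bin() restores it.
$hex      = bin2hex($text);   // "e4b8ade69687e7bd91e7ab99"
$restored = hex2bin($hex);    // identical to $text
```

With the hex form you still have to know how many bytes each character occupies, which is why the mb_* functions are usually the simpler route when they are available.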

@ipsi - we have a host of computer terminology in the Adso project. You're welcome to create a Pleco-compatible file and distribute it if you'd like. The version Mike is distributing is about 60,000 entries out of date.

October 1, 2007 at 08:35 PM

There are a number of problems with multi-byte character processing in general in PHP. I don't think that these are specific to UTF-8. Some examples are:

ord() does not work with multi-byte characters
ctype_* functions do not work with multi-byte characters
The MySQL database driver does not work well with multi-byte characters

There are workarounds for all these issues, but the point is that you have to know them, and that is what the article is about.

As I understand it, GB2312 is a character set not an encoding so it should be compared with Unicode rather than UTF-8. I would be concerned about using a character set that does not include all major languages. It seems a great limitation, especially in the age of Web 2.0 and mash-ups, to have to restrict users to Chinese and Latin characters. What about Chinese speaking people in Russia, Thailand, the Middle East, etc? There is no shortage of Chinese entrepreneurs in these places and I am sure they would need to combine these languages with Chinese and English, given that English is the most common business language in the world. Local people in these places may want to learn Chinese. There are plenty of use cases where Chinese and a non-Latin script will need to be used together.
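As a rough illustration of the character set limitation, the sketch below converts text between UTF-8 and EUC-CN, the encoding in which GB2312 is usually stored. It assumes the mbstring extension is enabled, that mbstring's substitution behaviour for unmappable characters is left at its default, and that the file is saved as UTF-8. The Thai sample word is just an example.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
// EUC-CN is the usual byte encoding of the GB2312 character set.
$chinese = "中文";
$thai    = "ไทย";

// Chinese text is in GB2312, so it converts to EUC-CN and back unchanged.
$chineseGb        = mb_convert_encoding($chinese, 'EUC-CN', 'UTF-8');
$chineseGbHex     = bin2hex($chineseGb); // "d6d0cec4", two bytes per character
$chineseRoundTrip = mb_convert_encoding($chineseGb, 'UTF-8', 'EUC-CN');

// Thai is not part of GB2312, so the characters cannot be represented and
// are replaced during conversion; the original text is lost.
$thaiGb        = mb_convert_encoding($thai, 'EUC-CN', 'UTF-8');
$thaiRoundTrip = mb_convert_encoding($thaiGb, 'UTF-8', 'EUC-CN');

$chineseSurvives = ($chineseRoundTrip === $chinese);
$thaiSurvives    = ($thaiRoundTrip === $thai);
```

Under those assumptions, $chineseSurvives is true and $thaiSurvives is false. Storing both strings as UTF-8 avoids the loss.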

October 2, 2007 at 07:49 PM

I would, but I'd like something that's fairly specific, in that it only has computer terms, and very little else. It'd also mean that I wouldn't have to scroll past stuff I wasn't interested in if I just wanted to look at random computer-related words and their meanings.

Anyway, the two websites are:

http://ihome.ust.hk/~lbsun/terms.html

http://www.iscs.nus.edu.sg/~colips/archives/glossary/GB/d.html

I think the top one is Traditional, the other Simplified.


alexamies

ipsi()

trevelyan

alexamies

ipsi()
