Chinese Input methods

July 8, 2008 at 10:10 AM

Wups, yeah, typo. Both give me nothing though.

Doesn't seem to be a way to change from traditional to simplified with Wubi. The only way to do that would be to edit the table, and that would be a lot of work. A lot. Could be done though. Found the table file simply by downloading the table source code, and it's a pretty simple format.

BEGIN_CHAR_PROMPTS_DEFINITION
a 工
b 子
c 又
...
BEGIN_TABLE
a 工 52175
b 了 774967
c 以 219424
d 在 594798
e 有 512230
...

I'm assuming the third column is frequency.
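For anyone who wants to poke at the table, here is a minimal Python sketch of reading that format. It assumes the layout is exactly what the excerpt shows: a whitespace-separated code, a character, and an optional third column that I'm treating as frequency. The `END_TABLE` marker and the `#` comment handling are guesses on my part, and `parse_table` is just a name made up for illustration.

```python
def parse_table(text):
    """Return (code, phrase, freq) tuples from the BEGIN_TABLE section."""
    entries = []
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN_TABLE":
            in_table = True
            continue
        if line == "END_TABLE":  # assumed closing marker, not shown in the excerpt
            in_table = False
            continue
        if not in_table or not line or line.startswith("#"):
            continue  # outside the table, blank line, or (assumed) comment
        parts = line.split()
        if len(parts) < 2:
            continue  # skip anything that is not "code phrase [freq]"
        freq = int(parts[2]) if len(parts) > 2 else 0
        entries.append((parts[0], parts[1], freq))
    return entries


SAMPLE = """\
BEGIN_CHAR_PROMPTS_DEFINITION
a 工
b 子
c 又
BEGIN_TABLE
a 工 52175
b 了 774967
c 以 219424
d 在 594798
e 有 512230
"""

entries = parse_table(SAMPLE)
for code, phrase, freq in entries:
    print(code, phrase, freq)
```

The prompt definitions before `BEGIN_TABLE` are skipped on purpose, since only the table rows carry the code-to-character mapping.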

Ok, I've checked and changing the table file is, indeed, all that's needed. I'm sure someone out there is fully capable of doing that, right? Right?

Also, rebooting is a pain. I keep my Windows install purely for games. Is your other install OS X?

July 8, 2008 at 12:45 PM

I've lost interest in this thread. But regarding the stroke order of 學, in case you guys are not aware, it starts with the first cross in the middle. Not sure if the stroke order is relevant to the wubi-whatever input method, though.

Take a look at the 4th column from the right ->

July 8, 2008 at 01:41 PM

Haha, thanks Skylee, and no, I was not aware, and it seems neither were the people who came up with the mapping tables for traditional characters. I guess that's just a further example of Wubi's simplified bias.

@ipsi, at home I have a Mac running OS X. It's an Intel Mac and I can also boot it up into Windows if necessary (I only rarely do this, usually just to pay bills online). At work I have a Windows box that I've also installed Linux on. I'd like to use Linux more, but just haven't had the time to get it set up with all the necessary software so I can do my job, plus there's still that font issue hanging around. My boss said he's happy to let me run Linux as long as I install and configure it in my own time, which means that until I've got it all set up, I run mostly under Windows.

Anyway, checking at home now on my Mac with FIT, WFQB also produces 學.

Doesn't seem to be a way to change from traditional to simplified with Wubi.

But you don't need to change, as it can type both at the same time, e.g. IPBF will get you 学 and WFQB will get you 學. I guess there's a precedence issue for characters with the same keycodes, but I'm sure that would be an option somewhere (at least it is with other Wubi input methods I've seen, which let you choose Traditional or Simplified).

July 8, 2008 at 07:28 PM

Reactions to the last few posts:

1. The stroke order which skylee showed us does not correspond to the way I learned to write 學. I always begin with the left part of 臼. That's how Integrated Chinese Level 1 Part 1 teaches you to write the character.

Also, both VQQB and WFQB imply that the stroke order begins with the left side of 臼.

Have we hit upon some matter of disagreement among calligraphists?

2. I confirm that WFQB works in SCIM.

3. After my last post, I've played with InputKing:

http://www.inputking.com/EN/

Their Wubi implementation accepts both VQQB and WFQB. They also have a tool which gives the Wubi code of characters and this tool works with traditional characters! If I check 學 with their tool, it lists both VQQB and WFQB as possible codes.

At this point, I have more faith in InputKing's implementation of Wubi than in SCIM's. (SCIM has been a very mixed bag quality-wise I must say.)

4. I probably could fix SCIM's Wubi implementation but I don't have time right now.

July 9, 2008 at 02:59 AM

Imron: Ubuntu 8.04 had fonts that worked just fine for me out of the box. You may want to try a *clean* install of that if you haven't already. It's worth noting that was Gnome though, not KDE.

I use Linux at work (everything's open-source here, basically), and home, and the Comp Sci department at Uni uses NetBSD.

Precedence is what I was worried about, mostly. Can't type at work, but 语 and 語 have the same combination, and SCIM seems to pick precedence based on frequency, rather than simplified/traditional. Of course, simplified characters are likely to be far more common in whichever dataset they used to generate the frequency...

Lemur: SCIM has its oddities... It's not as good as even Microsoft's, in my opinion. But it works, and that's the most important thing for me.

As I said above, SCIM's Wubi thing could be fixed to prioritize traditional, or even show only traditional, but it would be a lot of work. As it stands, I don't know if it displays something like 學醫 as well as 学医. It might not have a lot of traditional words in the list, which would cause problems. Someone here can have a look, and if not I'll check when I get home.

July 9, 2008 at 03:35 AM

But regarding the stroke order of 學, in case you guys are not aware, it starts with the first cross in the middle.

According to wiktionary ( http://en.wiktionary.org/wiki/%E5%AD%B8 ) that is the traditional traditional stroke order. The modern stroke order starts with 臼. Conventional stroke orders can change, just like character variants.

July 9, 2008 at 04:24 AM

You may want to try a *clean* install of that if you haven't already.

Yeah, I've been meaning to, but like I said, I'd have to do it on my own time, and I don't fancy coming in on the weekends.

July 9, 2008 at 08:13 AM

Hah, fair enough. Computers are cheap in China (well, compared to New Zealand...), so buy one and try at home!

Anyway, thinking still of SCIM, and because I didn't think to check the tables, you can write 学习 with SCIM, but not 学医、學醫或學習。 That could make it kinda hard to write traditional with it.

July 9, 2008 at 08:33 AM

I can't imagine it would be too difficult to write a script that parses the mapping tables and combines them with the info in Unihan (which has traditional/simplified markers for various characters) to generate a mapping table with codes for all the various traditional multi-character phrases. Likewise, it probably wouldn't be too difficult to add a column to the mapping table specifying whether each character is traditional or simplified, and then modify the SCIM source to put one or the other first depending on a user-specified setting. Anyway, that's probably an interesting project if you've got the time.
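To make the second idea concrete, here is a rough Python sketch of the reordering step only. It assumes each table row has already been given a hypothetical fourth column: "S" for simplified, "T" for traditional, "B" for characters that are the same in both. In practice that flag could probably be derived from Unihan's kSimplifiedVariant and kTraditionalVariant fields, but that part is not shown. The function name, the code `ygkg`, and the frequencies are placeholders for illustration, not values taken from SCIM's table.

```python
def rank_candidates(entries, code, prefer="T"):
    """Candidates for `code`: preferred script first, then by frequency."""
    matches = [e for e in entries if e[0] == code]
    # Sort key: rows in the preferred script (or shared by both) come first,
    # and within each group higher frequency wins.
    matches.sort(key=lambda e: (e[3] not in (prefer, "B"), -e[2]))
    return [e[1] for e in matches]


# (code, character, frequency, script) with made-up frequencies
ENTRIES = [
    ("ygkg", "语", 1000, "S"),
    ("ygkg", "語", 10, "T"),
]

print(rank_candidates(ENTRIES, "ygkg", prefer="T"))
print(rank_candidates(ENTRIES, "ygkg", prefer="S"))
```

With the preference set to "T" the traditional form is offered first even though its frequency is lower, which is the behaviour the user-specified setting would be switching on.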

July 9, 2008 at 12:28 PM

ipsi() said:

Anyway, thinking still of SCIM, and because I didn't think to check the tables, you can write 学习 with SCIM, but not 学医、學醫或學習。

I took that as a challenge. I was able to type 学医、學醫或學習 using SCIM's Wubi mode. I am not claiming though that the sequence of keys I used is the optimal one or that it is even correct Wubi, but it is doable.

July 9, 2008 at 02:09 PM

I think ipsi() was referring to Wubi's multi-character combinations. For example, the shortcut rule for two-character combinations is to take the first two keys from each character. For 学习, the full codes for each character are 学 IPBF and 习 NUD, so the code for 学习 is IPNU. Likewise 医学 is ATIP (AT from 医 ATDI, and IP from 学 IPBF).
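The two-character rule is simple enough to sketch in a few lines of Python. This only covers the two-character case described here, the single-character codes are the ones quoted in this post, and the function names are made up for illustration.

```python
def two_char_code(first_code, second_code):
    """Shortcut code for a two-character word: first two keys of each character."""
    return first_code[:2] + second_code[:2]


# Full single-character codes as quoted in this thread
CODES = {"学": "ipbf", "习": "nud", "医": "atdi"}


def phrase_code(phrase, codes):
    """Look up both characters and build the two-character shortcut."""
    if len(phrase) != 2:
        raise ValueError("this sketch only handles two-character words")
    return two_char_code(codes[phrase[0]], codes[phrase[1]])


print(phrase_code("学习", CODES))
print(phrase_code("医学", CODES))
```

Given a table of traditional single-character codes, the same function could in principle be used to generate the missing traditional phrase entries, subject to whatever code collisions that produces.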

However the same shortcuts don't exist for traditional characters, e.g. following that logic, you should be able to use WFNR to type 學習, or ATWF to type 醫學, but you can't. These multi-key shortcuts are a part of what allows people to type so fast with Wubi, and so while it is definitely possible to type each character one character at a time, without the multi-character shortcuts, it would slow typing speed down somewhat.

This is another good example of Wubi's simplified bias which hadn't really occurred to me before.

I imagine there would be too many keycode collisions if all the same shortcut character combinations existed for both simplified and traditional.

July 9, 2008 at 03:06 PM

I think ipsi() was referring to the multi-character combinations of wubi.

Probably.

For the heck of it, I checked in InputKing and VQNR does produce 學習. WFNR produces nothing in InputKing.

Maybe if SCIM supported VQQB, then it would support VQNR.

(BTW, I have to say the VQQB decomposition for 學 seems more natural to me than WFQB.)

I've also tried XGNR for 練習: works in InputKing, produces nothing in SCIM.

July 9, 2008 at 03:24 PM

BTW, I have to say the VQQB decomposition for 學 seems more natural to me than WFQB.

I agree, for the same reason that you use D for 古, e.g. 胡 is DE (古 + 月) rather than FKE (十 + 口 + 月). There are plenty of other similar cases too, and the preference should be to use the larger root where possible.

July 9, 2008 at 11:12 PM

Just FYI,

That online handbook lists stroke orders of Imperial China and the ROC. Wubi assumes PRC stroke order even for Traditional characters.

September 18, 2008 at 03:45 AM

Reviving the debate about Cangjie vs Wubi a little bit... I'm finding it hard to get good support for typing traditional characters in Wubi. Earlier in this thread I mentioned some of the problems with scim's less than perfect implementation of Wubi. Then a few days ago I found that CJKOS has a Wubi implementation which works only in simplified character mode. It seems to me that at every step of the way I have to bang my head against software developers who have decided that there is no point supporting traditional characters with Wubi, or no point QA'ing the implementation of traditional characters in Wubi (in the case of scim). I'm starting to think maybe I should just bite the proverbial bullet and learn Cangjie.

(For those who don't know: CJKOS is software for Palm-based PDAs and phones which allows Palm-based devices to support the display of Hanzi, Hanzi input, localization, etc.)

September 18, 2008 at 05:58 AM

Typing traditional characters in Wubi is not that complicated. Input methods like 极点五笔 and 万能五笔 have a function called 简入繁出 (simplified in, traditional out). When it is switched on, you type characters using the simplified code, but all characters are converted to their traditional forms. For example, if you type fggh for the simplified character 干, you will get all of its traditional forms: 乾, 幹 and 干.
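Roughly, the idea behind 简入繁出 can be pictured like this in Python. This is only a sketch of the concept, not how 极点五笔 or 万能五笔 actually implement it; the two dictionaries hold just the single example from this post, and all the names are made up.

```python
# Code -> simplified character (one entry, taken from the example above)
CODE_TO_SIMPLIFIED = {"fggh": "干"}

# Simplified character -> traditional forms it may stand for
TRADITIONAL_FORMS = {"干": ["乾", "幹", "干"]}


def simplified_in_traditional_out(code):
    """Type a simplified code, get back the traditional candidates."""
    simplified = CODE_TO_SIMPLIFIED.get(code)
    if simplified is None:
        return []  # unknown code: no candidates
    # Fall back to the character itself if no traditional forms are listed.
    return TRADITIONAL_FORMS.get(simplified, [simplified])


print(simplified_in_traditional_out("fggh"))
```

Because one simplified character can map to several traditional ones, the user still has to pick from the candidate list.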

September 18, 2008 at 07:27 AM

Typing traditional characters in WUBI is not that complicated,

The issue is not whether typing traditional characters in Wubi is complicated or not. The issue is whether reliable support for traditional characters is actually available to me. In my opinion, scim's support for traditional characters in Wubi is not reliable and then CJKOS has no support for traditional characters in Wubi. I will also add that when I recently shopped for an electronic dictionary in Taiwan, asking whether the dictionaries had a Wubi input method just elicited blank stares. (And I was accompanied by an educated Taiwanese who helped me so I don't think the issue was me being unable to communicate what I wanted.)

So more and more my impression is that in theory Wubi can handle traditional characters without problem, but in practice support for traditional characters in Wubi is spotty. I don't want to learn Wubi and then find down the road that I have to keep fighting to get a proper implementation, or that the devices I buy don't support it. I'm not interested in fixing scim and I can't change CJKOS to support Wubi with traditional characters.

September 18, 2008 at 08:01 AM

I saw that some replies are talking about typing traditional characters according to their traditional forms, which is totally unnecessary. Also, Wubi is designed for simplified characters and is mainly used in mainland China. In Taiwan, Hong Kong and Macao they use Cangjie, so of course you were looked down on for asking that stupid question.

Edited September 18, 2008 at 07:23 PM by imron
language.

September 18, 2008 at 08:03 AM

Input methods like 极点五笔 and 万能五笔 have a function called 简入繁出,

The problem with that is that a person who mostly only knows traditional characters will not know, and probably will not be able to figure out, the simplified code.

September 18, 2008 at 08:44 AM

Oh man, this thread is really boring. Wubi is designed around simplified characters and Cangjie around traditional characters. No one who only knows traditional characters will use Wubi, just as no one from the mainland would use zhuyin, because they don't even know zhuyin.


ipsi()

skylee

imron

lemur

ipsi()

foodtarget

imron

ipsi()

imron

lemur

imron

lemur

imron

Hofmann

lemur

vampire

lemur

vampire

imron

vampire
