Ruby Pinyin Word Grouping

July 10, 2022 at 05:29 PM

Currently at the 'tiny little annoying orthographic bug' phase of Pleco Ruby pinyin support and puzzling over a question I can't seem to find a good authoritative answer for online.

If you're applying ruby to Chinese words (rather than individual characters), and the Pinyin reading for a particular word contains a space/hyphen, should the pinyin be broken up on those boundaries or should it be left intact + treated as a single ruby reading?

To use an example from pinyin.info, with 十七八岁 shíqī-bā suì, it seems like there are four ways one might apply ruby to those characters:

1) Character by character

shí qī bā suì

十七八岁

2) Split on hyphens + spaces

shíqī bā suì

十七八岁

3) Split on spaces, but not hyphens

shíqī-bā suì

十七八岁

4) Don't split at all, group by the entire entry

shíqī-bā suì

十七八岁

We're already offering an option for 1 (since that's obviously the only way to do this with vertical Zhuyin ruby); the question is which one of 2/3/4 we should support in addition to that. The system can already support any of these options (we just tell it which characters to treat as breaks in the pinyin and it does), so none of them is more work for us to implement than any other. It's just about which one we make the easy-to-select default.
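To make the "tell it which characters to treat as breaks" idea concrete, here is a minimal Python sketch. It is not Pleco's actual implementation: the function name `group_ruby`, its parameters, and the assumption that a per-character syllable list is available alongside the formatted pinyin are all hypothetical and used only for illustration.

```python
def group_ruby(hanzi, syllables, pinyin, breaks):
    """Return (base_text, ruby_reading) pairs for one dictionary entry.

    hanzi:     base characters, one per syllable (assumed input)
    syllables: per-character readings, in order (assumed input)
    pinyin:    formatted reading, possibly with spaces/hyphens
    breaks:    set of separator characters that end a ruby group
    """
    if len(hanzi) != len(syllables):
        raise ValueError("need exactly one syllable per character")
    groups = []
    base, reading = "", ""
    pos = 0
    for char, syl in zip(hanzi, syllables):
        # Consume any separator characters that precede this syllable.
        while not pinyin.startswith(syl, pos):
            if pos >= len(pinyin):
                raise ValueError("syllables do not match the pinyin string")
            sep = pinyin[pos]
            if sep in breaks:
                # A break character closes the current group.
                if base:
                    groups.append((base, reading))
                base, reading = "", ""
            elif base:
                # A non-break separator stays inside the group's reading.
                reading += sep
            pos += 1
        base += char
        reading += syl
        pos += len(syl)
    if base:
        groups.append((base, reading))
    return groups


hanzi = "十七八岁"
syllables = ["shí", "qī", "bā", "suì"]
pinyin = "shíqī-bā suì"

print(list(zip(hanzi, syllables)))                       # option 1
print(group_ruby(hanzi, syllables, pinyin, {" ", "-"}))  # option 2
print(group_ruby(hanzi, syllables, pinyin, {" "}))       # option 3
print(group_ruby(hanzi, syllables, pinyin, set()))       # option 4
```

With this shape, options 2, 3 and 4 differ only in the `breaks` set, and option 1 is simply the character-by-character pairing. Option 3 yields the groups 十七八 / shíqī-bā and 岁 / suì.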

July 11, 2022 at 01:14 AM

If someone like you is struggling with which way to go, then it probably doesn’t matter too much. My opinion should literally count for zero, because I know so little about Chinese. But, if you don’t get enough opinions, then my super unimportant vote is for #3. When MDBG over-combines, I can easily split it up. But, when it under-combines, I struggle more.

July 11, 2022 at 03:36 AM

3

July 11, 2022 at 07:28 PM

Knowing that 1 is already a feature, I'd vote for 3 as well.

I think option 3 grouping makes the most logical sense to the reader, especially those who are less advanced. Splitting on spaces makes it very clear that 十七八 is to be treated as one unit and that 岁 is to be treated as one unit. If you were to split on hyphens as in example 2, then the user might be misled into thinking that 十七 is one unit, 八 is a second unit, and 岁 is a third unit.

Option 4 would also make sense, but I think it's not as friendly to the beginner.

July 12, 2022 at 01:19 PM

I suspect the demand for anything other than 1 would already be pretty niche. Usually the use case for ruby text is when you want to know how each character (or certain rare characters) is pronounced individually, without regard for normal Pinyin word segmentation or punctuation conventions.

July 13, 2022 at 04:00 PM

Thanks! I've been leaning towards #3 anyway so I'm glad to see people chiming in for that one. (But if there are a lot of votes for another option in the beta, it'd be easy enough to add then.)

mikelove

MTH123

vellocet

laowai-guide

Demonic_Duck

mikelove