Chinese-Forums

Ruby Pinyin Word Grouping


Posted

Currently at the 'tiny little annoying orthographic bug' phase of Pleco Ruby pinyin support and puzzling over a question I can't seem to find a good authoritative answer for online.

 

If you're applying ruby to Chinese words (rather than individual characters), and the Pinyin reading for a particular word contains a space/hyphen, should the pinyin be broken up on those boundaries or should it be left intact + treated as a single ruby reading?

 

To use an example from pinyin.info, with 十七八岁 shíqī-bā suì, it seems like there are four ways one might apply ruby to those characters:

 

1) Character by character

 

shí   qī   bā    suì

十    七   八     岁

 

2) Split on hyphens + spaces

 

  shíqī     bā    suì

十    七    八     岁

 

3) Split on spaces, but not hyphens

 

  shíqī-bā      suì

十   七   八      岁

 

4) Don't split at all, group by the entire entry

 

  shíqī-bā suì

十   七   八   岁

 

We're already offering an option for 1 (since that's obviously the only way to do this with vertical Zhuyin ruby); the question is which one of 2/3/4 we should support in addition to that. The system can already support any of these options (we just tell it which characters to treat as breaks in the pinyin and it does), so none of them is more work for us to implement than any other. It's just about which one we make the easy-to-select default.
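
To make the four options concrete, here is a minimal Python sketch of the "tell it which characters to treat as breaks" idea. It is not Pleco's actual code: the function name `group_ruby`, its parameters, and the assumption that the pinyin arrives already split into one syllable per character (with the separator between each adjacent pair recorded separately) are all hypothetical, chosen only for illustration.

```python
def group_ruby(hanzi, syllables, separators, breaks):
    """Group characters and their pinyin into ruby units.

    hanzi      -- the base text, one character per syllable
    syllables  -- per-character pinyin syllables (assumed pre-segmented)
    separators -- what sits between adjacent syllables: "", "-" or " "
    breaks     -- the set of separators that start a new ruby group
    """
    if not hanzi:
        return []
    if len(hanzi) != len(syllables) or len(separators) != len(syllables) - 1:
        raise ValueError("expected one syllable per character")

    groups = []
    base, reading = hanzi[0], syllables[0]
    for ch, syl, sep in zip(hanzi[1:], syllables[1:], separators):
        if sep in breaks:
            # Close the current group and start a new one at this boundary.
            groups.append((base, reading))
            base, reading = ch, syl
        else:
            # Keep the separator inside the reading so the pinyin stays intact.
            base += ch
            reading += sep + syl
    groups.append((base, reading))
    return groups


hanzi = "十七八岁"
syllables = ["shí", "qī", "bā", "suì"]
separators = ["", "-", " "]  # shíqī-bā suì

option1 = group_ruby(hanzi, syllables, separators, {"", "-", " "})
option2 = group_ruby(hanzi, syllables, separators, {"-", " "})
option3 = group_ruby(hanzi, syllables, separators, {" "})
option4 = group_ruby(hanzi, syllables, separators, set())
```

With this sketch, option 3 yields the pairs (十七八, shíqī-bā) and (岁, suì), while option 4 yields a single pair covering the whole entry. Switching between the options only changes the `breaks` set, which matches the point that none of them is more work than the others.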

Posted

If someone like you is struggling with which way to go, then it probably doesn’t matter too much. My opinion should literally count for zero, because I know so little about Chinese. But, if you don’t get enough opinions, then my super unimportant vote is for #3. When MDBG over-combines, I can easily split it up. But, when it under-combines, I struggle more.

Posted

Knowing that 1 is already a feature, I'd vote for 3 as well.

 

I think option 3 grouping makes the most logical sense to the reader, especially those who are less advanced. Splitting on spaces makes it very clear that 十七八 is to be treated as one unit and that 岁 is to be treated as one unit. If you were to split on hyphens as in example 2, then the user might be misled into thinking that 十七 is one unit, 八 is a second unit, and 岁 is a third unit.

 

Option 4 would also make sense, but I think it's not as friendly to the beginner.

 

 

Posted

I suspect the demand for anything other than 1 would already be pretty niche. Usually the use case for ruby text is when you want to know how each character (or certain rare characters) is pronounced individually, without regard for normal Pinyin word segmentation or punctuation conventions.

Posted

Thanks! I've been leaning towards #3 anyway so I'm glad to see people chiming in for that one. (But if there are a lot of votes for another option in the beta, it'd be easy enough to add then.)
