mikelove Posted July 10, 2022 at 05:29 PM

Currently at the 'tiny little annoying orthographic bug' phase of Pleco Ruby pinyin support and puzzling over a question I can't seem to find a good authoritative answer for online.

If you're applying ruby to Chinese words (rather than individual characters), and the Pinyin reading for a particular word contains a space/hyphen, should the pinyin be broken up on those boundaries, or should it be left intact and treated as a single ruby reading?

To use an example from pinyin.info, with 十七八岁 shíqī-bā suì, it seems like there are four ways one might apply ruby to those characters:

1) Character by character
shí qī bā suì
十 七 八 岁

2) Split on hyphens + spaces
shíqī bā suì
十 七 八 岁

3) Split on spaces, but not hyphens
shíqī-bā suì
十 七 八 岁

4) Don't split at all, group by the entire entry
shíqī-bā suì
十 七 八 岁

We're already offering an option for 1 (since that's obviously the only way to do this with vertical Zhuyin ruby); the question is which one of 2/3/4 we should support in addition to that. The system can already support any of these options (we just tell it which characters to treat as breaks in the pinyin and it does), so none of them is more work for us to implement than any other. It's just about which one we make the easy-to-select default.
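For readers who want to see the "tell it which characters to treat as breaks" idea concretely, here is a minimal Python sketch. It is not Pleco's code: the function name `group_ruby`, the pre-split syllable list, and the separator list are all assumptions made for illustration (a real implementation would also need to segment pinyin into syllables, which this sketch skips).

```python
from typing import List, Set, Tuple


def group_ruby(
    chars: str,
    syllables: List[str],
    separators: List[str],
    breaks: Set[str],
) -> List[Tuple[str, str]]:
    """Pair base characters with ruby readings (hypothetical helper).

    syllables[i] is the reading of chars[i].
    separators[i] is the text written between syllables[i] and
    syllables[i + 1] in the pinyin ("" when they are joined).
    breaks is the set of separators that start a new ruby group.
    """
    if not chars:
        return []
    if len(chars) != len(syllables) or len(separators) != len(syllables) - 1:
        raise ValueError("need one syllable per character and one separator per gap")

    groups: List[Tuple[str, str]] = []
    base, reading = chars[0], syllables[0]
    for ch, syl, sep in zip(chars[1:], syllables[1:], separators):
        if sep in breaks:
            # Close the current group and start a new one.
            groups.append((base, reading))
            base, reading = ch, syl
        else:
            # Keep the separator inside the reading and extend the group.
            base += ch
            reading += sep + syl
    groups.append((base, reading))
    return groups


chars = "十七八岁"
syllables = ["shí", "qī", "bā", "suì"]
separators = ["", "-", " "]  # shíqī-bā suì

option1 = group_ruby(chars, syllables, separators, {"", "-", " "})  # every character
option2 = group_ruby(chars, syllables, separators, {"-", " "})      # hyphens + spaces
option3 = group_ruby(chars, syllables, separators, {" "})           # spaces only
option4 = group_ruby(chars, syllables, separators, set())           # whole entry
```

In this sketch the four options differ only in the `breaks` set, which matches the point above that none of them is more work than another; including the empty separator `""` in the set is what gives the character-by-character layout.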
MTH123 Posted July 11, 2022 at 01:14 AM

If someone like you is struggling with which way to go, then it probably doesn’t matter too much. My opinion should literally count for zero, because I know so little about Chinese. But if you don’t get enough opinions, then my super unimportant vote is for #3. When MDBG over-combines, I can easily split it up. But when it under-combines, I struggle more.
laowai-guide Posted July 11, 2022 at 07:28 PM

Knowing that 1 is already a feature, I'd vote for 3 as well. I think option 3's grouping makes the most logical sense to the reader, especially those who are less advanced. Splitting on spaces makes it very clear that 十七八 is to be treated as one unit and that 岁 is to be treated as one unit. If you were to split on hyphens as in example 2, then the user might be misled into thinking that 十七 is one unit, 八 is a second unit, and 岁 is a third unit. Option 4 would also make sense, but I think it's not as friendly to the beginner.
Demonic_Duck Posted July 12, 2022 at 01:19 PM

I suspect the demand for anything other than 1 would already be pretty niche. Usually the use case for ruby text is when you want to know how each character (or certain rare characters) is pronounced individually, without regard for normal Pinyin word segmentation or punctuation conventions.
mikelove (Author) Posted July 13, 2022 at 04:00 PM

Thanks! I've been leaning towards #3 anyway, so I'm glad to see people chiming in for that one. (But if there are a lot of votes for another option in the beta, it'd be easy enough to add then.)