ilprincipe Posted October 8, 2009 at 01:42 PM
Hello, I have downloaded the HSK dictionary, which is good, but the pinyin column does not separate syllables. For example, "work" is given as gongzuo (with the tones). I would like to separate gong and zuo (keeping their tones). Does anyone know how to do this in Excel? Thanks.
Xiwang Posted October 8, 2009 at 04:45 PM
I can't think of an easy way for Excel to break the words into individual syllables. However, it should be possible to write a script that searches for each possible pinyin syllable (since there is a finite number of them) and then inserts a space where appropriate.
Is there a particular reason why you want to do this? There are rules as to when syllables should be linked together and when they should be separated by spaces; see http://www.pinyin.info/readings/zyg/rules.html. There are also times when, out of context, a character in a sentence can form a word with either the syllable before it or the syllable after it. Keeping the appropriate syllables together avoids confusion.
I used to like having my syllables separated. These days, however, it annoys me when textbooks separate syllables that should be together.
roddy Posted October 8, 2009 at 04:54 PM
1. If the pinyin is currently marked with tone marks, run it through a converter to get tone numbers (Wenlin will do this, and there should be something online somewhere). I.e., you want to put in gōngzuò and get out gong1zuo4.
2. Do a bunch of search-and-replaces: '1' > '1 ', '2' > '2 ', etc., to put in the spaces.
3. If necessary, run it through the converter in the other direction to get the tone marks back.
You'll end up with unnecessary spaces at the end of words, but I can't see why that would be a problem. A more sophisticated text editor (perhaps even Word, if you look in the advanced settings) would be able to remove spaces at the end of lines if you need to.
As above, I'm not sure why you'd want to do this, but that's how . . .
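Step 2 above (a space after each tone number) could be sketched in Python roughly as follows. This is only an illustration, not something posted in the thread: the function name `split_numbered_pinyin` is made up, it assumes the text has already been converted to tone numbers 1-5, and it will not split off neutral-tone syllables that carry no digit.

```python
import re

def split_numbered_pinyin(text: str) -> str:
    """Insert a space after every tone digit (1-5), then tidy the spacing."""
    spaced = re.sub(r"([1-5])", r"\1 ", text)
    # Collapse doubled spaces and drop the trailing space left after the last syllable.
    return re.sub(r" {2,}", " ", spaced).strip()

print(split_numbered_pinyin("gong1zuo4"))  # gong1 zuo4
```

The final `strip()` takes care of the trailing spaces mentioned above, so no separate clean-up pass is needed.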
ilprincipe (Author) Posted October 8, 2009 at 05:08 PM
Hi, thanks for the response. I don't have Wenlin, but the approach you suggested would work, so I will have to look for something else that does the job in both directions, as I would need to get the tones back, of course.
The reason I want to separate syllables is that when searching for pinyin you must make a choice: either you end up finding 'ying' when you are looking for 'yin' (which are two totally different words) or, if you don't want 'ying' in your search results, you must tell the search engine to spot spaces and stop the search there. I prefer not to find every 'ying' or 'shuai' when looking for 'yin' or 'shu', among many others, and therefore I need to place spaces between the pinyin syllables.
I am happy to write an Excel macro that goes through all 400 possible pinyin syllables and makes the separation, but then I thought: if the macro finds something like yingai, how does it know whether it is yin gai or ying ai? (I am not sure this example exists in real life, but there may be other ambiguous break-ups, which would make a simple program too difficult to write.) Hmm... maybe there aren't that many ambiguous bi- or tri-syllables.
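To get a feel for how often the ambiguity described above actually occurs, one could enumerate every possible split of a word against a syllable list. The sketch below is illustrative only: `SYLLABLES` is a tiny hand-picked subset rather than the full list of about 400, the function name `segmentations` is made up, and tone marks are assumed to have been stripped first.

```python
from functools import lru_cache

# Tiny illustrative subset; a real run would need the full pinyin syllable list.
SYLLABLES = frozenset({"yin", "ying", "gai", "ai", "gong", "zuo", "xi", "an", "xian"})

def segmentations(word: str) -> list[str]:
    """Return every way to split `word` into syllables found in SYLLABLES."""

    @lru_cache(maxsize=None)
    def split(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(word):
            return ((),)  # one valid way to split the empty remainder
        results = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in SYLLABLES:
                for rest in split(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [" ".join(parts) for parts in split(0)]

print(segmentations("gongzuo"))  # ['gong zuo']
print(segmentations("yingai"))   # ['yin gai', 'ying ai']
```

A word with more than one result is ambiguous and would need a human decision or a dictionary lookup. Standard pinyin orthography puts an apostrophe before a non-initial syllable that starts with a, o or e (as in Xi'an), so in properly written pinyin a bare "yingai" would read as yin gai; whether a given word list follows that convention is something to check.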
jbradfor Posted October 8, 2009 at 06:19 PM
Does Excel allow regex searches? E.g., search for yin[^g] to get yin but not ying? But I guess if you do a lot of searches, that would get old quickly.
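One caveat with the pattern yin[^g]: it requires some character after "yin", so a cell containing just "yin" would not match. A negative lookahead avoids that. The sketch below uses Python's `re` module purely for illustration (the word list is made up); Java's regex engine supports the same `(?!...)` syntax.

```python
import re

words = ["yin", "ying", "yinhang", "yingxiong", "shu", "shuai"]

# `yin[^g]` needs one more character after "yin", so a bare "yin" is missed.
naive = re.compile(r"yin[^g]")
# `yin(?!g)` only asserts that whatever follows, if anything, is not "g".
lookahead = re.compile(r"yin(?!g)")

print([w for w in words if naive.search(w)])      # ['yinhang']
print([w for w in words if lookahead.search(w)])  # ['yin', 'yinhang']
```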
ilprincipe (Author) Posted October 9, 2009 at 05:45 AM
I don't know if Excel does regex searches. I keep the data in Excel but then use Java to do the searches, and Java does support regex. I would be surprised if Visual Basic could not do so as well, in which case you could create a VB macro that does the job, but I am not sure about VB searches.
m_k_e Posted October 9, 2009 at 07:19 AM
Sir, VBA is not a proper programming language, it is a joke. Therefore it does not do something as useful as regexps.
That said, how do you search your Excel file with Java? Did you hack together something of your own? Why not load the data into a proper DB instead? MySQL supports regex searches.
Also, what OS are you on? On Linux, you could simply use grep to do your searches. Say:
    for@lone:/tmp$ cat > file.txt <<EOF
    > yin
    > ying
    > foo
    > EOF
    for@lone:/tmp$ cat file.txt
    yin
    ying
    foo
    for@lone:/tmp$ grep 'yin\>' file.txt
    yin
    for@lone:/tmp$ grep 'yin' file.txt
    yin
    ying
ilprincipe (Author) Posted October 9, 2009 at 09:53 AM
Hello, yes, I have actually developed my own software in Java for practising Chinese characters and doing advanced searches. I am just about to complete a new graphical output, which I think is very good, and will let you know how to download it. I have asked this forum for comments on the software before, and will do so again once it has been uploaded again, in just a couple of days.
chinesetools Posted October 9, 2009 at 01:22 PM
The romanization converter at http://www.mandarintools.com/pyconverter.html can convert between pinyin with tone marks and pinyin with tone numbers.
ilprincipe (Author) Posted October 10, 2009 at 08:44 AM
Thanks, chinesetools, for the link. It is useful, but it still does not separate syllables.
By the way, if anyone is interested in an Excel macro that takes out the tones, I have one and can post it on request.
MadJesta Posted November 4, 2009 at 11:35 PM
Hi ilprincipe, I realize that this post is a month old, but if you are still stuck...
I know it's possible to add spaces after tones by breaking up each word into individual letters and differentiating the letters that have tones, so that you can add the spaces before putting it all back together again.
=Example=
- Take a name like lù yǔpíng
- Break the letters into separate cells: use the LEFT, RIGHT and LEN functions.
- Differentiate which letters have tones: there are many ways; a VLOOKUP or an IF statement against a table of tones is one.
- Add a space in front of the letters that require it, using the '&' operator, e.g. =" "&A1
- Put it all back together again: A1&A2&A3 etc.
I realize that I would need to write a novel here to explain this in any great detail, so there is a working example attached. I hope that this helps someone.
By the way, the example only handles up to 50 characters; you can easily extend this by copy-dragging more lines into the work area.
- Jesta
Attachment: Pinyin2Gapped_Text.xls
ilprincipe (Author) Posted November 5, 2009 at 04:27 AM
MadJesta, thanks for your reply. In the meantime, however, I found another solution, which may be simpler and does the job well. It assumes that the user already has two columns, one with the (unseparated) pinyin and one with the corresponding Hanzi. This should not be a problem, as most word lists, such as the HSK list, have the Hanzi next to the pinyin.
1) Download an entire Chinese dictionary, for example CEDICT, which has the pinyin already separated.
2) Do a VLOOKUP of the Hanzi across the whole of CEDICT (two columns only, Hanzi and pinyin).
And it works beautifully.
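Outside Excel, the same lookup idea could be sketched in Python as below. This is an illustration under assumptions, not the poster's actual setup: it assumes dictionary lines in the CC-CEDICT style `Traditional Simplified [pin1 yin1] /gloss/`, uses a three-line inline sample instead of the real file, and the names `ENTRY`, `SAMPLE` and `build_index` are made up.

```python
import re

# Assumed line layout (CC-CEDICT style): Traditional Simplified [pin1 yin1] /gloss/...
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /")

# Inline stand-in for the downloaded dictionary file.
SAMPLE = """\
# comment lines start with '#'
工作 工作 [gong1 zuo4] /to work/job/
銀行 银行 [yin2 hang2] /bank/
"""

def build_index(lines):
    """Map each Hanzi headword (traditional and simplified) to its spaced pinyin."""
    index = {}
    for line in lines:
        m = ENTRY.match(line)
        if m:
            trad, simp, pinyin = m.groups()
            # setdefault keeps the first reading if a headword appears more than once.
            index.setdefault(simp, pinyin)
            index.setdefault(trad, pinyin)
    return index

index = build_index(SAMPLE.splitlines())
print(index.get("工作"))  # gong1 zuo4
print(index.get("银行"))  # yin2 hang2
```

Like a VLOOKUP, this returns only the first match, so words with more than one reading would need checking by hand. The pinyin comes back with tone numbers, so a converter is still needed if tone marks are wanted.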
MadJesta Posted November 5, 2009 at 05:13 AM
Brilliant! Great solution.