marcoesposito (New Members), posted February 14, 2012 at 12:57 PM
I have Dim Sum Chinese Tools. It's possible to get a list of all the "unique characters" in a document, but it's impossible to copy it (to transfer it into a Word/Excel document) or print it :-(. Does anybody know a program/method to do that? I'm trying to use http://lingua.mtsu.edu/chinese-computing/vp/index.php?CNTEXT_Session=7e955985e909f25ad52ee49d05b783e6 but it doesn't work with very long documents... :-( Thank you!
Gleaves, posted February 14, 2012 at 10:25 PM
Take a look at the end of this thread. C_redman's tool might help. http://www.chinese-f...-in-a-document/
Silent, posted February 14, 2012 at 11:17 PM
An offline version can be found here: http://www.chinese-forums.com/index.php?/topic/34994-new-tool-for-vocabulary-extraction/page__view__findpost__p__260571
LaoJian, posted February 15, 2012 at 03:27 AM
Hi marcoesposito, I also quickly made one for you in C#. It covers your basic requirement: you input a txt file in an encoding that supports Chinese (such as Unicode or UTF-8) and press the count button; the output is a csv file that you can open and read in Excel. Download the attached zip file, unzip it, remove the '.removeme' suffix from the file name, then run it (it has a Windows UI). Hope this helps.
LaoJian
Attachment: WindowsFormsApplication1.exe.removeme.zip
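For anyone who would rather script this than run an attached executable, the same idea (read a text file, count each distinct Chinese character, write a csv that Excel can open) can be sketched in a few lines of Python. This is not LaoJian's program, just a minimal illustration; the function names are made up for this example, and the "is this a Chinese character" test is a deliberately rough assumption that only covers the two most common Unicode blocks.

```python
import csv
from collections import Counter


def is_cjk(ch: str) -> bool:
    """Rough check (an assumption of this sketch): only the CJK Unified
    Ideographs block and Extension A count as Chinese characters."""
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def count_unique_characters(text: str) -> list[tuple[str, int]]:
    """Return (character, count) pairs, most frequent first."""
    counts = Counter(ch for ch in text if is_cjk(ch))
    return counts.most_common()


def write_csv(rows: list[tuple[str, int]], path: str) -> None:
    """Write the pairs to a csv file.

    "utf-8-sig" puts a byte order mark at the start of the file, which
    makes it more likely that Excel detects the encoding as UTF-8.
    """
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["character", "count"])
        writer.writerows(rows)
```

To use it, read the document with something like `open("input.txt", encoding="utf-8").read()`, pass the text to `count_unique_characters`, and hand the result to `write_csv` with an output path of your choice (both file names here are placeholders). Punctuation, Latin letters, and whitespace are skipped by the `is_cjk` filter.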
T-revor, posted February 15, 2012 at 07:32 AM
If you have Word and Excel, it's easy to do. First, paste the text into Word. Go to "Find and Replace" and enter these settings:
[image: Find and Replace settings]
That will put each character on its own line. From there, copy and paste it into Excel. With the data still selected in Excel, apply an advanced filter as in the image below, and when you click "OK" you'll have a list of all the unique characters in the document.
[image: Excel advanced filter settings]
ALSO, I built a free tool a while ago that gives you the unique words in a document (with a bunch of other killer stuff): Trevor's Chinese Reader. You just have to paste in the text and press the button, then go to Tools --> Download Word List. It's pretty neat. Here is what the output looks like when you open it in Excel:
[image: word list output in Excel]
It gives you some interesting information to direct your study. Best of luck with all your studies!
laurenth, posted February 15, 2012 at 03:07 PM
T-Revor, congratulations, your reader is a wonderful tool. It works very smoothly. No doubt I'll be using it in the future. Thanks for making it available. I suppose the csv file is encoded in UTF-8? I have an oldish version of Excel (2002?) which, apparently, does not understand UTF-8. Do you know how I can force Excel to properly display the file I downloaded? Another question: I suppose the HSK classification you use (1-4) corresponds to the old HSK?
T-revor, posted February 15, 2012 at 11:03 PM
Yes, unfortunately, the reader is in bad need of my attention, but I'm trying to finish up some other projects before I get back to it. The HSK is the old HSK, unfortunately. As for the UTF-8 issue, I've heard that people have had problems with it, but I don't know if it's an Excel version thing, an OS setting, an Excel setting, or what. Again, something that badly needs my attention. If you have any insight as to why it's not showing up or how I would fix it, please let me know. I'll move it up on my priority list and see what I can do.
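One thing that may be worth trying for the display problem: Excel tends to recognise a csv file as UTF-8 only when the file starts with a byte order mark (BOM), and otherwise falls back to the system's legacy code page, which garbles Chinese text. Whether a release as old as Excel 2002 honours the BOM is not confirmed in this thread, so treat the following as an experiment rather than a fix. The sketch below copies an already downloaded csv to a new file with the BOM in front; the function name and the file paths are placeholders invented for this example.

```python
import codecs


def add_utf8_bom(src_path: str, dst_path: str) -> None:
    """Copy a UTF-8 file, adding a byte order mark if it has none.

    Assumes the source file really is UTF-8; the bytes are not
    re-encoded, only prefixed.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

Calling `add_utf8_bom("wordlist.csv", "wordlist_bom.csv")` and opening the second file in Excel shows quickly whether the missing BOM is the cause. If it is, the same change can be made on the tool's side by writing the three BOM bytes at the start of the download.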