Chinese-Forums

How to have a list of unique characters in a document?



Posted

I have Dim Sum Chinese Tools. It can produce a list of all the "unique characters" in a document, but it's impossible to copy that list (to transfer it into a Word/Excel document) or print it :-(. Does anybody know a program or method to do that? I'm trying to use http://lingua.mtsu.edu/chinese-computing/vp/index.php?CNTEXT_Session=7e955985e909f25ad52ee49d05b783e6 but it doesn't work with very long documents... :-( Thank you!

Posted

Hi marcoesposito,

I also quickly made one for you in C# that covers your basic need: give it a .txt file in an encoding that supports Chinese (like Unicode or UTF-8), press the Count button, and it outputs a CSV file that you can open and read in Excel.

Get the attached zip file, unzip it and remove the '.removeme' suffix from the file name, then run it; it has a Windows UI.

Hope this can help

LaoJian

WindowsFormsApplication1.exe.removeme.zip
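For anyone who can't run a Windows .exe, here's a rough Python sketch of the same idea (text file in, CSV of characters and counts out). It is not LaoJian's code, just an illustration. It assumes a UTF-8 input file and keeps only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so punctuation, Latin letters and rare characters from the extension blocks are skipped. It reads the file line by line, so very long documents shouldn't be a problem.

```python
import csv
from collections import Counter


def count_hanzi(path, encoding="utf-8"):
    """Count each CJK character in a text file.

    Reads line by line, so the whole document never has to be
    held in memory at once.
    """
    counts = Counter()
    with open(path, encoding=encoding) as f:
        for line in f:
            # Assumption: only the basic CJK Unified Ideographs block
            # (U+4E00 to U+9FFF) counts as a "character" here.
            counts.update(ch for ch in line if "\u4e00" <= ch <= "\u9fff")
    return counts


def write_csv(counts, out_path):
    """Write a two-column CSV, most frequent character first."""
    # "utf-8-sig" adds a byte-order mark, which helps newer Excel
    # versions recognise the file as UTF-8.
    with open(out_path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["character", "count"])
        for ch, n in counts.most_common():
            writer.writerow([ch, n])
```

Usage, with hypothetical file names: `write_csv(count_hanzi("input.txt"), "characters.csv")`.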

Posted

If you have Word and Excel, it's easy to do. First, paste the text into Word. Go to "Find and Replace" and use these settings:

Save.png

That will put each character onto its own line. From there, copy and paste it into Excel. With the data still selected in Excel, do an advanced filter like in the image below, and when you click "OK" you'll have a list of all the unique characters in the document.

filter.png
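The "one character per line, then keep unique records" idea above can be sketched in a few lines of Python, in case Word/Excel isn't available. This is my own illustration, not part of the Word/Excel method: it assumes the text is already loaded as a string, drops whitespace, and keeps punctuation (as the Word method would).

```python
def unique_chars(text):
    """Return each distinct non-whitespace character once,
    in order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # which mimics Excel's "unique records only" keeping the first copy.
    return list(dict.fromkeys(ch for ch in text if not ch.isspace()))
```

To get something you can paste straight into one Excel column: `print("\n".join(unique_chars(open("input.txt", encoding="utf-8").read())))`, where `input.txt` is a hypothetical file name.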

Also, I built a free tool a while ago that gives you the unique words in a document (along with a bunch of other killer stuff):

Trevor's Chinese Reader

You just have to paste in the text and press the button. Then go to Tools --> Download Word List. It's pretty neat.

TrevorsChineseReader.png

Here is what the output looks like when you open it in Excel. Gives you some interesting information to direct your study.

TCR_WordListOutput.png

Best of luck with all your studies!

Posted

T-Revor, congratulations, your reader is a wonderful tool. It works very smoothly. No doubt I'll be using it in the future. Thanks for making it available.

I suppose the csv file is encoded in UTF-8? I have an oldish version of Excel (2002?) which, apparently, does not understand UTF-8. Do you know how I can force Excel to properly display the file I downloaded?

Another question: I suppose the HSK classification you use (1-4) corresponds to the old HSK?

Posted

Yes, unfortunately, the reader is in bad need of my attention, but I'm trying to finish up some other projects before I get back to it. The HSK is the old HSK unfortunately.

As for the UTF-8 issue, I've heard people have had problems with it, but I don't know if it's an Excel version thing, an OS setting, an Excel setting or what. Again, something that badly needs my attention.

If you have any insight as to why it's not showing up or how I would fix it, please let me know. I'll move it up on my priority list and see what I can do.
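One workaround that may help with older Excel: re-save the file as UTF-16 tab-separated text, which is the format Excel itself uses for "Unicode Text" and which older versions tend to open more reliably than UTF-8. I haven't verified this on Excel 2002 specifically, and the function and file names below are just illustrative.

```python
import csv


def utf8_csv_to_unicode_text(src, dst):
    """Re-save a UTF-8 CSV as UTF-16 (with BOM), tab-separated text."""
    # "utf-8-sig" reads plain UTF-8 and also strips a leading BOM if present.
    # "utf-16" writes a BOM followed by text in the platform's byte order.
    with open(src, encoding="utf-8-sig", newline="") as fin, \
         open(dst, "w", encoding="utf-16", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t")
        for row in csv.reader(fin):
            writer.writerow(row)
```

For example, `utf8_csv_to_unicode_text("wordlist.csv", "wordlist.txt")`, then open the .txt file in Excel.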
