Chinese-Forums

How to count unique characters in a document?


Posted

Does anyone know how to count the number of unique characters in a document?

In Word, I can see how many characters my document contains, but I want to know how many unique characters there are. Can I do this in Word, or are there any online tools I could use?

Posted

I don't think Word can do it, and I'm not sure if there are online tools for this either (it wouldn't be really practical to paste huge chapters into an online text box).

I simply wrote a tiny program that counts the number of unique characters in a Unicode document. It's not very user-friendly, though.

Posted

I did something similar to what renzhe is talking about. I usually use OpenOffice to work on Chinese texts. I saved the file as plain Unicode text and then ran a Python script on it to count the unique characters and their frequencies, showing the results in order of decreasing frequency.
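For anyone who wants to try the same approach, here is a minimal sketch of what such a script might look like. It is not the script mentioned above; the function names and the choice to skip whitespace are my own assumptions, and punctuation is still counted as a character.

```python
from collections import Counter

def char_frequencies(text):
    """Count every non-whitespace character in the text (punctuation included)."""
    return Counter(ch for ch in text if not ch.isspace())

def report(path, encoding="utf-8"):
    """Print the unique character count, then each character by decreasing frequency.

    The encoding is an assumption: a file saved as "Unicode text" may be
    UTF-16 rather than UTF-8, so pass encoding="utf-16" if needed.
    """
    with open(path, encoding=encoding) as f:
        counts = char_frequencies(f.read())
    print(f"{len(counts)} unique characters")
    for ch, n in counts.most_common():
        print(f"{ch}\t{n}")
```

Calling `report("mytext.txt")` (a hypothetical file name) would print the total first and then one character per line with its count.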

Posted

Try this - doesn't count them so much as give you a list, but assuming you can count . . .

Posted

You might find this vocabulary profiler to be useful...

http://lingua.mtsu.edu/chinese-computing/vp/index.php

It can provide the total character count, the unique character count, the frequencies of those characters, and whether each character is in the HSK (or other) lists. It can also analyse the incidence of bigrams, trigrams, etc.

  • 2 years later...
Posted

Those links do not work for me anymore...

Is there a way to do that in Excel? I have a huge spreadsheet of more than 50,000 rows and I would like to know the total unique character count in particular columns...

Posted

Assuming you don't have an equally massive number of columns, I'd skip the Excel part of the problem by copying and pasting each column into a text document, and then doing the character count on that.
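If copying and pasting gets tedious, a rough alternative is to export the sheet to CSV and let a short script pull out one column. This is only a sketch: it assumes the sheet has a header row, and the function name and the column name in the usage note are made up for illustration.

```python
import csv

def unique_chars_in_column(csv_file, column):
    """Return the set of non-whitespace characters found in one named column.

    csv_file is an open text file (or any iterable of lines) whose first
    row holds the column headers.
    """
    seen = set()
    for row in csv.DictReader(csv_file):
        value = row.get(column) or ""  # tolerate missing or short rows
        seen.update(ch for ch in value if not ch.isspace())
    return seen
```

With a file opened as `open("sheet.csv", encoding="utf-8", newline="")`, `len(unique_chars_in_column(f, "word"))` would give the unique character count for a column headed "word" (both names are hypothetical).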

Posted

No, but then he's got a bunch of text files he can work with, rather than trying to do a unique character count on specific Excel columns.

Posted

I think this Excel macro may help.

It basically compares whatever text is highlighted against a "reference" list and then gives a count of each character that occurs, split by whether it's in the reference list or not. If you don't want to compare against "known" characters, then simply delete the reference list.
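For readers without Excel, the same idea can be sketched in a few lines of Python. This is not the macro's code, just an illustration of the comparison it describes; the function name and the decision to ignore whitespace are assumptions.

```python
from collections import Counter

def split_by_reference(text, reference):
    """Count each character in text, split by whether it is in the reference list.

    Returns two dicts (known, unknown) mapping character -> count.
    An empty reference puts every character in the unknown dict.
    """
    known_set = set(reference)
    counts = Counter(ch for ch in text if not ch.isspace())
    known = {ch: n for ch, n in counts.items() if ch in known_set}
    unknown = {ch: n for ch, n in counts.items() if ch not in known_set}
    return known, unknown
```

Passing an empty string as the reference corresponds to deleting the reference list: everything comes back in the second dict.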

If you are happy running the macro directly, rather than just pressing the analyse button, simply select the text in your sheet and run the macro via Alt+F8. (The macro is called "Main". I apologise!) Otherwise, you need to cut and paste the text into the "Source" sheet of the macro workbook and select the text you want analysed.

Hope this helps. Let me know if you have problems.

Posted

I have a word list application that can give the counts for characters. You just need to uncheck all the dictionaries and known word lists that are checked by default. If it doesn't use its dictionary, it doesn't know how to segment words, so it results in single characters.

Posted

That's a really nice app, c_redman! I think I'll be using it a lot more.

I particularly like the way it lists idioms as well.
