fredrik_w Posted July 10, 2008 at 11:57 AM
Does anyone know how to count the number of unique characters in a document? In Word, I can see how many characters my document contains, but I want to know how many unique characters there are. Can I do this in Word, or are there any online tools I could use?
renzhe Posted July 10, 2008 at 12:15 PM
I don't think Word can do it, and I'm not sure if there are online tools for this either (it wouldn't really be practical to paste huge chapters into an online text box). I simply wrote a tiny program that counts the number of unique characters in a Unicode document. It's not very user-friendly, though.
lemur Posted July 10, 2008 at 12:24 PM
I did something similar to what renzhe is talking about. I usually use OpenOffice to work on Chinese texts. What I did was save the file as plain Unicode text and then run a Python script on it to compute how many unique characters were present and how often each occurred, and to show the results in order of decreasing frequency.
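For readers who want to try this approach, here is a minimal sketch of such a script. It is not lemur's actual script, which was not posted; the function names, the UTF-8 default, and the choice to ignore whitespace are all assumptions made for illustration.

```python
from collections import Counter


def char_frequencies(text):
    """Count every non-whitespace character in the given text."""
    # Assumption: spaces, tabs and newlines should not count as characters.
    return Counter(ch for ch in text if not ch.isspace())


def report(path, encoding="utf-8"):
    """Print the unique-character total, then each character with its count."""
    with open(path, encoding=encoding) as f:
        counts = char_frequencies(f.read())
    print(f"{len(counts)} unique characters")
    # most_common() yields (character, count) pairs, highest count first.
    for ch, n in counts.most_common():
        print(f"{ch}\t{n}")


# Example call (hypothetical file name):
# report("chapter1.txt")
```

Note that this counts punctuation and Latin letters as well as Chinese characters; filtering those out would need an extra condition in `char_frequencies`.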
roddy Posted July 10, 2008 at 12:25 PM
Try this - doesn't count them so much as give you a list, but assuming you can count . . .
student Posted July 11, 2008 at 10:25 AM
You might find this vocabulary profiler to be useful... http://lingua.mtsu.edu/chinese-computing/vp/index.php It can provide the total character count, the unique character count, the frequencies of those characters, and whether each character is in the HSK (or other) lists, and it can also analyse the incidence of bigrams, trigrams, etc.
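To illustrate what a bigram or trigram count means here, the following is a rough sketch. It is not how the profiler linked above works internally (that is not described in this thread); `ngram_counts` is a hypothetical name, and dropping all whitespace before forming n-grams is a simplifying assumption that lets n-grams span line breaks.

```python
from collections import Counter


def ngram_counts(text, n=2):
    """Count runs of n consecutive non-whitespace characters."""
    # Assumption: whitespace is removed first, so n-grams can cross line breaks.
    chars = [ch for ch in text if not ch.isspace()]
    # Slide a window of width n over the characters and count each window.
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


# Example: bigrams with n=2, trigrams with n=3.
# ngram_counts(some_text, 2).most_common(20)
```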
fredrik_w (Author) Posted July 12, 2008 at 10:49 AM
Thanks guys, your links have been very helpful.
Sarevok Posted December 2, 2010 at 04:30 AM
Those links do not work for me anymore... Is there a way to do that in Excel? I have a huge spreadsheet of more than 50,000 rows and I would like to know the total unique character count in particular columns...
jbradfor Posted December 2, 2010 at 04:38 AM
Does this link help?
roddy Posted December 2, 2010 at 04:47 AM
Assuming you don't have an equally massive number of columns, I'd skip the Excel part of the problem by copying and pasting each column into a text document, and then doing the character count on that.
jbradfor Posted December 2, 2010 at 04:59 AM
But he wants unique characters.
roddy Posted December 2, 2010 at 05:06 AM
I know, I just didn't type that word.
jbradfor Posted December 2, 2010 at 05:14 AM
Word processors count unique characters too? I did not know that....
roddy Posted December 2, 2010 at 05:28 AM
No, but then he's got a bunch of text files he can work with, rather than trying to do a unique character count on specific Excel columns.
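As a variation on this idea, one could export the sheet as CSV instead of pasting each column into its own text file, and count the unique characters per column directly. This is only a sketch of that alternative, not something proposed in the thread: `unique_chars_in_column` is a hypothetical helper, columns are numbered from 0, and the file name in the usage comment is made up.

```python
import csv


def unique_chars_in_column(csv_file, column_index):
    """Return the set of distinct non-whitespace characters in one CSV column."""
    seen = set()
    for row in csv.reader(csv_file):
        # Skip rows that are too short to have this column.
        if column_index < len(row):
            seen.update(ch for ch in row[column_index] if not ch.isspace())
    return seen


# Example call (hypothetical file exported from the spreadsheet as UTF-8 CSV):
# with open("sheet.csv", encoding="utf-8", newline="") as f:
#     print(len(unique_chars_in_column(f, 1)))  # second column
```

Using a set keeps memory proportional to the number of distinct characters rather than the number of rows, so a sheet of 50,000 rows should not be a problem.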
HedgePig Posted December 2, 2010 at 09:34 AM
I think this Excel macro may help. It basically compares whatever text is highlighted against a "reference" list and then gives a count of each character that occurs, split by whether it's in the reference list or not. If you don't want to compare against "known" characters, then simply delete the reference list. If you are happy running the macro directly, rather than just pressing the analyse button, you simply select the text in your sheet and run the macro via Alt+F8. (The macro is called "Main". I apologise!) Otherwise you need to cut and paste into the "Source" sheet of the macro workbook and select the text you want analysed. Hope this helps. Let me know if you have problems.
c_redman Posted December 2, 2010 at 04:34 PM
I have a word list application that can give the counts for characters. You just need to uncheck all the dictionaries and known word lists that are checked by default. If it doesn't use its dictionary, it doesn't know how to segment words, so it results in single characters.
HedgePig Posted December 3, 2010 at 01:16 PM
That's a really nice app, c_redman! I think I'll be using it a lot more. I particularly like the way it lists idioms as well.