Introducing Chinese Text Analyser

fabiothebest, May 25, 2014 at 12:14 PM

I'm about to purchase Chinese Text Analyser. I have a PC with Windows, though I often use Linux, and I may buy a MacBook Air in the future. Did you test and confirm that the program also runs on Linux and Mac OS X using Wine? I'd be interested in a native app as well.
I read that the program allows you to "Export word lists of known or unknown words for use in SRS or other programs". I'd be interested in creating wordlists for Pleco. Is it easy to do? From my understanding, I should generate a tab-separated list containing these fields: "characters{tab}Pinyin pronunciation{tab}definition". Pinyin and definition are optional; if they are not specified, Pleco provides them itself. If I purchase a license, I can use it on multiple PCs, right? And do I get unlimited updates? I'm going to purchase the program and try it now.

imron, May 26, 2014 at 09:52 AM

Yes, it works on Wine under Linux, though it has some minor graphical glitches (e.g. icon images don't always display correctly), and I test on Linux through Wine before each release. I haven't tried Wine on OS X, but I have a MacBook Air myself and do all development and testing in a Windows virtual machine through VMware Fusion.

You can test it for yourself before purchasing, as it has a free 14-day trial. Licences work across OSes and versions, and for personal use they can be used on any computer where you are the primary user. This works on an 'honour system': I don't do any intrusive checking, and you can just copy the licence to whatever computer you like, but the licence files contain enough personal information to discourage sharing them openly.

Word lists are trivial to export to Pleco. CTA exports tab-separated files with a range of fields, including all those supported by Pleco, and you simply choose which ones you need.
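For anyone curious what the tab-separated layout from the question looks like, here is a minimal Python sketch. It assumes UTF-8 text and the field order described above (characters, then optional pinyin, then optional definition), with blank fields left for Pleco to fill in. The helpers `to_pleco_line` and `to_pleco_file` are hypothetical illustrations of the file format, not part of CTA, which produces these files itself from its export dialog.

```python
def to_pleco_line(characters, pinyin=None, definition=None):
    """Build one line: characters<TAB>pinyin<TAB>definition.

    Trailing empty fields are dropped, so a bare headword becomes just
    "characters" (assumed to be completed by Pleco on import).
    """
    fields = [characters, pinyin or "", definition or ""]
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return "\t".join(fields)


def to_pleco_file(entries):
    """Join (characters, pinyin, definition) tuples into file contents."""
    return "".join(to_pleco_line(*entry) + "\n" for entry in entries)
```

To save the result, write the string with `open(path, "w", encoding="utf-8")` so the Chinese characters survive intact.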

I recommend downloading the trial and checking it out. If you have any questions about usage, post them here and I'll do my best to answer them.

Xiao Kui, June 2, 2014 at 01:23 AM

I bought CTA in April, but because I'm in grad school I'm only now getting around to using it, and I'm enjoying it immensely. Is there a keyboard shortcut for marking a word as known when it's highlighted in the unknown words list in the right-hand window? Thank you!

imron, June 2, 2014 at 03:21 AM

Not at the moment, but I should probably make that double-clickable like the main text view. I'll add it to my todo list, and it should be ready for the next version.

imron, June 2, 2014 at 01:02 PM

Ahh, actually, I've just realised that double-clicking a word in the list searches for it in the text. Hmm, might need to think of an appropriate shortcut. Any preferred keystroke?

Xiao Kui, June 5, 2014 at 03:05 AM

On second thought, I guess a keyboard shortcut isn't entirely necessary. It hadn't occurred to me until now that I could use Control or Shift to select multiple entries at a time and mark them all as known at once, which is saving me a ton of time. Duh!

imron, June 5, 2014 at 04:43 AM

Hopefully as well, as time goes by and Chinese Text Analyser develops a more accurate model of your vocabulary, it will be less and less necessary to do bulk markings.

character, June 29, 2014 at 12:47 PM

Scanning just a section of text: this would be useful with large documents. For example, in my current book I just want to scan the first part of chapter 1 (第一篇贡品　1). It would be great if I had a way to select all the text between the markers 第一篇贡品　1 and 第一篇贡品　2 and scan just that, without having to copy and paste it into a separate document first. Also, if CTA could recognize some common chapter markers, such as those above, and split the text automatically, that would be very convenient.
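Assuming the book is loaded as one Python string, the two requests above (scanning only the text between two markers, and splitting automatically at chapter markers) might be sketched like this. `section_between`, `split_chapters` and the `CHAPTER_MARKER` pattern are hypothetical; the pattern only matches headings shaped like 第一篇贡品　1 (text, an ideographic space U+3000, then a number alone on its line), and other books would need other patterns.

```python
import re

# Hypothetical marker pattern: non-space text, an ideographic space (U+3000),
# then a number, alone on its line, e.g. "第一篇贡品　1".
CHAPTER_MARKER = re.compile(r"^\S+\u3000\d+[ \t]*$", re.MULTILINE)


def section_between(text, start_marker, end_marker):
    """Return text from start_marker up to (not including) end_marker.

    Raises ValueError if start_marker is missing; if end_marker does not
    appear after it, everything to the end of the text is returned.
    """
    start = text.index(start_marker)
    end = text.find(end_marker, start + len(start_marker))
    return text[start:] if end == -1 else text[start:end]


def split_chapters(text, pattern=CHAPTER_MARKER):
    """Split text at every line matching pattern.

    Text before the first marker (e.g. front matter) is ignored in this sketch.
    """
    starts = [m.start() for m in pattern.finditer(text)]
    if not starts:
        return [text]
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]
```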

I was thinking about a more general version of this. For a document, the user could enter the average number of lines per page. CTA could then offer to export unknown words ordered by their first appearance in the document, grouped by a given number of pages, such as:

// pages 1-10

...

// pages 11-20

...
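A rough sketch of the page-grouped export, assuming a fixed lines-per-page figure as suggested and an input of (word, line index) pairs for unknown words in reading order. That input format and the `group_by_pages` name are hypothetical. Page ranges that introduce no new words are simply omitted.

```python
def group_by_pages(occurrences, lines_per_page, pages_per_group=10):
    """Group unknown words by the page on which they first appear.

    occurrences: (word, line_index) pairs in reading order, line_index
    0-based, already filtered to unknown words (hypothetical input).
    Returns the lines of a list with "// pages a-b" headers.
    """
    groups = {}  # group index -> words in order of first appearance
    seen = set()
    for word, line in occurrences:
        if word in seen:
            continue
        seen.add(word)
        page = line // lines_per_page  # 0-based page number
        groups.setdefault(page // pages_per_group, []).append(word)

    out = []
    for g in sorted(groups):
        first_page = g * pages_per_group + 1
        out.append(f"// pages {first_page}-{first_page + pages_per_group - 1}")
        out.extend(groups[g])
    return out
```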

imron, June 29, 2014 at 03:26 PM

Internally, everything in Chinese Text Analyser works on byte offsets from the beginning of the file.

It internally calculates a page size in bytes equal to the total number of bytes on the last visible page (so when you drag the thumb on the scrollbar to the end, it fits perfectly on the last page).

It probably wouldn't be too difficult to export based on this. However, while such a page size works great for scrolling the UI, it might not be ideal when exporting text, especially if the last page doesn't have much on it (e.g. one or two characters per line), which makes the page size a low number of bytes relative to other pages of text.
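To illustrate the concern, here is a small sketch of that page-size rule, assuming UTF-8 text and a fixed number of lines per screen (both are assumptions; the post only says the page size is the byte count of the last visible page). The function names are hypothetical.

```python
def line_bytes(line):
    """UTF-8 size of a line plus its newline (the encoding is an assumption)."""
    return len(line.encode("utf-8")) + 1


def last_page_size(lines, visible_lines):
    """Byte size of the final screenful of visible_lines lines."""
    return sum(line_bytes(line) for line in lines[-visible_lines:])


def page_count(lines, visible_lines):
    """Pages obtained by slicing the whole file into last-page-sized chunks."""
    total = sum(line_bytes(line) for line in lines)
    return -(-total // last_page_size(lines, visible_lines))  # ceiling division
```

With 100 lines of 30 characters and 20 lines per screen, this gives 5 pages. Appending 20 one-character lines shrinks the page size to 80 bytes, and the same text then splits into 115 "pages", which is the distortion described above.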

I could also add something to export from the current position in the file for X bytes/pages.
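Exporting "X bytes from the current position" has one practical wrinkle if the file is UTF-8 (an assumption here; the post only says offsets are in bytes): most Chinese characters take three bytes, so a raw offset can land in the middle of a character. A minimal sketch that moves both ends forward to the next character boundary; `utf8_slice` is a hypothetical name.

```python
def utf8_slice(data: bytes, start: int, length: int) -> str:
    """Decode about `length` bytes of UTF-8 `data` beginning at byte `start`.

    Both ends are moved forward to the next character boundary, so a partial
    character at the start is skipped and one at the end is completed.
    """
    def snap(i: int) -> int:
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        while i < len(data) and (data[i] & 0xC0) == 0x80:
            i += 1
        return i

    begin = snap(start)
    end = snap(min(start + length, len(data)))
    return data[begin:end].decode("utf-8")
```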

character, June 29, 2014 at 03:42 PM

^ Sounds like pages wouldn't necessarily be a good metric, then. Could you export unknown words ordered by their first appearance in a document, grouped by some number of words, such as:

// words 1-15

...

// words 16-30

...

Either way, the idea is to give the user chunks of the vocabulary they need to learn in the order they need to learn it, instead of a long, undifferentiated list of unknown words. Perhaps have an option to put low-frequency words into a separate group at the end of the list.
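A sketch of the word-count grouping, under a few assumptions: the document is already segmented into a list of tokens, known words are held in a set, and "low frequency" means fewer than `min_count` occurrences in this document (corpus frequency would be another option). `chunk_unknown_words` and the `// low frequency` header are hypothetical.

```python
from collections import Counter


def chunk_unknown_words(tokens, known, chunk_size=15, min_count=1):
    """Return export lines: unknown words in first-appearance order,
    in groups of chunk_size, with rare words collected at the end."""
    counts = Counter(tokens)
    # dict.fromkeys keeps unique tokens in order of first appearance.
    ordered = [tok for tok in dict.fromkeys(tokens) if tok not in known]

    frequent = [w for w in ordered if counts[w] >= min_count]
    rare = [w for w in ordered if counts[w] < min_count]

    lines = []
    for i in range(0, len(frequent), chunk_size):
        chunk = frequent[i:i + chunk_size]
        lines.append(f"// words {i + 1}-{i + len(chunk)}")
        lines.extend(chunk)
    if rare:
        lines.append("// low frequency")
        lines.extend(rare)
    return lines
```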

imron, June 29, 2014 at 04:01 PM

"Could you export unknown words ordered by their first appearance in a document, grouped by some number of words, such as:"

Is your meaning that you'd just like to have markers inserted into the exported file? Otherwise, you can sort of do this already; just set:

'Word List': Unknown
'Sort By': First Occurrence (Ascending)
First: N words ordered by 'First Occurrence (Ascending)'

Where N is the number of words you want per group.

Then just make sure to mark exported words as known.

When you open the export dialog again, the previous N words will now be 'known', so the next group of N words will come from the next part of the document that contains unknown words.
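The workaround above can be modelled as a loop, which also shows where a minimum-frequency filter like the one suggested in this reply could slot in. This is a hypothetical sketch, not CTA's code: `export_next_batch` mimics the three dialog settings and then marks the exported words as known, so each call returns the next N unknown words in order of first occurrence. Frequency is assumed to mean the count within this document.

```python
from collections import Counter


def export_next_batch(tokens, known, n, min_count=1):
    """Mimic Word List = Unknown, Sort By = First Occurrence (Ascending),
    First = n, then mark the exported words as known (mutates `known`).

    Words occurring fewer than min_count times in `tokens` are skipped.
    """
    counts = Counter(tokens)
    batch = []
    for tok in dict.fromkeys(tokens):  # unique tokens in first-occurrence order
        if len(batch) == n:
            break
        if tok in known or counts[tok] < min_count:
            continue
        batch.append(tok)
    known.update(batch)
    return batch
```

Calling it repeatedly with the same `known` set yields the successive groups, until it returns an empty list.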

"Perhaps have an option to put low-frequency words into a separate group at the end of the list."

Perhaps an option to ignore words below a certain frequency?

character, June 29, 2014 at 04:34 PM

"Is your meaning that you'd just like to have markers inserted into the exported file?"

The idea was to have it break up the list into separate categories for Pleco, but I guess one could use Pleco's Splitting function on the entire list instead.

imron, June 30, 2014 at 02:31 AM

That should be relatively easy to add. Will put it on the todo list.

DanielW, July 3, 2014 at 02:32 AM

Several weeks ago, Imron generously gave me a license even though I was rather new to this forum. This review comes from a beginner-intermediate/lower-intermediate learner, and I hope it will be of use to some learners.

It is, as advertised, really fast! I loaded several novels in a fraction of a second. I do hope that better recognition of names can be done, though, because I do not want to mark the characters that make up names as known if I don't know them well enough. Together with Pleco flashcards, this app has helped me improve my Chinese very quickly. Thank you, Imron!

hedwards, July 3, 2014 at 02:36 AM

FYI, Benny Lewis is recommending your software in his fi3mplus premium package. And I can't say I disagree with him there. While there are a ton of paid resources out there, I think this one is more than worth the cost.

imron, July 3, 2014 at 03:00 AM

"I do hope that better recognition of names can be done"

Better name recognition is on my list of improvements for the segmenter, but segmenter improvements are currently low on the priority list while I get the rest of the application in place. I'll see if I can come up with an interim solution, maybe a way to explicitly mark something as a name.
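One possible shape for the "explicitly mark something as a name" idea, sketched under assumptions: the segmenter has split a name into single characters, and the user keeps a set of names they have marked. `merge_names` greedily re-joins adjacent tokens whose concatenation is a marked name, and `unknown_words` then leaves names out of the unknown list. Both functions are hypothetical; this is not how CTA's segmenter works.

```python
def merge_names(tokens, names, max_tokens=4):
    """Greedily re-join runs of 2..max_tokens adjacent tokens that spell a marked name."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest run first so longer names win over shorter ones.
        for j in range(min(len(tokens), i + max_tokens), i + 1, -1):
            candidate = "".join(tokens[i:j])
            if candidate in names:
                out.append(candidate)
                i = j
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def unknown_words(tokens, known, names):
    """Unique unknown words in first-appearance order, excluding marked names."""
    merged = merge_names(tokens, names)
    return [w for w in dict.fromkeys(merged) if w not in known and w not in names]
```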

"Benny Lewis is recommending your software in his fi3mplus premium package"

I'm glad to hear he likes it and thinks it's worth recommending. Do you have a link to anything specific? A quick google search doesn't turn up anything mentioning Chinese Text Analyser.

"I think this one is more than worth the cost."

I agree

hedwards, July 3, 2014 at 04:06 AM

I can't provide a link. Well, I can, but it's behind his paywall. I suppose that's a compliment, but it's probably not as much exposure as if it were on his regular site or in the non-premium portion of the site. I wouldn't expect you to see much difference in the number of sales: the premium members are more serious about language learning, but only a subset of them are going to be interested in Chinese.

imron, July 3, 2014 at 04:13 AM

Ok, no problem. I might drop him a line separately.

imron, July 14, 2014 at 01:53 AM

Version 0.99.4 is now released. New features include:

A 'recent files' menu item
Remembering the position in the file for recently opened files
Search history
Improved wordlist management that allows for revision history of wordlists to be stored. This will be expanded upon in future releases with support for multiple wordlists and the ability to restore previous versions of a wordlist.

One annoying bug I've just spotted is that if you install while the application is running, it won't install the new executables (even though it will say it is running the newer version, it will still be the older executable). So if you're upgrading, make sure to exit Chinese Text Analyser completely before installing (I'll be addressing this problem properly in the next release).

imron, July 14, 2014 at 03:01 PM

Version 0.99.5 is now out, and fixes the install problem.
