Chinese-Forums

Introducing Chinese Text Analyser


imron

I'm about to purchase Chinese Text Analyser. I have a PC with Windows, though I often use Linux and may buy a MacBook Air in the future. Did you test and confirm that the program also runs on Linux and Mac OS X using Wine? I'd be interested in a native app as well.
I read that the program lets you "Export word lists of known or unknown words for use in SRS or other programs". I'd be interested in creating word lists for Pleco. Is it easy to do? From my understanding I should generate a tab-separated list containing these fields: "characters{tab}Pinyin pronunciation{tab}definition". Pinyin and definition are optional; if not specified, they are provided by Pleco itself. If I purchase a license I can use it on multiple PCs, right? And do I get unlimited updates? I'm going to purchase the program and try it now.
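
For anyone wondering what such a file might look like, here is a minimal Python sketch of the tab-separated layout described above. The function name `to_pleco_lines` and the sample entries are made up for illustration, and the field order is only my understanding of what Pleco accepts, not a confirmed specification.

```python
def to_pleco_lines(entries):
    """Format (characters, pinyin, definition) tuples as tab-separated lines.

    Pinyin and definition may be None or empty. Trailing empty fields are
    dropped, on the assumption that the importing program fills them in.
    """
    lines = []
    for characters, pinyin, definition in entries:
        fields = [characters, pinyin or "", definition or ""]
        # Drop empty fields from the right, but always keep the headword.
        while len(fields) > 1 and not fields[-1]:
            fields.pop()
        lines.append("\t".join(fields))
    return lines


# Hypothetical sample data: one full entry and one headword-only entry.
words = [
    ("贡品", "gong4pin3", "tribute"),
    ("文本", None, None),
]
print("\n".join(to_pleco_lines(words)))
```

Writing the result to a UTF-8 text file should then be enough for a quick import test.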


Yes, it works on Wine under Linux, though it has some minor graphical glitches (e.g. images for icons don't always display correctly), and I test on Linux through Wine before each release. I haven't tried Wine on OS X, but I have a MacBook Air myself and do all development and testing on a Windows virtual machine through VMware Fusion.

You can test it for yourself before purchase, as it has a free 14-day trial. Licences work across OSes and versions, and for personal use can be used on any computer where you are the primary user. This works on an 'honour system': I don't do any intrusive checking, and you can just copy the licence to whatever computer you like, but the licence files contain enough personal information to discourage sharing openly.

Word lists are trivial to export to Pleco. CTA exports tab-separated files for a range of fields, including all those supported by Pleco, and you simply choose which ones you need.

I recommend downloading the trial and checking it out. If you have any questions about usage, post them here and I'll do my best to answer them.


I bought CTA in April, but because I'm in grad school I'm only now getting around to using it, and I'm enjoying it immensely. Is there a keyboard shortcut for marking a word as known when it's highlighted in the unknown words list in the right-hand window? Thank you!


On second thought, I guess a keyboard shortcut is not entirely necessary - it hadn't occurred to me till now that I could use Control or Shift to select multiple entries at a time and mark them as known all at once, which is saving me a ton of time. Duh! :)


  • 4 weeks later...

Scanning of just a section of text - this would be useful with large documents. For example, in a current book I just want to scan the first part of chapter 1 (第一篇 贡品 1). Would be great if I had a way to select all text between the markers 第一篇 贡品 1 and 第一篇 贡品 2 and just scan that, without having to copy and paste in a separate document first. Also, if CTA could recognize some common chapter markers, such as those above, and split automatically that would be very convenient.
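
Until something like this exists in the application, the section can be cut out with a few lines of Python before loading it. This is only a sketch of the idea: the function name `text_between` is made up, and it assumes the markers appear in the text exactly as typed (a table of contents that repeats the chapter titles would match first).

```python
def text_between(text, start_marker, end_marker):
    """Return the text from start_marker up to (not including) end_marker.

    If end_marker does not occur after start_marker, everything up to the
    end of the text is returned. Raises ValueError if start_marker is absent.
    """
    start = text.find(start_marker)
    if start == -1:
        raise ValueError(f"start marker not found: {start_marker!r}")
    # Search for the end marker only after the start marker itself.
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        end = len(text)
    return text[start:end]


# Hypothetical stand-in for a book's text.
sample = "前言\n第一篇 贡品 1\n甲乙丙\n第一篇 贡品 2\n丁戊\n"
print(text_between(sample, "第一篇 贡品 1", "第一篇 贡品 2"))
```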

I was thinking about a more general version of this. For a document, the user could enter a number indicating the average number of lines per page. Then CTA could provide options to export unknown words ordered by their first appearance in a document, grouped by some number of pages, such as:

// pages 1-10

...

// pages 11-20

...


Internally, everything in Chinese Text Analyser works on byte offsets from the beginning of the file.

It internally calculates a page size in bytes equal to the total number of bytes on the last visible page (so when you drag the thumb on the scrollbar to the end, it fits perfectly on the last page).

It probably wouldn't be too difficult to export based on this, however while such a page size works great for scrolling the UI, it might not be ideal when exporting text, especially if the last page doesn't have much on it (e.g. one or two characters per line) causing the page size to be a low number of bytes relative to other pages of text.

I could also add something to export from the current position in the file for X bytes/pages.
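
To illustrate what exporting "for X bytes" involves, here is a rough Python sketch of cutting a UTF-8 file into pages by byte offset. It is not how Chinese Text Analyser does it internally; the function name `page_offsets` and the fixed `page_size` parameter are assumptions for the example, and the input is assumed to be valid UTF-8. The one subtlety is that a page boundary must not land in the middle of a multi-byte character.

```python
def page_offsets(data: bytes, page_size: int):
    """Yield (start, end) byte offsets of pages of at most page_size bytes.

    Page ends are moved back so that they never fall inside a multi-byte
    UTF-8 sequence. Assumes data is valid UTF-8.
    """
    if page_size < 4:
        raise ValueError("page_size must be at least 4 bytes")
    start = 0
    while start < len(data):
        end = min(start + page_size, len(data))
        # UTF-8 continuation bytes look like 0b10xxxxxx. If the next page
        # would begin with one, step back to the start of that character.
        while start + 1 < end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        yield start, end
        start = end


data = "中文文本分析".encode("utf-8")
for start, end in page_offsets(data, 4):
    print(start, end, data[start:end].decode("utf-8"))
```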


^ Sounds like pages wouldn't necessarily be a good metric, then. Could you export unknown words ordered by their first appearance in a document, grouped by some number of words, such as:

// words 1-15

...

// words 16-30

...

Either way, the idea is to give the user chunks of the vocabulary they need to learn in the order they need to learn it, instead of a long, undifferentiated list of unknown words. Perhaps have an option to put low-frequency words into a separate group at the end of the list.
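
To make the request concrete, here is a small Python sketch of the output I have in mind. It assumes the text has already been segmented into a list of words in document order; the function name `grouped_unknown_words` and the sample data are made up for illustration and have nothing to do with how CTA actually works.

```python
def grouped_unknown_words(words_in_order, known, group_size):
    """Return export lines: unknown words by first occurrence, in labelled groups."""
    seen = set()
    unknown = []
    for word in words_in_order:
        # Keep each unknown word once, at the position of its first occurrence.
        if word not in known and word not in seen:
            seen.add(word)
            unknown.append(word)
    lines = []
    for i in range(0, len(unknown), group_size):
        group = unknown[i:i + group_size]
        lines.append(f"// words {i + 1}-{i + len(group)}")
        lines.extend(group)
    return lines


# Hypothetical segmented document and known-word list.
document = ["我", "喜欢", "贡品", "我", "文本", "分析", "贡品"]
print("\n".join(grouped_unknown_words(document, known={"我"}, group_size=2)))
```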


Could you export unknown words ordered by their first appearance in a document, grouped by some number of words, such as:

Do you mean that you'd just like to have markers inserted into the exported file? Otherwise you can sort of do this already, just set:

'Word List': Unknown
'Sort By': First Occurrence (Ascending)
First: N words ordered by 'First Occurrence (Ascending)'

Where N is the number of words you want per group.

Then just make sure to mark exported words as known.

When you open the export dialog again, then because the previous N words are now 'known', the next group of N words will be from the next part of the document with unknown words.
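
The effect of repeating those steps can be sketched in a few lines of Python. This only simulates the workflow on an already segmented word list; the function name `export_next_batch` and the sample data are hypothetical and this is not the application's code.

```python
def export_next_batch(words_in_order, known, n):
    """Return the first n unknown words by first occurrence and mark them known."""
    batch = []
    for word in words_in_order:
        if word not in known and word not in batch:
            batch.append(word)
            if len(batch) == n:
                break
    # Mirrors the "mark exported words as known" step, so that the next
    # call starts from the next part of the document.
    known.update(batch)
    return batch


# Hypothetical segmented document and known-word set.
document = ["我", "喜欢", "贡品", "我", "文本", "分析", "贡品"]
known = {"我"}
print(export_next_batch(document, known, 2))
print(export_next_batch(document, known, 2))
```

Each call returns the next group of N words, and an empty list once nothing unknown is left.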

 

Perhaps have an option to put low-frequency words into a separate group at the end of the list.

Perhaps an option to ignore words below a certain frequency?


Do you mean that you'd just like to have markers inserted into the exported file?

The idea was to have it break up the list into separate categories for Pleco, but I guess one could use Pleco's Splitting function on the entire list instead.

Several weeks ago, Imron generously gave me a license even though I am rather new to this forum. This review comes from a learner at a beginner-intermediate/lower-intermediate level, and I hope it will be of use to some learners.

It is, as advertised, really fast! I loaded several novels in a fraction of a second. I do hope that better recognition of names can be done, though, because I do not want to mark the component characters of names as known if I don't know them well enough. Together with Pleco flashcards, this app has helped me improve my Chinese very quickly. Thank you Imron!


I do hope that better recognition of names can be done

Better name recognition is on my list of improvements for the segmenter, but currently segmenter improvements are low down on the priority list while I get the rest of the application in place. I'll look to see if I can come up with an interim solution - maybe explicitly marking something as a name.

Benny Lewis is recommending your software in his fi3mplus premium package

I'm glad to hear he likes it and thinks it's worth recommending. Do you have a link to anything specific? A quick Google search doesn't turn up anything mentioning Chinese Text Analyser.

I think this one is more than worth the cost.

I agree :mrgreen:


I can't provide a link. Well I can, but it's behind his paywall. I suppose that's a compliment, but it's probably not as much exposure as if it were on his regular site or in the non-premium portion of the site. I wouldn't expect you to see much difference in the number of sales. The premium members are more serious about language learning, but only a subset are going to be interested in Chinese.


  • 2 weeks later...

Version 0.99.4 is now released.  New features include:

  • A 'recent files' menu item
  • Remembering the position in the file for recently opened files
  • Search history
  • Improved wordlist management that allows for revision history of wordlists to be stored. This will be expanded upon in future releases with support for multiple wordlists and the ability to restore previous versions of a wordlist.

One annoying bug I've also just spotted is that if you install while the application is running, the installer won't replace the executables (the application will report the newer version, but it will still be running the older executable). Therefore if you're upgrading, make sure to exit Chinese Text Analyser completely before installing (I'll be addressing this problem properly in the next release).
