Chinese-Forums
Introducing Chinese Text Analyser


  • 3 weeks later...
Posted

While playing with word lists, sometimes they just empty themselves.

 

I think I can avoid it by not creating more than one word list in a 'session', i.e. by closing and restarting CTA before adding the second word list.

 

Steps:

1. Manage Word Lists -> New -> Test1
2. Paste 小狗
3. Mark entire document known
4. Manage Word Lists -> New -> Test2
5. Paste 小猫
6. Mark entire document known
7. Close CTA (do not save changes)
8. Open CTA
9. Manage Word Lists -> Test2: Preview shows 小猫
10. Manage Word Lists -> Test1: Preview is empty !?

 

Edit: I think the problem extends not just to when you create more than one word list in a session, but also to changing the contents of more than one word list in a session (it forgets the changes to the first list and only remembers the changes to the second).

Posted

Thanks.  I can reproduce the problem and will have it fixed in the next release.

 

It should be saving the contents of each list to disk each time the user changes the word list.
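For readers curious what "save on every change" could look like, here is a minimal sketch. It is not CTA's actual code: the `WordListStore` class, the one-JSON-file-per-list layout, and the `reproduce_steps` helper are all assumptions made for illustration. The point is only that each list is written to its own file the moment it changes, so editing a second list in the same session cannot lose the first.

```python
import json
import tempfile
from pathlib import Path


class WordListStore:
    """Hypothetical store: one JSON file per word list, rewritten on every change."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, name):
        return self.directory / f"{name}.json"

    def load(self, name):
        """Return the saved words for a list, or an empty set if it has no file yet."""
        path = self._path(name)
        if not path.exists():
            return set()
        return set(json.loads(path.read_text(encoding="utf-8")))

    def add_words(self, name, words):
        """Add words to a list and persist that list immediately."""
        current = self.load(name)
        current.update(words)
        # Write this list's own file right away instead of waiting for exit.
        self._path(name).write_text(
            json.dumps(sorted(current), ensure_ascii=False), encoding="utf-8"
        )
        return current


def reproduce_steps(directory):
    """Replay the steps from the bug report: edit two lists, 'restart', read both."""
    store = WordListStore(directory)
    store.add_words("Test1", ["小狗"])
    store.add_words("Test2", ["小猫"])
    reopened = WordListStore(directory)  # stands in for closing and reopening CTA
    return reopened.load("Test1"), reopened.load("Test2")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(reproduce_steps(tmp))
```

With this layout both lists survive the simulated restart, which is the behaviour the bug report expected.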

  • Like 1
Posted

Hello Imron, this thread is getting very long, so I'm sorry if my question has already been asked and answered. 

 

I installed CTA on a USB stick thinking I would hop from one machine to another, but in reality I've always used it on the same computer. Now I've just upgraded to the latest version of CTA and plugged my USB stick into a new machine, and I'm asked for the license code. Does that mean that, at some point, config and data files were in fact copied to the hard drive of computer #1? If so, what can I do to be able to use CTA on a new machine while keeping my known words and, obviously, the license code?

 

Thanks!

 

[Edit] Of course, I just checked and saw that I do have files on my hard drive in AppData\Local\ChineseTextAnalyser

I'll try to just copy them to my USB stick and see what happens.

Posted

@laurenth, there's a little bit more involved to get a standalone version running, and the standalone version currently only works on Windows.

 

See here for details.

 

Because you haven't done this already, it means that your licence and all your existing vocabulary is still stored on your original machine rather than on the USB stick.

 

You should be able to copy everything over though.  See this post for details.

 

  • Like 1
Posted

No question, I just want to thank you for creating this. Great tool. 

 

I'd love to see a feature that could bookmark places for longer reads (I can't get through a 10-page article in one sitting).

 

 

  • Like 2
Posted

Custom bookmarks are on my list of things to do. However, CTA should be saving your current position each time you open and close the same file.  If this is not happening, please let me know (preferably with a copy of the file that this is happening with).

 

Great tool

Spread the word.  The more people pay for it, the more I'll be able to spend time improving it!

  • Like 1
Posted

Imron,

 

One small suggestion. Have you thought of adding screenshots, color, or some simple marketing words to the website? At present, the CTA website is a bit, erm, functional. 

 

 

Quote

Spread the word.  The more people pay for it, the more I'll be able to spend time improving it!

 

I say this only because I have recommended CTA to a few Chinese learners. They were either scared away by the 18-page forum thread, or unimpressed by the website. But when they saw me actually using the program, they were super impressed.

  • Like 1
Posted

Here's an example of what I mean: https://getcoldturkey.com/

 

Nice website, simple presentation. Draws your eyes toward the download button, but in a non-offensive way. Marketing language (not overdone). Product clearly explained. Good color scheme.

Posted
5 hours ago, murrayjames said:

Have you thought of adding screenshots, color, or some simple marketing words to the website?

I have!  Like with all things though, it's a problem of time, money and priority and updating the website falls behind due to that.

Posted

Imron,

 

I just had a thought. It would take some time (maybe a lot of time), but it could potentially solve the money problem. Have you thought of making an English Text Analyser? As fast, trimmed down, and powerful as CTA... Interface languages and documentation in languages other than English might take some work. And there would be dictionaries to procure, though I bet finding an open source X-->English dictionary would not be that hard. Word segmentation would be a much smaller problem.

 

The upside is that the English language learning market is huge. ($$$)  I've had several Chinese friends watch me use CTA and ask if there was a similar thing in English. (The only comparable program I know of is LWT, but it is much slower and requires setting up a local server). I suspect English-language learners from many different countries would be interested in such a thing.

  • Like 2
Posted

I think murrayjames' idea is a good one. I too have thought about how it would be a good idea to make it usable for other languages... It would make word segmentation become a non-problem (as murrayjames said), and all of the other features of CTA would still be just as useful. I'd definitely use it for Spanish! 

Posted

Yes, it's something @imron could consider if he's got time. Segmentation is particularly useful for the Chinese language because there aren't spaces between words. Still, the ability to track known and unknown words and generate word lists can be useful for any language.

 

P.S. I'm Italian and also know enough Latin and Greek that I can learn many words in European languages without much effort (I see them once and remember them; I wish it were like this for Chinese too), but I realize it may not be easy for those who don't have my background. Chinese Text Analyser was clearly created for learners of Chinese, hence the name (probably for personal use at first, before evolving into a real product). It could be especially useful for languages that are very different from your own. Just as the program helps foreigners learn Chinese, another version could help Chinese people (1.4 billion people) learn English, for example. From the point of view of a Chinese person, English is easier than Chinese in the long run because it at least has an alphabet, although at the beginning it can be quite difficult. This is just an example.

Then obviously, apart from languages that use letters, it would apply well to other languages like Japanese, Korean, Thai, Hindi, and Arabic, in a similar fashion to Chinese. In any case, it would require interest from the developer, time to study the feasibility, and time to maintain such a product.

Posted

I'd buy an English version, just to support Imron and the work he's doing  :tong

  • Like 1
Posted

You realise you can just buy more licences for the Chinese version any time you want right? ;-)

 

In any case, edit the file c:\users\<username>\AppData\Local\ChineseTextAnalyser\data\config

 

And scroll down to the [general] tag and change

 

chineseOnly = true

to

chineseOnly = false

 

You'll now get English words in the frequency list.  It's not perfect because apostrophes and accents will split a word. It also doesn't have a dictionary.
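If you'd rather not hand-edit the file, the change above can be sketched as a small script. This is only an illustration: the function name `set_chinese_only` is made up, and the assumption that the config is a plain-text file with `[section]` headers and `key = value` lines is inferred solely from the snippet quoted in this post. It works on the file's text, so you would read the config, pass the text through, and write the result back yourself (after making a backup).

```python
import re


def set_chinese_only(config_text, enabled):
    """Return config_text with chineseOnly set inside the [general] section only.

    Assumes an INI-like layout: [section] headers and 'key = value' lines.
    """
    value = "true" if enabled else "false"
    out = []
    in_general = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track which section we are in; only [general] is touched.
            in_general = stripped.lower() == "[general]"
        elif in_general and re.match(r"chineseOnly\s*=", stripped):
            line = f"chineseOnly = {value}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "[general]\nchineseOnly = true\n"
    print(set_chinese_only(sample, False))
```

Lines outside `[general]` are left exactly as they were, so a same-named key in another section would not be changed.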

 

I've thought quite a bit about having a multi-language version with support for different segmenters, and the code has been designed specifically to support this. I actually already have a space-based segmenter and a character-based segmenter (just not in the shipped version, and not quite 100% ready for production).  It's something that will come eventually.
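To make the two segmenter ideas concrete, here is a rough sketch. None of this is CTA's code; the function names and regular expressions are assumptions chosen for illustration. `ascii_segment` shows how a naive letters-only rule splits words at apostrophes and accents, `space_segment` shows one way a space-based segmenter might keep them together, and `char_segment` treats every non-space character as its own token.

```python
import re
import unicodedata


def ascii_segment(text):
    """Naive rule: only runs of A-Z count as a word, so ' and accents split it."""
    return re.findall(r"[A-Za-z]+", text)


def space_segment(text):
    """Word-based segmentation that keeps accented letters and inner apostrophes."""
    # Normalise to NFC so an accented letter is one code point, not letter + mark.
    text = unicodedata.normalize("NFC", text)
    # [^\W\d_] matches any Unicode letter; apostrophes are allowed between letters.
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text)


def char_segment(text):
    """Character-based segmentation: every non-space character is a token."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    print(ascii_segment("Don't café"))
    print(space_segment("Don't café"))
    print(char_segment("小狗 小猫"))
```

A real segmenter would also need decisions about hyphens, digits and punctuation, which this sketch ignores.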

 

Dictionaries are the bigger issue and are something of a chicken and egg problem.  Licensing costs are prohibitive with CTA's current number of users.  I've thought about adding support for StarDict, but many (most?) StarDict dictionaries appear to be pirated and I don't particularly want to facilitate that.

 

If anyone knows of any freely available dictionaries for other languages, let me know and it's probably not too difficult to add support for them, e.g. see the discussion about the Cantonese dictionary earlier in this thread.

  • Like 1
Posted

I don't know much about commercial dictionaries. From Mike Love's posts over the years, they seem like a lot of work (and expensive) to secure.

 

Is there an E-C equivalent of CEDICT? :D ECDICT?

 

Surely there must be an open source English-English dictionary available. Does the Wiktionary license allow for commercial use?

 

EDIT: How about Princeton University's WordNet? http://wordnet.princeton.edu/wordnet/license/

  • Like 1
Posted

Hi Imron,

 

Can the way that the custom Cantonese dictionary was added in the past be used to add a custom English-English dictionary?

 

If so, can you point me to the post of how to do this? I found a few mentioning custom dictionaries by searching. For example: 

On 3/27/2016 at 8:41 PM, imron said:

Custom dictionaries are now supported as described in this post

 

But the link seems broken...

 

 

Also, in terms of open source dictionaries to use, what about Wiktionary? If you go here and scroll to the bottom (in the footer), you see:

 

Quote

 

Posted

This is the post.  Basically it's just cedict format, and then put it in CTA's user data directory on your hard drive.  You can also make it so that only the custom dictionary is used as per this post.
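For anyone who hasn't seen the CEDICT format, each entry is one line of the form `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`. The sketch below builds and parses such lines. The helper names are made up for illustration, and nothing here says which file name or location CTA expects, or whether it accepts non-Chinese headwords; check the linked post for that.

```python
import re

# One CEDICT entry: traditional, simplified, [pinyin], then /-separated glosses.
CEDICT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def format_entry(traditional, simplified, pinyin, glosses):
    """Build one CEDICT-format line from its parts."""
    return f"{traditional} {simplified} [{pinyin}] /{'/'.join(glosses)}/"


def parse_entry(line):
    """Parse one CEDICT-format line; return None for comments or malformed lines."""
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        "glosses": glosses.split("/"),
    }


if __name__ == "__main__":
    line = format_entry("小狗", "小狗", "xiao3 gou3", ["puppy", "little dog"])
    print(line)
    print(parse_entry(line))
```

Running entries through `parse_entry` before saving the file is a cheap way to catch lines with a missing bracket or trailing slash.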
