Introducing Chinese Text Analyser

February 13, 2016 at 02:35 AM

Thanks for the feedback. I'll look into it.

Ok, I just did a quick test on Win10, and the font dialog is still working for me.

Would you mind sending me the log files? You can do that as follows:

1) Close CTA if you have it open, and then restart it (this will reset the log file, so it'll be a nice clean log)

2) Open a file containing Chinese text

3) Select Format->Font from the menu. If possible change a font.

4) Press Ctrl-Shift-F to bring up the font dialog. If possible change a font.

5) Press Ctrl-+ several times to increase the font size

6) Press Ctrl-- several times to decrease the font size

7) Press Ctrl-0 to reset the font to the default size.

8) Select Help->Send Feedback

9) Check 'attach log'

10) Type a quick message

11) Hit send.

Please make sure to do all of actions 3-7, even if they appear to have no effect, because that 'no effect' should hopefully be writing errors to the log file, which I'll then use to try and figure out what's going wrong.

February 13, 2016 at 02:36 AM

should I uninstall the version I'm using before installing the new one?

No, but anyone who has made manual edits to their dictionary file, or any other file in c:\Program Files\ChineseTextAnalyser\data, should make a backup copy before installing, because the installer will overwrite anything that is there.

February 13, 2016 at 02:41 AM

OK feedback sent!

February 13, 2016 at 02:47 AM

Also, congratulations on the OS X version

Thanks. A lot of work has gone into things that most people probably won't consciously notice but that would detract from the experience if not there. I'm particularly happy with the speed of the main text view, compared to the standard OS X text view, which gets sluggish if you have lots of alternating colours in the text (which is a pretty typical use case for CTA if, say, someone only understands 50% of the words in a text).

February 13, 2016 at 03:00 AM

Imron,

I didn't backup C:\Program Files\ChineseTextAnalyser\data

But... I followed your instructions in this post to the letter! So I still have my custom dictionary entries. They are in a file called cedict_ts.u8 in username\AppData\Local\ChineseTextAnalyser\data.

To get my entries back in CTA, I append the contents of this file to C:\Program Files\ChineseTextAnalyser\data\cedict_ts.u8. Is that right?

February 13, 2016 at 03:10 AM

Imron, thank you so much for making an OS X Version! It's amazing and I'm impressed that you even added in split-screen support. I would like to report a bug, though (hope it is helpful). I tried to import an ePub and it caused it to crash. That's fine - but the problem is CTA forgets the words I have imported. I have reproduced this many times - if I cause CTA to crash by importing an unsupported document, it will forget all the words I know. Also, the splash screen is not retina-ready and it forgets the documents I opened the last time. Is this on purpose?

I am really, really impressed with the quality of CTA on OS X. Thank you again!

February 13, 2016 at 04:42 AM

@murrayjames, yes, that's the way to do it.
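For anyone who would rather script the append than do it by hand, here is a minimal sketch. It assumes both files are UTF-8 text (which the .u8 extension suggests) with one entry per line. The function name, the .bak backup, and the skipping of duplicate lines are my own additions for illustration and are not part of CTA. Writing under Program Files may also require administrator rights.

```python
import shutil
from pathlib import Path


def append_custom_entries(custom: Path, installed: Path) -> int:
    """Append lines from `custom` that are not already in `installed`.

    Returns the number of lines appended. A .bak copy of `installed`
    is made first so the original can be restored if needed.
    """
    # Keep a backup next to the installed file (hypothetical naming).
    backup = installed.with_name(installed.name + ".bak")
    shutil.copy2(installed, backup)

    existing_text = installed.read_text(encoding="utf-8")
    existing = set(existing_text.splitlines())

    # Collect only the non-empty lines that are not already present.
    new_lines = []
    for line in custom.read_text(encoding="utf-8").splitlines():
        if line.strip() and line not in existing:
            new_lines.append(line)
            existing.add(line)

    with installed.open("a", encoding="utf-8") as out:
        # Make sure the first appended entry starts on its own line.
        if existing_text and not existing_text.endswith("\n"):
            out.write("\n")
        for line in new_lines:
            out.write(line + "\n")
    return len(new_lines)
```

Usage would look something like `append_custom_entries(Path(saved_copy), Path(installed_copy))`, where the two paths point at the saved AppData copy and the freshly installed cedict_ts.u8 respectively.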

@DanielW, newly added words are only saved at the end of a session (when you close the app). Unfortunately, that doesn't happen if the app crashes. Can you send me a copy of the file that is causing the crash so I can try to reproduce (and fix) the problem?

I'm impressed that you even added in split-screen support.

I'm impressed even more, because I didn't even know I did that, and in fact didn't even know what that was until just now (I'm still running Mavericks).

Forgetting the documents you had opened last time is not necessarily on purpose, more something I overlooked. I'll sort that out (and the retina splash screen) in the next version.

I am really, really impressed with the quality of CTA on OS X

Here's something fun to try. Open up TextEdit, resize the window so it takes up a large part of the screen, and then type 一二 and change the colour of one of them to red e.g. 一二

Now copy and paste those two characters until you have a line full of red and black

一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二一二

and then copy and paste that line until you have a page full of red and black alternating colours. Now try resizing the TextEdit window - you'll probably find it painfully slow.

Then copy and paste the entire thing into CTA and double click on one of the chars, which will toggle known/unknown and give you the same sort of alternating colours. Resize the CTA window and notice the difference.

February 13, 2016 at 05:01 AM

@Imron,

Something's really odd. I reproduced the error with the file - https://dl.dropboxusercontent.com/u/51668103/%28epub%20for%20general%20device%29%20hpmor%20-%20Eliezer%20Yudkowsky%20-%20chs%20-%200.6.epub multiple times last night but it successfully raises the error, "Error converting the document" today. With this document, when it crashed yesterday, it lost not only the data since I last opened the app, but all the known words data in CTA (oddly enough, it remembered the recently opened files, though).

Yesterday, I tested it by opening CTA, importing HSK 5, marking a few words as known, and then closing CTA. I then opened CTA, checked that the words were still marked as known, then closed it again. Next, I opened CTA, opened the "epub for general device hpmor" epub, then it immediately crashed. I opened CTA yet again and it prompted me to import word lists (including HSK) again. I opened several files and it indeed had forgotten all the words I had marked as known.

I did reproduce the crash today with another file: https://dl.dropboxusercontent.com/u/51668103/%E6%B4%BB%E7%9D%80.epub. Perhaps the encoding was different. It demonstrates two behaviours, both of which I reproduced just now: the first behaviour is crashing while opening the file. I'm not sure but it seems that it can forget some words I had marked as known. (percentage unknown for a document went up from 28% to 29%? Really unsure about this)

The second behaviour it displayed was showing weird characters (like question marks and black diamond-shaped symbols) (encoding?). It did not crash until I scrolled down. Oddly enough, it still remembered the words I had imported (or did it forget a few? Not sure).

Hope this helps, and thank you!

OS X version: 10.11.3

Macbook: MacBook Pro (Retina, 13-inch, Late 2013)

February 13, 2016 at 06:33 AM

Thanks for that DanielW.

Chinese Text Analyser only supports text files. ePub and other document formats such as PDF or Pages files are not yet supported, but that being the case, it's not ok that they crash the program so that's something I'll try to address in a new release.

The second behaviour it displayed was showing weird characters (like question marks and black diamond-shaped symbols)

That's to be expected from an ePub file, which contains binary data instead of just text, and while supporting the ePub format is not likely to happen soon, stopping it from crashing when opening any sort of file is definitely a high priority.
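To illustrate where the diamonds come from: an ePub is a ZIP container, so its raw bytes are mostly not valid text. When such bytes are decoded as UTF-8 with replacement, each undecodable byte becomes U+FFFD, the replacement character, which is typically drawn as a black diamond with a question mark. The sketch below is not CTA's code. The function names and the 5% threshold are assumptions chosen for illustration.

```python
def decode_for_display(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return data.decode("utf-8", errors="replace")


def looks_binary(data: bytes, threshold: float = 0.05) -> bool:
    """Guess whether `data` is binary rather than UTF-8 text.

    Heuristic only: a NUL byte, or a high share of replacement
    characters after decoding, is treated as a sign of binary content.
    """
    if not data:
        return False
    if b"\x00" in data:
        return True
    text = decode_for_display(data)
    return text.count("\ufffd") / len(text) > threshold
```

A check like this could run before a file is displayed, so that a binary file produces a message instead of a page of diamonds.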

With regard to losing marked words, the program currently only saves these words upon successful exit. If the program crashes, it will not save them, and so any changes will be lost. In an ideal world, the program will not crash :-) I will also look at saving a temporary copy of any changed words to disk, so that the program can recover them in the event of a crash.
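One common way to make saves crash-safe is to write to a temporary file and then swap it into place, so that the file on disk is always either the old complete version or the new complete version. The following is a sketch of that general technique, not a description of how CTA stores its data. The JSON format and the function names are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def save_known_words(words: set[str], path: Path) -> None:
    """Write the word list to a temp file, then atomically swap it in."""
    # The temp file lives in the same directory so the final rename
    # stays on one volume.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(sorted(words), tmp, ensure_ascii=False)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, path)  # replaces any existing file in one step
    except BaseException:
        # Clean up the partial temp file; the old `path` is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_known_words(path: Path) -> set[str]:
    """Load the word list, or return an empty set if none was saved yet."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))
```

Calling `save_known_words` after each import or after every few changes would limit what a crash can lose to the changes made since the last call.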

February 16, 2016 at 07:55 AM

Ok, version 0.99.11 is now out.

This version should fix the crash bug when opening binary files on OS X. There will still be diamonds, but at least it won't crash. As a workaround for getting Chinese text rather than diamonds from binary files such as epub, pdf and pages documents, open them first in another program and then copy/paste the text into CTA.

This new version also:

* fixes a problem with the font dialog box on Windows when the user has an invalid font cache (thanks murrayjames!)

* has retina graphics for the splash screen on OS X

* saves the wordlist to disk after importing

* has a menu option to enable/disable the highlighting of looked up words (Format->Colour Scheme->Highlight looked up words)

February 16, 2016 at 07:52 PM

I use this programme all the time and would hate to be without it. I'm really grateful it exists. But I still deeply dislike and resent it. This is solely because it won't let me add looked up words as 'known'. I don't use it as a reader but as a text analyser. I appreciate that among those people who do use it as a reader, certain of them will be weak-willed and not to be trusted with this functionality in case they abuse it and ultimately degrade any benefit they get from using the programme as a reader. So I can accept that, in order to help them, I must endure frustration. I presume it's not possible to hack into and modify software like this, though I'm sorely tempted to ask a friend if he can.

Edit: please excuse the rant!

February 17, 2016 at 05:16 AM

realmayo, I know this is a point of contention for you, but one of the main features of CTA is to be able to give you a good idea of how well you will understand a given piece of text.

In order to do that, it needs to have an accurate model of your vocabulary, and looking up a word in the dictionary is an implicit acknowledgement that you don't know a word well enough yet, and that will have an effect on how well you will be able to understand a piece of text. CTA is just reflecting the reality of the situation.

Anyway, attached is a file containing a colour scheme for CTA that will keep looked up words as the same colour as 'known' words. Unzip it, and copy the file default.colours to the folder: c:\users\<your username>\AppData\Local\ChineseTextAnalyser\colour-schemes and you should be set.

Then, when you export wordlists, just check 'mark exported words as known' and the problem you have will largely go away. If you don't actually want to export anything, press ctrl-shift-c (which will export to the clipboard) and then just don't do anything with the result.

default.colours.zip

February 17, 2016 at 05:27 AM

Maybe a pop-up warning like "I understand that by exporting this as known/claiming this is known... etc" and OK + cancel buttons as well as a check box "don't show this warning again" would make people happy. Your philosophy may be correct, but people will want to use this tool as they see fit. You may have to sacrifice your ideals for sales and happy customers.

After typing that, I realize this discussion has probably occurred several times over the past 17 pages.

February 17, 2016 at 05:54 AM

Maybe a pop-up warning like "I understand that by exporting this as known/claiming this is known... etc" and OK + cancel buttons as well as a check box "don't show this warning again" would make people happy

The point is not about making people happy, it's about developing an accurate model of the user's vocabulary and showing users where they are relying on a dictionary.

One of the primary goals of CTA is to help people make the jump to be able read text without aid - ideally without using CTA at all, but from native books, websites and other sources.

One of the ways it does that is to make it explicitly clear when you have relied on an aid (e.g. the dictionary), because as mentioned above looking up a word in the dictionary is an implicit acknowledgement that you didn't know that word well enough yet. You might not like that fact, but that's still the truth of the situation and CTA is just shining light on that truth, rather than keeping you in the dark about it.

You may have to sacrifice your ideals for sales and happy customers.

The alternative is to sacrifice sales and happy customers for ideals, and unfortunately for my bank account, I am an idealist.

I'm aware it's a controversial feature, but it's also a core feature.

February 17, 2016 at 07:52 AM

I still think the product should be renamed from 'analyser' to 'reader'. I never use it to read texts. If I want a reader, I'll use Pleco's. But then, I'm not using it the way it's intended (just the way it's marketed). So I'm in no position to complain.

As for ideals, perhaps it's also an aspiration, that this tool will be something of a game-changer for lots and lots of people, but only if they use it as a strict reader. So I can see why you don't want the possibility of it being used the wrong way.

one of the main features of CTA is to be able to give you a good idea of how well you will understand a given piece of text.... CTA is just reflecting the reality of the situation

Again, that makes sense if used the way you want it to be used. Me, I'm constantly deleting and re-inserting into CTA a list of my known words, so the list of known words that CTA stores is only ever temporary, and will change all the time depending on what words I add to flashcards, and what words I forget (and thereby instantly & automatically leech) in flashcards. Or indeed my list of 'known words' will only be a list of 200 recently learned words and I'm looking to extract any sentences in which they might occur in the next chapter of a book I'm reading, something like that. I've found that really useful for reinforcing. I mention that just to point out that I dislike the core feature of CTA even while I understand the rationale behind it.
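The sentence-extraction workflow described above can also be approximated outside CTA. This is a rough sketch under stated assumptions: sentences are split on common Chinese and Western end punctuation, and words are matched as plain substrings rather than with real word segmentation, so a match can occasionally straddle a word boundary. The function name is hypothetical.

```python
import re

# A sentence is a run of non-terminator characters followed by any
# closing 。！？!? marks. Newlines also end a sentence.
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]*")


def sentences_containing(text: str, words: set[str]) -> list[str]:
    """Return the sentences of `text` that contain any of `words`."""
    sentences = [s.strip() for s in _SENTENCE.findall(text)]
    return [s for s in sentences if s and any(w in s for w in words)]


chapter = "我喜欢看书。你呢？他在图书馆学习。"
recent = {"图书馆", "喜欢"}
for sentence in sentences_containing(chapter, recent):
    print(sentence)
```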

February 17, 2016 at 08:08 AM

Ok, so the real problem is that CTA only has one list of 'known' words, which makes it difficult when you want to juggle multiple lists.

The underlying support for multiple wordlists is actually already in the main CTA code and at some point I'll get the time to make it so that users can easily manage multiple wordlists from within the application itself. That will hopefully make things much more usable for you.

February 17, 2016 at 08:27 AM

Also, in addition to your other point, actually, CTA's main purpose is not as a reader.

For me, it's about being able to identify how well you'll be able to understand a (possibly large) piece of text and about being able to prioritise and assist in learning unknown words within that text. If you look on the CTA homepage, then you'll see that this is exactly how the program is introduced.

That being the case, if by your explicit or implicit actions you tell CTA that you don't know a word, then it is going to use that information to help with the above two things.

February 19, 2016 at 06:22 AM

I think you should be more open to people using your software in ways you hadn't intended, but it is your business, not mine.

I haven't run into this problem often myself because I'm not at the stage where I will be using CTA extensively. However, a few weeks ago I was using CTA to determine how many words I would know after I had completed several texts and how close I could be to reading native material. I built a list of known words using those texts and compared them to a couple of novels to get a rough idea how far I would be from reading native material when I was done. Since I had been messing around with the program earlier, there were a bunch of words that I couldn't add to this hypothetical word list unless I exited the program, restarted it, and pasted my wordlist back in. Then I was playing around with the overall% of words known vs unique% of words known and adding the highest frequency words from the novel to my known list to see how high I could get my overall% (and I found that just learning around 200 new words in this book would bring my knowledge of all words to around 94% even though my knowledge of the unique% of words would only be around 50%). And again, out of curiosity I checked a few definitions. So yeah, it was a bit annoying, but I wasn't using your program for its main purpose. I was creating hypothetical known wordlists and comparing them against a text. I know I don't know these words! It was just a way to look into the future and also see if the sources I was learning from were high yield (if you get through the DeFrancis series, you'll learn 6800 words -- but a huge portion of those words are very low yield in the novels I was looking at).
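The gap between overall% and unique% described above comes from word frequency: a few very common words account for most of the running text. A small sketch of the two measures, assuming the text has already been segmented into a list of words (the segmentation itself is not shown, and the function name is made up for the example):

```python
from collections import Counter


def coverage(tokens: list[str], known: set[str]) -> tuple[float, float]:
    """Return (overall, unique) known-word coverage as fractions.

    overall: share of all word occurrences that are known.
    unique:  share of distinct words that are known.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    known_occurrences = sum(n for w, n in counts.items() if w in known)
    known_distinct = sum(1 for w in counts if w in known)
    return known_occurrences / total, known_distinct / len(counts)


# Toy text: one very frequent word and two rare ones.
tokens = ["的"] * 8 + ["猫", "狗"]
overall, unique = coverage(tokens, {"的"})
print(f"overall {overall:.0%}, unique {unique:.0%}")
```

In the toy text, knowing only 的 covers 8 of the 10 occurrences but just 1 of the 3 distinct words, which is the same effect at a much smaller scale.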

Anyway, I really look forward to when you add the option for multiple wordlists. It would be great to have a "known" word list and a bunch of others as well to play with. What about an "unknown" word list where you could let us use the dictionary ;)?

February 19, 2016 at 07:23 AM

The things you describe (finding which words you need in order to reach x% understanding of a piece of text, and comparing wordlists against multiple documents) are also planned features, so you won't have to do this manually.
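Until then, the "which words do I need for x%" calculation can be sketched by hand. The approach below is a simple greedy one and an assumption on my part, not necessarily how the planned feature will work: it takes unknown words in order of frequency until the target share of word occurrences is covered. It again assumes the text is already segmented into words, and the function name is hypothetical.

```python
from collections import Counter


def words_to_reach(tokens: list[str], known: set[str], target: float) -> list[str]:
    """List unknown words, most frequent first, needed to reach `target`.

    `target` is the desired share of word occurrences (0.0 to 1.0).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = sum(n for w, n in counts.items() if w in known)

    # Most frequent unknown words first; ties broken alphabetically.
    unknown = sorted(
        ((w, n) for w, n in counts.items() if w not in known),
        key=lambda pair: (-pair[1], pair[0]),
    )

    to_learn = []
    for word, n in unknown:
        if total == 0 or covered / total >= target:
            break
        to_learn.append(word)
        covered += n
    return to_learn
```

For a fixed number of new words, picking the most frequent unknown words gives the largest gain in overall coverage, which is why a short list can move the overall percentage a long way.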

there were a bunch of words that I couldn't add to this hypothetical word list unless I exited the program

You can achieve the same effect by exporting the words, and checking 'Mark exported words as known'. If you use Ctrl-Shift-C to export, it will export to the clipboard and then you can just ignore it.

February 19, 2016 at 07:29 AM

Still, I'd rather Imron was happy with his product (and enthusiastic about building on it) than otherwise. Plus I use it almost every day, a painless way of selecting words to learn from texts I'm reading. Well, almost painless! But I love the programme and they do say you shouldn't try to change the one you love....


