Introducing Chinese Text Analyser

November 6, 2016 at 03:09 AM

Another suggestion...

Yesterday when I was going through a subtitle file with CTA, I realized that it could be a lot less time consuming if there was an option to only show unknown words as red the first time they appear in a text. It would be really cool if there was an option to be able to color unknown words black (or some other color) after their first appearance. This would save so much time going through a text looking for words that I know, but that I haven't yet marked as known, because I wouldn't have to keep reading the same unknown words again.
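To make the request concrete, here is a minimal Python sketch of the "only flag the first occurrence" idea. It is not CTA's implementation and does not use CTA's scripting interface: the function name, the status labels, and the assumption that the text has already been segmented into words are all hypothetical.

```python
def classify_tokens(tokens, known_words):
    """Label each token as known, a first-time unknown, or a repeated unknown.

    tokens: list of already-segmented words (segmentation is assumed done).
    known_words: set of words the reader has marked as known.
    """
    seen_unknown = set()  # unknown words that have already appeared once
    result = []
    for token in tokens:
        if token in known_words:
            status = "known"
        elif token in seen_unknown:
            status = "unknown_repeat"  # could be drawn in black
        else:
            status = "unknown_first"   # could be drawn in red
            seen_unknown.add(token)
        result.append((token, status))
    return result


if __name__ == "__main__":
    sample = ["我", "喜欢", "夜猫", "和", "夜猫"]
    for token, status in classify_tokens(sample, {"我", "喜欢", "和"}):
        print(token, status)
```

A display layer would then pick a color per status, so only "unknown_first" words stand out while scanning.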

November 6, 2016 at 08:57 AM

Can you explain a little more about the use case: what you are trying to do, and what you are currently doing to achieve it?

I ask because 'highlighting only the first occurrence of an unknown word' seems like a feature very specific to your current workflow, and it might actually be better to add a feature that simplifies the entire workflow instead.

November 10, 2016 at 02:47 PM

imron, I wanted to explain this in a different way but the words aren't coming together in my head, so I'll just say this.

The use case that Yadang has outlined doesn't seem overly specific. I think that many people, myself included, who are just beginning to use the app would want to have this. It will make life a lot easier.

November 11, 2016 at 01:16 AM

Ok, but for example, on the word list table, you can click the 'Unknown' tab to see all the unknown words, which can be sorted in various forms by clicking the header of each column, and you can double click on each word to take you to the first (and then next) instances of that word.

That strikes me as being a faster and more systematic way to approach the problem of finding the first instance of an unknown word in a document.

I'm not saying I won't add this as a feature, just that I want to understand what people are trying to do, to make sure that I add the right feature.

November 11, 2016 at 04:02 AM

Understood. Let's hope Yadang comes back with more details about his use case.

November 12, 2016 at 06:16 AM

Sorry, for some reason I'm not getting emailed anymore (even though my settings are still set to email me)...

Anyways, I'll just walk through my process to elucidate it better.

Whether a book or a subtitle file, I import it into CTA. Then, because I haven't used CTA enough for it to know all of the words I actually know, I always go through the whole text (or if a book, just a chapter) and select all the words I know that CTA doesn't know I know. Even though it's pretty quick to find unknown words due to color coding, I still spend a good amount of time going through the text, searching for words I know. I realized that I would spend a lot less time going through the text if I didn't keep seeing unknown words over and over again (also, it's the unknown words that take a while to know whether to mark or not mark as known in the first place -- because I'm using it mainly for subtitles and not for reading, and because I'm using simplified subtitles when I'm more comfortable with traditional characters, sometimes it takes a little while to process and realize that I actually don't know the word, not that I just don't recognize it as quickly in simplified as I would in traditional, etc.). I just thought it would be nice to not spend so much time considering whether words are unknown or not when just a few minutes earlier I had spent time on the same exact word.

Having said that, somehow I didn't realize that I could do what you describe with the unknown tab. This might be a really great solution. I'll try it and get back to you. Thanks!

November 12, 2016 at 08:25 AM

Having said that, somehow I didn't realize that I could do what you describe

It's mentioned in the documentation that has not yet been released :-) but which will be finished for the next release which will hopefully be very soon now.

November 13, 2016 at 07:28 AM

That's very cool, Imron! That should save me a lot of time - thanks!

The only thing is, upon double clicking, it seems to do a search, highlighting all the occurrences in the document. I like this, but the only problem is one has to right-click to mark words as known, taking more time than just double clicking...

So what about:

1) either changing the way in which all occurrences in the document are highlighted and then making it such that double clicking the word in the list will mark it as known

or

2) making some kind of [enter shortcut here] + click shortcut. Something like ctrl + click or something. So I can just go through the list, holding control and clicking on the words I know. That would be really nice.

November 13, 2016 at 07:40 AM

Ctrl-click (or something else) seems quite reasonable and easy to add.

November 24, 2016 at 10:26 AM

Any thoughts about making the list of stats (right pane) exportable? Or at least cursor-select-copy-pastable?

November 25, 2016 at 12:00 AM

Sounds reasonable. I'll look into it. I'm close to releasing a new version and if it's quick enough to add, this should make it in.

November 25, 2016 at 08:04 AM

P.S. If anyone else has other little things they'd like to see added/fixed, now would be a good time to mention it. Things currently going into the next release:

* Lua scripting

* Documentation

* Fix for macOS Sierra statistics view

* ctrl-doubleclick/command-doubleclick in wordlist view to toggle known/unknown

* Online lookup of words

* Ability to set size of Chinese text in the bottom left wordlist view.

* Copying statistics to clipboard

* Native Linux support (this one is still only a maybe at the moment. A lot of work has been done, but it's still missing a large number of features, so it'll depend on if I can get them finished in a timely manner or not. Anyone interested in helping test this?)

If you've made a suggestion previously and don't see it on the list it's possibly because implementing it will take too much time so I've slated it away for a future release. Feel free to mention it again though if you'd really, really like it in the next release as the squeaky wheel gets the grease (the corpus feature isn't going to make it this time though).

November 25, 2016 at 08:17 AM

thanks Imron!

November 25, 2016 at 08:36 AM

hmm how about ability to mark all unknown words in a text with something that isn't formatting? i.e. user can choose to add an asterisk or other mark?

Don't know if this would be interesting to anyone else though. But would mean I could paste the contents of the CTA window into a text file for export to a portable device to read, while retaining the 'unknown words' marking.
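As a rough illustration of what that could look like, here is a small Python sketch that prefixes unknown words with a plain-text marker. It is only an assumption-laden example, not a CTA feature: the function names are hypothetical, the text is assumed to be segmented already, and only tokens containing characters in the basic CJK Unified Ideographs range are treated as candidate words so that punctuation is left alone.

```python
def contains_cjk(token):
    """True if the token has at least one character in U+4E00..U+9FFF."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def mark_unknown(tokens, known_words, marker="*"):
    """Rebuild the text, prefixing every unknown Chinese word with marker.

    tokens: list of already-segmented words and punctuation.
    known_words: set of words the reader has marked as known.
    """
    pieces = []
    for token in tokens:
        if contains_cjk(token) and token not in known_words:
            pieces.append(marker + token)  # unknown word: add the marker
        else:
            pieces.append(token)           # known word or punctuation
    return "".join(pieces)


if __name__ == "__main__":
    print(mark_unknown(["我", "喜欢", "夜猫", "。"], {"我", "喜欢"}))
```

The marked text could then be pasted into an ordinary text file, since the marking no longer depends on color or other formatting.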

November 25, 2016 at 09:03 AM

I'd really really like to see custom segmentation... But I suspect that's one of the things that will take too much time?

Anyway, I'm happy with Lua

Also, will I be able to transfer over my word lists from my (portable) windows version to Linux?

November 25, 2016 at 09:41 AM

Yep, custom segmentation is off the list for the next release.

#414 is maybe possible. I'll have a think about it.

The portable version should switch over just fine. Did you mean to run it under Linux as portable also?

November 25, 2016 at 08:25 PM

The portable version should switch over just fine. Did you mean to run it under Linux as portable also?

Excellent! No, I just meant import the word lists from my portable Windows version to the regular Linux version.

I'd be interested in helping test the Linux version, depending on how much time it takes.

November 26, 2016 at 08:01 AM

I'll be another one happy to test the Linux version.

November 26, 2016 at 11:17 AM

I would like to test the Linux version as well!

November 27, 2016 at 09:53 AM

This is probably me just being daft but:

I pasted 夜猫 into CTA.

CTA doesn't recognise it as a word and doesn't recognise it as two characters.

Weird?

See attachment.

Yadang

imron

LinZhenPu

imron

LinZhenPu

Yadang

imron

Yadang

imron

Guest realmayo

imron

imron

Guest realmayo

Guest realmayo

Yadang

imron

Yadang

LinZhenPu

Naphta

Guest realmayo
