Chinese-Forums

Introducing Chinese Text Analyser



Posted

Another suggestion...

 

Yesterday, when I was going through a subtitle file with CTA, I realized it could be a lot less time-consuming if there were an option to show unknown words in red only the first time they appear in a text. It would be really cool to be able to color unknown words black (or some other color) after their first appearance. This would save so much time when going through a text looking for words that I know but haven't yet marked as known, because I wouldn't have to keep rereading the same unknown words.
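To make the request concrete, here is a minimal sketch of the proposed "highlight only the first occurrence" rule. It is not how CTA works internally. It assumes the text has already been segmented into a list of words and that the known words are available as a set; the function name `first_occurrence_highlights` and the sample data are hypothetical.

```python
def first_occurrence_highlights(words, known):
    """Return the indices of words to show in red.

    Hypothetical helper: only the first occurrence of each unknown
    word is highlighted; later occurrences are left uncolored.
    """
    seen_unknown = set()
    highlights = []
    for i, word in enumerate(words):
        # Skip known words and unknown words we've already highlighted once.
        if word in known or word in seen_unknown:
            continue
        seen_unknown.add(word)
        highlights.append(i)
    return highlights


if __name__ == "__main__":
    # Hypothetical pre-segmented text and known-word list.
    words = ["老师", "说", "汉字", "很", "难", "但是", "汉字", "很", "有意思"]
    known = {"老师", "说", "很", "难", "但是"}
    print(first_occurrence_highlights(words, known))  # prints [2, 8]
```

The second 汉字 (index 6) is not highlighted because it was already shown once, which is exactly the "don't make me reread it" behavior described above.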

Posted

Can you try to explain a little bit more about the use case and what you are trying to do and what you are doing to achieve that.

 

I ask because highlighting an unknown word only the first time it appears seems like a feature very specific to your current workflow, and it might actually be better to add a feature that simplifies the entire workflow instead.

Posted

imron, I wanted to explain this in a different way but the words aren't coming together in my head, so I'll just say this.

The use case Yadang has outlined doesn't seem overly specific. I think many people who are just beginning to use the app, myself included, would want this. It would make life a lot easier.

Posted

Ok, but for example, in the word list table you can click the 'Unknown' tab to see all the unknown words. These can be sorted in various ways by clicking each column header, and you can double-click a word to jump to its first (and then next) occurrence.

 

That strikes me as being a faster and more systematic way to approach the problem of finding the first instance of an unknown word in a document.

 

I'm not saying I won't add this as a feature, just that I want to understand what people are trying to do, to make sure that I add the right feature.

Posted

Sorry, for some reason I'm not getting emailed anymore (even though my settings are still set to email me)...

 

Anyway, I'll walk through my process to explain it better.

 

Whether it's a book or a subtitle file, I import it into CTA. Then, because I haven't used CTA long enough for it to know all of the words I actually know, I always go through the whole text (or, for a book, just a chapter) and mark every word I know that CTA doesn't know I know. Even though unknown words are quick to find thanks to the color coding, I still spend a good amount of time going through the text searching for words I know. I realized I would spend a lot less time if I didn't keep seeing the same unknown words over and over again. It's also the unknown words that take a while to decide on in the first place: I'm using CTA mainly for subtitles rather than reading, and I'm using simplified subtitles even though I'm more comfortable with traditional characters, so it sometimes takes a moment to realize that I actually don't know a word, rather than just not recognizing it as quickly in simplified as I would in traditional. I just thought it would be nice not to spend so much time deciding whether a word is unknown when only a few minutes earlier I had spent time on the exact same word.

 

Having said that, somehow I didn't realize that I could do what you describe with the unknown tab. This might be a really great solution. I'll try it and get back to you. Thanks!

Posted

"Having said that, somehow I didn't realize that I could do what you describe"

It's mentioned in the documentation, which hasn't been released yet :-) but will be finished for the next release, hopefully very soon.

Posted

That's very cool, Imron! That should save me a lot of time - thanks!

 

The only thing is, double-clicking seems to run a search that highlights all the occurrences in the document. I like this, but the problem is that you have to right-click to mark a word as known, which takes more time than just double-clicking...

 

So what about:

 

1) changing the way all occurrences in the document are highlighted, and then making double-clicking a word in the list mark it as known

 

or

 

2) adding some kind of [modifier key] + click shortcut, something like ctrl + click, so I can just go through the list holding Control and clicking on the words I know. That would be really nice.

  • 2 weeks later...
Posted

Any thoughts about making the list of stats (right pane) exportable? Or at least cursor-select-copy-pastable?

Posted

Sounds reasonable. I'll look into it. I'm close to releasing a new version, and if it's quick enough to add, this should make it in.

Posted

P.S. If anyone else has other little things they'd like to see added or fixed, now would be a good time to mention it. Things currently going into the next release:

 

* Lua scripting

* Documentation

* Fix for macOS Sierra statistics view

* ctrl-doubleclick/command-doubleclick in wordlist view to toggle known/unknown

* Online lookup of words

* Ability to set size of Chinese text in the bottom left wordlist view.

* Copying statistics to clipboard

* Native Linux support (this one is still only a maybe at the moment. A lot of work has been done, but it's still missing a large number of features, so it'll depend on whether I can get them finished in a timely manner. Anyone interested in helping test this?)

 

If you've made a suggestion previously and don't see it on the list, it's possibly because implementing it would take too much time, so I've slated it for a future release. Feel free to mention it again, though, if you'd really, really like it in the next release, as the squeaky wheel gets the grease (the corpus feature isn't going to make it this time, though).

Posted

Hmm, how about the ability to mark all unknown words in a text with something that isn't formatting? I.e., the user could choose to add an asterisk or some other mark?

I don't know if this would be interesting to anyone else, though. But it would mean I could paste the contents of the CTA window into a text file for export to a portable device to read, while keeping the 'unknown words' marking.
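As an illustration of what such an export could produce, here is a minimal sketch. It is not a CTA feature or API; it assumes the text is already segmented into words and the known words are available as a set. The names `is_chinese` and `mark_unknown` are hypothetical, and treating any character in the CJK Unified Ideographs block (U+4E00 to U+9FFF) as "Chinese" is a rough heuristic so that punctuation is never marked.

```python
def is_chinese(word):
    # Rough heuristic (assumption): a word counts as Chinese if any
    # character falls in the CJK Unified Ideographs block.
    return any("\u4e00" <= ch <= "\u9fff" for ch in word)


def mark_unknown(words, known, marker="*"):
    """Join segmented words into plain text, appending `marker`
    after every Chinese word that is not in the known-word set."""
    out = []
    for w in words:
        if is_chinese(w) and w not in known:
            out.append(w + marker)
        else:
            out.append(w)
    # Chinese text has no spaces between words, so join with "".
    return "".join(out)


if __name__ == "__main__":
    words = ["老师", "说", "汉字", "很", "有意思", "。"]
    known = {"老师", "说", "很"}
    print(mark_unknown(words, known))  # prints 老师说汉字*很有意思*。
```

Because the output is plain text, the marking survives being pasted into any reader app, which is the point of the request.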

Posted

I'd really, really like to see custom segmentation... but I suspect that's one of the things that would take too much time?

 

Anyway, I'm happy with Lua :) 

 

Also, will I be able to transfer my word lists from my (portable) Windows version to Linux?

Posted

Yep, custom segmentation is off the list for the next release.

 

#414 is maybe possible.  I'll have a think about it.

 

The portable version should switch over just fine.  Did you mean to run it under Linux as portable also?

Posted

 

"The portable version should switch over just fine. Did you mean to run it under Linux as portable also?"

 

Excellent! No, I just meant import the word lists from my portable Windows version to the regular Linux version.

 

I'd be interested in helping test the Linux version, depending on how much time it takes.

Posted

This is probably me just being daft but:

 

I pasted 夜猫 into CTA.

 

CTA doesn't recognise it as a word and doesn't recognise it as two characters.

 

Weird?

 

See attachment.

 

[Attachment: post-4446-0-07758500-1480240380_thumb.jpg]
