Chinese-Forums

Introducing Chinese Text Analyser



Posted

imron,

 

Downloaded the trial and loving it so far. I think it will replace Wenlin as my main reader. I didn't realize how much I was using dictionary popups as a crutch -- looking up words I was already familiar with, and affecting reading speed in the process.

 

Will future updates preserve my "known words" and custom vocabulary? After just one hour, I have a vocabulary size of 600 and have added many words. That's some 1500 keypresses; I'd hate to lose that progress.

 

EDIT: Might you consider an option for white text on a dark background? I get eye strain if I read black-on-white on a computer screen for long periods of time.

Posted

Colors:

Like this?

(食 is marked as known, 炒飯 is highlighted and all other words are unknown. The rest of the program appears to be unchanged.)

post-12291-0-44065000-1411163666_thumb.jpg

 

If no one objects I would post how to do it.

 

Posted

I didn't realize how much I was using dictionary popups as a crutch

I don't think many people do, and this is something I wanted to make explicitly obvious with CTA.

Anyway all known and custom words are preserved over updates and will eventually allow synching between multiple computers.

Bonus points to querido for playing with the config file. Things like this are not secrets so in the future feel free to share. I'll chime in directly if it's likely to be something that will change across versions or if there is something to be careful with.

As querido has shown, changing colours is supported; it just doesn't have a UI yet and has to be done manually in the config file:

C:\users\<username>\appdata\local\chinesetextanalyser\data\config is the file to edit.

Should be pretty obvious how to change the colours. Each value is specified in RRGGBB hex format. Just make sure not to put a # at the start of the hex colour, because anything on a line after # is treated as a comment and ignored.
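To illustrate why a leading # silently breaks a colour value, here is a minimal Python sketch of the comment rule described above. It is not CTA's actual parser; `parse_config_line` and `is_valid_colour` are hypothetical helpers, and the "key = value" layout is assumed from the config excerpts posted in this thread.

```python
import re

# Six hex digits, no leading '#', matching the RRGGBB format described above.
HEX_COLOUR = re.compile(r"^[0-9a-fA-F]{6}$")


def parse_config_line(line):
    """Return (key, value) for a 'key = value' line, or None for other lines."""
    # Assumption: everything from the first '#' onward is a comment.
    content = line.split("#", 1)[0].strip()
    if "=" not in content:
        return None
    key, value = (part.strip() for part in content.split("=", 1))
    return key, value


def is_valid_colour(value):
    """True if value looks like an RRGGBB hex colour."""
    return bool(HEX_COLOUR.match(value))


# A correct line keeps its value.
print(parse_config_line("    known.foreground = ffffff"))
# With a leading '#', the colour is swallowed by the comment and the value is empty.
print(parse_config_line("    known.foreground = #ffffff"))
```

Under that assumption, the second line parses to an empty value, which is why the colour appears to be ignored.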

Posted

For your convenience, here is my "night mode" to replace the [appearance] section of the file that imron named above. I saved a copy of config as config.day and named this one config.night. Whichever copy is named just "config" is the one that the program will use. Edit: Better change that font name if you don't have or use STKaiti.

 

[appearance]
    .version = 1
    currentSearch.background = 000000
    currentSearch.foreground = 38d878
    font.name = STKaiti
    font.size = 20
    hover.background = 0000ff
    hover.foreground = efefef
    known.background = 000000
    known.foreground = ffffff
    normal.background = 000000
    normal.foreground = ffffff
    searchTerm.background = ffffff
    searchTerm.foreground = d65eff
    selection.background = ffffff
    selection.foreground = 000000
    unknown.background = 000000
    unknown.foreground = ee0000
Posted

Also note that with the config file, it gets saved out when you close the program, so don't edit it while CTA is running otherwise your changes will be overwritten when exiting.
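For anyone keeping config.day and config.night copies as described earlier, switching between them could be scripted roughly as below. This is only a sketch: `activate_scheme` is a hypothetical helper, it copies the whole file rather than just the [appearance] section, and it should only be run while CTA is closed, for the reason given above.

```python
import shutil
from pathlib import Path


def activate_scheme(config_dir, scheme):
    """Copy config.<scheme> over config and return the path that was written."""
    config_dir = Path(config_dir)
    source = config_dir / f"config.{scheme}"
    target = config_dir / "config"
    if not source.is_file():
        raise FileNotFoundError(f"no saved scheme at {source}")
    # Overwrites the active config; run this only while CTA is not running.
    shutil.copyfile(source, target)
    return target


# Example call (replace <username> with your own Windows user name):
# activate_scheme(r"C:\users\<username>\appdata\local\chinesetextanalyser\data", "night")
```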

Posted

It might be ready to use a Cantonese dictionary in cedict format with just one inelegant hitch: it needs an option to show "pinyin (numbers)" in the dictionary popup, just as you offer in the Export Wordlist dialogue.

 

For example, this (edited into cedict_ts.u8): "炒飯 炒饭 [caau2 faan6] /fried rice/" displays "cáau faan6" in the definition popup.

 

But if I export the wordlist and ask for "pinyin (numbers)" the output is the desired "caau2 faan6".  :-)

Posted

Will add to my todo list. Custom dictionary entries is also on my list so you won't need to patch things in in the future and changes will be preserved over upgrades.

Posted

Thanks imron and querido!

 

I tweaked the config, here's a picture of my night mode. It's not white-on-black exactly, more light grey-on-dark grey. Slight difference, but helps when you're reading for long periods. Also, since the HTML defaults for blue and red look muted against a dark background, I increased their color brightness (for unknown words).

 

I have a question about word segmentation. Let's say there's this sentence: 不说好坏,只说事实。The CTA dictionary has entries for both 说好 and 好坏. In this case, I want to select 好坏 as a known word, but I can't. Mousing over 说 or 好 groups the two together, and mousing over 坏 selects the single character only. Are there plans to change this?

 

A couple ideas for future features:

 

1) A reminder/warning about unsaved tabs when closing the program. (Currently I only get a warning when closing a single tab manually.)

 

2) The ability to drag tabs left or right.

 

3) The ability to close [X] the Word Statistics sidebar, or Hide menus/toolbar. Maybe a Fullscreen Mode, or a View menu where the user can choose what they want to see? 

post-12197-0-28627800-1411222886_thumb.gif

Posted

It appears that what it gives me when I ask for "pinyin (numbers)" is whatever is between the brackets in the cedict formatted dictionary. This is great because it won't care what romanization is in there and isn't tied to one variety* of Chinese. 
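As a rough illustration of "whatever is between the brackets", the following Python sketch pulls the reading out of a cedict-style line without interpreting it. The regular expression and `parse_cedict_line` are my own assumptions about the line layout (traditional, simplified, [reading], /definitions/), not code from CTA.

```python
import re

# traditional simplified [reading] /definition 1/definition 2/
CEDICT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_cedict_line(line):
    """Split one cedict-format line into its fields, or return None."""
    match = CEDICT_LINE.match(line)
    if match is None:
        return None  # comment lines and malformed lines end up here
    traditional, simplified, reading, glosses = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        # Kept verbatim, so any romanization (pinyin, Jyutping, ...) survives.
        "reading": reading,
        "definitions": glosses.split("/"),
    }


entry = parse_cedict_line("炒飯 炒饭 [caau2 faan6] /fried rice/")
print(entry["reading"])  # prints: caau2 faan6
```

Because the reading is treated as an opaque string, nothing here depends on which variety of Chinese the dictionary covers.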

Posted

Really like the grey on black, a much better night view for lots of things I use for reading English and Chinese. Never thought to try that combination. Good one, murrayjames.

 

I usually have a straw coloured paper with black ink for daytime and have never used built-in night modes because I found the contrast too stark, and the white would really begin to cause problems (vibrating and jumping about).

Posted
is there a quick way to say that you've learned a word without using right click

Double click the word.

 

I have a question about word segmentation.

At the moment segmentation is very basic: it just takes the first longest matching word. I have a load of plans to improve this, including allowing the user to explicitly mark segmentation, but improvements to the segmenter are currently a low priority while I work on getting the rest of the program finished. The main reason is that segmenter output is mostly acceptable, and I could spend hours tweaking the segmenter for only minor improvements in accuracy, which is not the best use of time when the rest of the program is still missing a number of key features. It will be improved eventually, however, with name recognition, improved sentence parsing and custom segmentation.
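For readers wondering why 好坏 can't be selected in 不说好坏, here is a minimal Python sketch of greedy left-to-right longest matching, which is how I read "first longest matching word". It is not CTA's segmenter; the `segment` function, the `max_len` limit and the three-word dictionary are all assumptions made for illustration.

```python
def segment(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: always take the longest dictionary hit."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Toy dictionary containing both overlapping words from the example sentence.
dictionary = {"说好", "好坏", "事实"}
print(segment("不说好坏", dictionary))  # prints: ['不', '说好', '坏']
```

Once 说好 has been taken, 好 is already used up, so 好坏 is never considered and 坏 is left on its own, which matches the behaviour described in the question.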

 

1) A reminder/warning about unsaved tabs when closing the program.

Added to my todo list.

 

2) The ability to drag tabs left or right.

Improving the tab bar is on my list already, but currently has a low priority.

 

3) The ability to close [X] the Word Statistics sidebar,

There is a slider between the main window and the sidebar you can drag to make it disappear.  There is also a slider between the two statistics windows you can adjust for the same effect.

 

Maybe a Fullscreen Mode,

Added to my list

 

Downloaded the trial and loving it so far

P.S. If you're quick, there's still one of these left.

Posted
You'll need a cedict formatted Cantonese dictionary .... Any opinions?

Probably best to ask this question in its own thread rather than tucked away at the end of a post in a thread that maybe not everyone on the forums is reading.

Posted

Shelley, glad you like it. The colors are d8d8d8 (light grey) on 1c1c1c (dark grey). You can also tweak your Windows display settings to use these colors for Word, Powerpoint, Internet Explorer, etc.

 

imron, feedback sent!

Posted

hi Imron,

 

Quick CTA question! Is there an easy way to cancel highlighting of words after a search? This is what I do currently: I search for 你, and click Find. The Find dialog window closes and every 你 in the document is highlighted. If I want the document to return to normal, I need to re-open the Find dialog and click Cancel (or Esc).

 

I think an easier way would be:

 

1) The Find dialog stays open until you close it, at which point the highlighting turns off  (as in Word, Notepad, Chrome, etc.)

 

OR

 

2) Left-clicking anywhere in the document unhighlights the selected words  (as in Wenlin).

Posted

Will add something similar to option 2 because I like the words being highlighted without the find box in the way :)

 

To make them go away, I typically use ctrl-f, esc, which I can do quite fast.

  • 2 weeks later...
Posted

A new version is now up, with support for light/dark colour schemes, cancelling of highlighted text with left click, tone numbers in the definition view (need to edit the config file), fullscreen mode (but still with toolbar and menu for now), and a warning if closing the program and there are unsaved clipboard documents open.

 

It also has a significant refactoring to separate all the program logic from the GUI (in preparation for OSX and Linux versions), which will reduce the amount of effort required for maintaining multi-platform versions of the program.

 

For those interested, the core of the OSX stuff is also done and now I just need to hook up the rest of the GUI.  Unfortunately, for those waiting for the corpus functionality, it will be coming after the OSX version and not before.  On the plus side however, with the refactoring mentioned above, it will be little/no extra work to have it in both versions.
