Introducing Chinese Text Analyser

February 24, 2020 at 05:09 PM

8 hours ago, imron said:

CTA is opinionated software

I wonder where it got that feature from? :mrgreen:

February 24, 2020 at 08:21 PM

Its opinionated author :mrgreen:

February 25, 2020 at 03:36 AM

It would be fine to add also a function that can mark characters from HSK1, 2, 3, 4, 5, 6, as is already done for words. Statistics are important too, in particular for the unique characters from these sets. It's very helpful that CTA already shows the percentage of unique characters, though only for all of them together. In other words, it would be very desirable to have everything for characters that exists for words. I could invest US$1,000 in that for the benefit of all, without any special rights for myself.

February 25, 2020 at 03:43 AM

@Jan Finster

I fully agree with you that characters are important in their own right, not only as parts of words. My idea is that learning a sufficient number of characters in advance, say the HSK5 set, would let one advance more rapidly later, when one starts to read HSK5 texts and other materials. Please look through my four-element system for learning characters at https://www.chinese-forums.com/forums/topic/59580-four-element-system-to-learn-characters-their-pinyin-and-meaning/; it would be very interesting to know your opinion.

March 14, 2020 at 06:02 AM

Imron, how do you page up/down in CTA on macOS? Pressing Fn + up/down arrow keys does not work. It seems you can only scroll text a few lines at a time.

March 14, 2020 at 10:01 AM

There does not appear to be a way to do it, which is an oversight: the feature is there on other platforms, so it seems I forgot to hook up the keypress on Mac. It'll go into the next release.

March 16, 2020 at 07:17 AM

Imron, how do I add new words in the macOS version of CTA? I cannot get the pop-up menu to appear using the touchpad, even when holding down the control key. (I can with a right-clickable mouse though.)

March 16, 2020 at 07:23 AM

Oh I see. I need to click the touchpad with two fingers simultaneously (or set up a different option in macOS System Preferences).

Shouldn’t control + click still work, though?

March 16, 2020 at 10:13 AM

It should still work, but it doesn't (I'll add that to the list of things to do).

March 16, 2020 at 10:23 AM

How long is that list?

March 16, 2020 at 10:41 AM

The full list is quite long. The list for the next release is much shorter, and involves quick easy things like this.

March 25, 2020 at 07:24 AM

@imron

It would be especially great if CTA could also mark and counter 'head' characters even though they're of conditional nature. They can be considered 'names' of syllables, just as letters had names in Old English and other old languages, e.g. Thorn (þ), Wynn (ƿ), Ash (æ), Yogh (ȝ). It's an effective way to organize a set of characters in one's mind, after all, so why not use it?
I described the concept of 'head characters' here, particularly in the last self-reply:
https://www.chinese-forums.com/forums/topic/59743-meaningphonetics-based-system-to-learn-characters-adopted-for-english-speakers/

March 27, 2020 at 04:59 PM

Forgive me if this has been discussed, but I'm new here and there are 30 pages.

Been playing around with CTA and I'm finding it useful to scroll through unknown word lists and pick out single-character words, as very often they're part of a name or, almost never, a segmentation error, or a new or rare word. Two things that would make this easier...

1) A sort by length view in the word window.

2) When looking for those characters, a way to only find them if they're NOT part of an existing word. So if I can see there are 10 instances of 诊 segmented as a single character in the word view, I don't need to click through all the correctly segmented 诊所 to find the 诊间. That is, a find function which ignores matches which are already part of a larger word, I suppose.

Do these things exist? Should they?
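
To make the two requests concrete, here is a rough sketch in Python rather than anything tied to CTA itself. It assumes the text has already been segmented into a list of words; the function name `find_standalone` and the sample tokens are invented for illustration and say nothing about how CTA stores its segmentation.

```python
def find_standalone(tokens, char):
    """Return text offsets where `char` occurs as a token by itself.

    `tokens` is an already-segmented list of words. This is an assumption:
    the thread does not describe CTA's internal representation.
    """
    offsets = []
    position = 0  # offset of the current token within the joined text
    for token in tokens:
        if token == char:
            offsets.append(position)
        position += len(token)
    return offsets


# Hypothetical segmentation: 诊 sits inside 诊所 twice and stands alone once.
tokens = ["他", "去", "诊所", "，", "在", "诊", "间", "等", "，", "诊所", "很", "忙"]

# Request 2: only the standalone 诊 is reported, the 诊所 matches are skipped.
hits = find_standalone(tokens, "诊")

# Request 1: unique words ordered by length, shortest first.
by_length = sorted(set(tokens), key=lambda word: (len(word), word))
```

Matching on whole tokens instead of searching the raw string is what keeps the correctly segmented 诊所 out of the results.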

March 27, 2020 at 10:35 PM

They don't exist, but it shouldn't be difficult to add them.

March 28, 2020 at 07:55 AM

On 2/24/2020 at 9:51 AM, Jan Finster said:

On 2/24/2020 at 9:49 AM, imron said:

it would be possible to write a Lua script that built a list of known characters from CTA's list of known words, and then use that to build a list of characters in a document that were not on that list. If that's something you would really like, I can probably write a quick script to do it.

I would love it if you or anyone else capable of writing such a script could do this.

Bump

March 28, 2020 at 11:21 AM

?

March 29, 2020 at 05:12 AM

21 hours ago, Jan Finster said:

Bump

Here you go.

Download that file somewhere to your computer, and then open it from within the Run Script dialog (Tools->Run Script). Feel free to ask follow up questions in the linked thread.
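
For anyone reading without the attachment, the idea described in the quoted post can be sketched as follows. This is not the attached script: it is plain Python rather than Lua, it does not use CTA's scripting API (which is not shown in this thread), and the function names, the sample word list, and the choice to count only the main CJK Unified Ideographs block are all assumptions made for illustration.

```python
def is_cjk(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def known_characters(known_words):
    """Collect every Chinese character that appears in any known word."""
    return {ch for word in known_words for ch in word if is_cjk(ch)}


def unknown_characters(text, known_words):
    """List characters in `text` not covered by the known words,
    in order of first appearance and without duplicates."""
    known = known_characters(known_words)
    seen = set()
    result = []
    for ch in text:
        if is_cjk(ch) and ch not in known and ch not in seen:
            seen.add(ch)
            result.append(ch)
    return result


# Hypothetical known-word list and document text.
known_words = ["诊所", "医生", "今天"]
text = "今天医生在诊间等我。"
unknown = unknown_characters(text, known_words)
```

Note that a character counts as known here as soon as it appears in any known word, which is a looser test than knowing the character by itself.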

March 31, 2020 at 03:36 PM

New release is up, it includes:

Updated CC-CEDICT to use the latest version
Added word length column to the wordlist view
Fixed bug where words that spanned a line weren't highlighting correctly
macOS: Added support for standard PageUp and PageDown keyboard shortcuts with Fn Up and Fn Down
macOS: Added support for system-wide dark mode
macOS: Ctrl-left click will now display the popup menu in the textview
macOS: Fixed bug where dictionary definitions weren't showing in dark mode

@roddy you can now sort by word length using the word length column (hidden all the way on the right, so you may need to scroll). Searching for individual characters not in words is going to be a bit trickier than I expected, so it didn't make it into this release.

March 31, 2020 at 03:50 PM

On 2/25/2020 at 3:36 AM, Pall said:

It would be fine to add also a function that can mark characters from HSK1, 2, 3, 4, 5, 6

I get that people are interested in things like this, but CTA aims to subtly push people away from thinking in terms of HSK. In fact I only provide the HSK statistics to drive home the point that for most native content, the vocabulary for the HSK doesn't give you very much at all, and you're better off using frequently occurring words in what you are reading.

On 3/25/2020 at 7:24 AM, Pall said:

It would be especially great if CTA could also mark and counter 'head' characters even though they're of conditional nature.

I had a look through your link, but I'm still not entirely sure what you mean by head characters.

March 31, 2020 at 03:53 PM

All together now:

Did you ever know that you're my hero
And everything I would like to be?
I can fly higher than an eagle
For you are Imron beneath my wings
