
Introducing Chinese Text Analyser


imron


@imron

It would be good to also add a function that can mark characters from HSK 1, 2, 3, 4, 5, 6, as is already done for words. Statistics are important too, in particular for the unique characters from these sets. It's very helpful that CTA already shows the percentage of unique characters, though only for all of them together. In other words, it would be very desirable to have everything for characters that exists for words. I could invest US$ 1,000 in that for the benefit of all, without any special rights for myself.


@Jan Finster

I fully agree with you that characters are important in their own right, not only as parts of words. My idea is that learning a sufficient number of characters in advance, say the HSK5 set, would let one advance more rapidly later, when one starts to read HSK5 texts and other materials. Please look through my four element system to learn characters at https://www.chinese-forums.com/forums/topic/59580-four-element-system-to-learn-characters-their-pinyin-and-meaning/; I would be very interested to know your opinion.


  • 3 weeks later...

Imron, how do I add new words on the Mac OS version of CTA? I cannot get the pop-up menu to appear using the touchpad, even when holding down the control key.  (I can with a right-clickable mouse though.) 


  • 2 weeks later...

@imron

It would be especially great if CTA could also mark and count 'head' characters, even though they're of a conditional nature. They can be considered 'names' of syllables, much as letters had names in Old English and other old languages, e.g. Thorn (þ), Wynn (ƿ), Ash (æ), Yogh (ȝ). It's an effective way to organize a set of characters in one's mind after all, so why not use it?
I described the concept of 'head characters' here, particularly in the last self-reply:
https://www.chinese-forums.com/forums/topic/59743-meaningphonetics-based-system-to-learn-characters-adopted-for-english-speakers/


Forgive me if this has been discussed, but I'm new here and there are 30 pages.

 

Been playing around with CTA and I'm finding it useful to scroll through unknown word lists and pick out single-character words, as very often they're part of a name or, almost never, a segmentation error, or a new or rare word. Two things that would make this easier...

 

1) A sort by length view in the word window.

2) When looking for those characters, a way to only find them if they're NOT part of an existing word. So if I can see there are 10 instances of 诊 segmented as a single character in the word view, I don't need to click through all the correctly segmented 诊所 to find the 诊间. That is, a find function which ignores matches that are already part of a larger word, I suppose.

 

Do these things exist? Should they? 
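For anyone wanting to try the second idea outside CTA, here is a minimal Python sketch of a "find only standalone characters" search. It assumes the text has already been segmented into a list of tokens; the names `is_han`, `standalone_counts` and `find_standalone` are made up for illustration and are not part of CTA, and the Han check only covers the basic CJK Unified Ideographs block.

```python
from collections import Counter


def is_han(ch):
    """Rough check: is ch in the basic CJK Unified Ideographs block?"""
    return "\u4e00" <= ch <= "\u9fff"


def standalone_counts(tokens):
    """Count tokens that are exactly one Han character long."""
    return Counter(t for t in tokens if len(t) == 1 and is_han(t))


def find_standalone(tokens, char):
    """Return character offsets where char appears as a token by itself.

    Occurrences inside longer tokens (e.g. 诊 inside 诊所) are skipped.
    """
    offsets = []
    pos = 0
    for tok in tokens:
        if tok == char:
            offsets.append(pos)
        pos += len(tok)  # advance by the token's length in characters
    return offsets


# Hypothetical segmentation of 他在诊所等，我在诊间
tokens = ["他", "在", "诊所", "等", "，", "我", "在", "诊", "间"]
counts = standalone_counts(tokens)
hits = find_standalone(tokens, "诊")
```

Here `hits` contains only the offset of the lone 诊 near the end of the sentence, and `counts` ignores both the punctuation and the two-character word 诊所.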


On 2/24/2020 at 9:51 AM, Jan Finster said:
On 2/24/2020 at 9:49 AM, imron said:

it would be possible to write a Lua script that builds a list of known characters from CTA's list of known words, and then uses that to build a list of characters in a document that are not on that list.  If that's something you would really like, I can probably write a quick script to do it.

 

I would love it if you or anyone else capable of writing such a script could do this.

Bump.
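Until a CTA script exists, the logic described in the quote can be sketched in plain Python. This is not CTA's Lua API; it assumes the known words have been exported as a list of strings, and `is_han`, `known_characters` and `unknown_characters` are hypothetical names chosen for this example.

```python
def is_han(ch):
    """Rough check: is ch in the basic CJK Unified Ideographs block?"""
    return "\u4e00" <= ch <= "\u9fff"


def known_characters(known_words):
    """Collect every Han character that appears in any known word."""
    return {ch for word in known_words for ch in word if is_han(ch)}


def unknown_characters(text, known_words):
    """List Han characters in text that occur in no known word.

    Characters are returned once each, in order of first appearance.
    """
    known = known_characters(known_words)
    unknown = (ch for ch in text if is_han(ch) and ch not in known)
    return list(dict.fromkeys(unknown))  # de-duplicate, keep order


# Hypothetical known-word list and document text
known_words = ["诊所", "我们"]
text = "我们去诊间，诊间很大"
result = unknown_characters(text, known_words)
```

With this sample input, `result` is the four characters 去, 间, 很 and 大, since 我, 们 and 诊 all appear in the known words.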

 


New release is up, it includes:

  • Updated CC-CEDICT to use the latest version
  • Added word length column to the wordlist view
  • Fixed bug where words that spanned a line weren't highlighting correctly
  • macOS: Added support for standard PageUp and PageDown keyboard shortcuts with Fn Up and Fn Down
  • macOS: Added support for system-wide dark mode
  • macOS: Ctrl-left click will now display the popup menu in the textview
  • macOS: Fixed bug where dictionary definitions weren't showing in dark mode

@roddy you can now sort by word length using the word length column (hidden all the way on the right, so you may need to scroll).  Searching for individual characters not in words is going to be a bit trickier than I expected, so it didn't make it into this release.


On 2/25/2020 at 3:36 AM, Pall said:

It would be good to also add a function that can mark characters from HSK 1, 2, 3, 4, 5, 6

I get that people are interested in things like this, but CTA aims to subtly push people away from thinking in terms of HSK.  In fact I only provide the HSK statistics to drive home the point that for most native content, the vocabulary for the HSK doesn't give you very much at all, and you're better off using frequently occurring words in what you are reading.

 

On 3/25/2020 at 7:24 AM, Pall said:

It would be especially great if CTA could also mark and count 'head' characters, even though they're of a conditional nature.

I had a look through your link, but I'm still not entirely sure what you mean by head characters.

