imron (Author), posted July 30, 2020 at 12:25 PM:
I'll add "Reader Mode" to my todo list.

icebear, posted July 30, 2020 at 04:38 PM:
Look forward to all of these when available. Great app, highly recommended to others!

icebear, posted August 8, 2020 at 01:43 AM:
Request: allow me to configure highlight colors, especially in dark mode. Blue text on a black background is hard on the eyes!

imron (Author), posted August 8, 2020 at 01:56 AM:
You can do this, but it involves manually editing a config file. If you're on Windows, this will be located in:

    c:\users\<username>\AppData\Local\ChineseTextAnalyser\colour-schemes\default.colours

(There is a similar file on other OSes, so let me know if you're on a different OS.) You can edit that file with any text editor, and the values are hex colours with the # removed.

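To illustrate what "hex colours with the # removed" means, here is a minimal Python sketch that converts between such a value and its red/green/blue components. The function names are hypothetical, and nothing here reads or writes the actual default.colours file, whose layout is not described in this thread.

```python
def parse_hex_colour(value: str) -> tuple:
    """Turn a value such as 'ffcc00' (or '#ffcc00') into (red, green, blue)."""
    value = value.strip().lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    # int(..., 16) raises ValueError if a pair is not valid hexadecimal.
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b)


def to_config_value(r: int, g: int, b: int) -> str:
    """Format an RGB triple as six lowercase hex digits with no leading '#'."""
    return f"{r:02x}{g:02x}{b:02x}"


print(parse_hex_colour("ffcc00"))
print(to_config_value(255, 204, 0))
```

A warm yellow written as #ffcc00 in most colour pickers would therefore be entered as ffcc00.
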
艾墨本, posted August 9, 2020 at 12:11 AM:
I'm trying to do an analysis of the 普通话水平测试 to determine how to go about learning all the 字. I want to see whether, if I spend my time learning all the 字 in the 60篇文章, I will then know most of the 字 that appear on the test. However, CTA uses 词 in the side panel and only says how many unique characters there are, without allowing me to do analysis based on the characters. Is there any way to work this out?
Attachments: 《普通话水平测试用普通话词语表》.doc, 普通话水平测试文章60篇.doc

大块头, posted August 9, 2020 at 01:20 AM:
Is CTA meant to analyze things on the character level like that? In any case, counting the characters in those essays only took a few lines of Python, if that's helpful to you. See the attached csv file.
Attachment: char_count.csv

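A character count of this kind can be sketched in a few lines of Python. This is not the code behind the attached csv, just one possible approach: it assumes the essays have already been saved as plain UTF-8 text, counts only characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF), and uses hypothetical function names.

```python
import csv
from collections import Counter


def is_hanzi(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def count_chars(text: str) -> Counter:
    """Count how often each Chinese character occurs in the text."""
    return Counter(ch for ch in text if is_hanzi(ch))


def write_counts(counts: Counter, path: str) -> None:
    """Write character/count rows to a CSV file, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["char", "count"])
        writer.writerows(counts.most_common())


# A short sample string stands in for the text of the essays.
sample = "我喜欢学习普通话,我也喜欢写字。"
counts = count_chars(sample)
print(len(counts), "unique characters,", sum(counts.values()), "in total")
```

Calling write_counts(counts, "char_count.csv") would then produce a file similar in spirit to the attachment.
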
艾墨本, posted August 9, 2020 at 01:53 AM:
30 minutes ago, 大块头 said:
> Is CTA meant to analyze things on the character level like that? In any case, counting the characters in those essays only took a few lines of Python, if that's helpful to you. See the attached csv file.

Thank you. Coding is definitely a language I wish I was more interested in. So useful. But yes, that's kind of what I'm looking for, though not just the raw frequency of the characters. I'm looking to determine what % of the characters in the word list would be covered if I learned all of the characters that show up in the 60 essays, and vice versa (what percentage of the characters in the essays would be covered if I learned the list of words). Would it be doable in Python to check this?

大块头, posted August 9, 2020 at 02:24 AM:
The essays contain 2293 unique characters. The word list contains 1668 unique characters. The intersection of these two sets contains 1307 characters. I won't share my code just in case there is some way to make CTA do this. My intention isn't to cobble together some 山寨 version of one of its functions...

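Counts like these come down to a pair of set operations. The following is a minimal sketch, not 大块头's actual code: it assumes both documents are available as plain text, treats only U+4E00 to U+9FFF as Chinese characters, and uses hypothetical names and toy input strings.

```python
def unique_hanzi(text: str) -> set:
    """Return the set of distinct Chinese characters in the text."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


def coverage(known: set, target: set) -> float:
    """Percentage of the characters in `target` that also appear in `known`."""
    if not target:
        return 0.0
    return 100.0 * len(known & target) / len(target)


# Toy strings stand in for the essays and the word list.
essay_chars = unique_hanzi("今天天气很好")
list_chars = unique_hanzi("天气 今天 明天")
shared = essay_chars & list_chars

print(len(essay_chars), len(list_chars), len(shared))
print(coverage(essay_chars, list_chars))  # share of word-list characters found in the essays
print(coverage(list_chars, essay_chars))  # share of essay characters found in the word list
```

Applying the same arithmetic to the figures reported above gives 1307 / 1668 ≈ 78.4% of the word list's characters appearing in the essays, and 1307 / 2293 ≈ 57.0% of the essays' characters appearing in the word list. Both figures count each unique character once, regardless of how often it occurs.
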
艾墨本, posted August 9, 2020 at 02:33 AM:
That's great info. Then I'm going to work on learning the 60 essays, since that is more fun than the list, and then learn the remaining 300+ characters after that. Might take a couple of years, though. Add this to my list of feature requests for CTA, @imron.

大块头, posted August 9, 2020 at 02:51 AM:
17 minutes ago, 艾墨本 said:
> I'm going to work on learning the 60 essays

Sounds like a great use case for CTA!

LinZhenPu, posted August 9, 2020 at 03:33 AM:
@艾墨本 Are you going to one day take the Putonghua test that mainland Chinese people take?

艾墨本, posted August 9, 2020 at 06:37 AM:
3 hours ago, LinZhenPu said:
> Are you going to one day take the Putonghua test that mainland Chinese people take?

That's my goal. I started working through it last year and got sidetracked with COVID. Four of the essays down, 56 to go. But I'm also focusing on quality over quantity (though quantity will be needed eventually), making sure I can properly recite each line in a "storytelling" fashion. Even after learning just four of them with my tutor (shout out to @GoEastMandarin) I saw an enormous amount of growth. CTA helps me determine which words to focus on.

imron (Author), posted August 10, 2020 at 12:46 AM:
22 hours ago, 大块头 said:
> I won't share my code just in case there is some way to make CTA do this. My intention isn't to cobble together some 山寨 version of one of its functions...

Thanks for the consideration, but I generally follow a philosophy that more is better than less, so regardless of whether or not CTA can do this, please feel free to share source code or tools that other people might find useful (but maybe start a new thread, to keep this one just about CTA).

That being said, CTA intentionally focuses on 词 rather than 字 and doesn't have this feature built in. I've considered adding it, but am still in two minds about it.

However, what CTA does have is Lua scripting support, and in that sense you can make CTA do whatever you want. For example, here is a script that counts the number of unknown characters in a document. It would be trivial to modify that script to count all characters. Just change line 47 from this:

    if charType == "Chinese" and knownChars[char] == nil then

to this:

    if charType == "Chinese" then

And with a bit of effort, it could also be made to calculate the % coverage of a document with a given word list - in fact there is already a script that ships with CTA (char-coverage.lua) that does this for HSK6 coverage of a given document.

@大块头, if you don't want to tread on CTA's toes, feel free to make Lua script versions of any scripts and post them in that other thread.

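For readers outside CTA, the idea of "% coverage of a document with a given word list" can also be sketched in Python. This is not the char-coverage.lua script and does not use CTA's Lua API. It is a hedged illustration with hypothetical names that weights each character by how often it occurs in the document, which is one plausible reading of "coverage".

```python
from collections import Counter


def weighted_coverage(document: str, known_chars: set) -> float:
    """Percentage of Chinese character occurrences in `document` that are known."""
    counts = Counter(ch for ch in document if "\u4e00" <= ch <= "\u9fff")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for ch, n in counts.items() if ch in known_chars)
    return 100.0 * covered / total


# Toy data: the known characters would normally come from a word list.
document = "我们学习中文,我们喜欢中文。"
known = set("我们中文")

print(round(weighted_coverage(document, known), 1))
```

Because frequent characters count more than rare ones, this figure is usually higher than a coverage figure computed over unique characters only.
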
大块头, posted August 10, 2020 at 01:57 AM:
1 hour ago, imron said:
> @大块头, if you don't want to tread on CTA's toes, feel free to make Lua script versions of any scripts and post them in that other thread

生活苦短,我用Python。

philwhite, posted August 10, 2020 at 01:50 PM:
On 8/10/2020 at 2:57 AM, 大块头 said:
> 生活苦短,我用Python。

生活苦短,我用bash,

    echo 'Unknown char count:'
    comm -13 sortedknowncharlist <(cat file | sed 's/\(.\)/\1\n/g' | sort | uniq) | wc -l

大块头, posted August 10, 2020 at 04:12 PM:
For every 100 of us snot-nosed brats scribbling on whiteboards and typing at our fancy-schmancy IDEs, there is some UNIX wizard who is sipping coffee and browsing Usenet because they've already solved the problem with a bash one-liner.

icebear, posted August 12, 2020 at 04:04 PM:
On 8/7/2020 at 5:56 PM, imron said:
> You can do this, but it involves manually editing a config file.

Thanks, worked like a charm!

philwhite, posted August 13, 2020 at 07:04 AM:
On 8/10/2020 at 2:50 PM, philwhite said:
> comm -13 sortedknowncharlist <(cat file | sed 's/\(.\)/\1\n/g' | sort | uniq) | wc -l

TMTOWTDI:

    comm -13 sortedknowncharlist <(sed 's/./&\n/g' file | sort -u) | wc -l

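For comparison, roughly the same unknown-character count can be written in Python as a set difference. This is a sketch under assumptions rather than an exact port: it skips whitespace, and the function name and sample strings are hypothetical stand-ins for the contents of file and sortedknowncharlist.

```python
def unknown_chars(text: str, known: set) -> set:
    """Distinct non-whitespace characters in `text` that are not in `known`."""
    return {ch for ch in text if not ch.isspace()} - known


# Sample strings stand in for the two input files.
known = set("我你他")
text = "我爱你\n他们"

print("Unknown char count:", len(unknown_chars(text, known)))
```

Like the one-liners, this counts every distinct character, so punctuation and Latin letters are included unless they are filtered out first.
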
imron (Author), posted August 13, 2020 at 10:27 PM:
On 8/12/2020 at 4:04 PM, icebear said:
> Thanks, worked like a charm!

What colours did you end up using?

15 hours ago, philwhite said:
> comm -13 sortedknowncharlist <(sed 's/./&\n/g' file | sort -u) | wc -l

Stray cats are a continual problem with Unix one-liners.

mungouk, posted August 13, 2020 at 10:39 PM:
I bought CTA a while back, and to be honest have only used it once or twice, to analyse HSK levels of ebooks or similar. I think what I'm missing is some good descriptions of use cases and tutorials to show what it's capable of, and how I could be using it. Are there any examples out there already on, say, YouTube? If not, do any of you power users feel like explaining how you use it to do things you couldn't do with other tools? I guess I'm not the only one who could benefit from your collective wisdom. Cheers!