imron Posted February 7, 2019 at 06:37 AM Author Report Posted February 7, 2019 at 06:37 AM I can barely manage to develop CTA. Not sure where I can find the time at the moment to make ETA. Quote
murrayjames Posted February 7, 2019 at 06:39 AM Report Posted February 7, 2019 at 06:39 AM OK sure. But $$$$ 1 Quote
imron Posted February 7, 2019 at 10:11 AM Author Report Posted February 7, 2019 at 10:11 AM It's potential $$$$ gained versus actual $$$$ lost in opportunity costs, and while I'd like to get around to doing something like that, ETA is probably about idea number 10 on my Potential $$$$ list. Quote
New Members Juraj 唐优来 Posted April 20, 2019 at 01:55 PM New Members Report Posted April 20, 2019 at 01:55 PM Hey. I am having an issue with scrolling in the clipboard panel on the Mac version. I can scroll down in the panels on the left, but not in the clipboard panel. Thanks in advance. Quote
imron Posted April 21, 2019 at 10:56 AM Author Report Posted April 21, 2019 at 10:56 AM Are you able to post a screenshot of what you mean? Also, which version of macOS are you running? Quote
New Members Juraj 唐优来 Posted April 22, 2019 at 01:12 PM New Members Report Posted April 22, 2019 at 01:12 PM I am using High Sierra (10.13.6). The yellow arrows indicate the parts where I am able to scroll; the red arrow indicates the part where I am unable to scroll. Thank you. Quote
imron Posted April 22, 2019 at 02:40 PM Author Report Posted April 22, 2019 at 02:40 PM Thanks, I'll look into it. Quote
New Members Juraj 唐优来 Posted April 23, 2019 at 11:48 AM New Members Report Posted April 23, 2019 at 11:48 AM Hey! After installing the new security update, it's working again. Thanks for your prompt reply and sorry for any inconvenience! 1 Quote
imron Posted April 24, 2019 at 12:14 AM Author Report Posted April 24, 2019 at 12:14 AM No worries. Thanks for taking the time to raise the issue. Quote
murrayjames Posted June 26, 2019 at 07:17 AM Report Posted June 26, 2019 at 07:17 AM imron, what is the best indicator of the difficulty of a text in CTA, if you've never uploaded a list of your Known Words? Is it the number of unique words/unique characters in the text? The HSK percentages? Quote
imron Posted June 26, 2019 at 09:35 AM Author Report Posted June 26, 2019 at 09:35 AM Not the HSK percentages. They are there mostly to show how the HSK is not that useful. The number of unique words is one potential indicator of difficulty, but I'd also look at the number of words it takes to get to 98% comprehension of the text and see how big a proportion of total words that is, and I'd also look at what percentage comprehension you'd get if you learnt every word that appears more than once. That gives you an idea of how many words you'd need to know/learn in order to read the book comfortably. 2 Quote
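The two indicators described in this post can be sketched in a few lines of Python. This is only an illustration, not how CTA computes its statistics: it assumes the text has already been segmented into a list of words, and the function names `words_needed_for_coverage` and `coverage_of_repeated_words` are made up for this example.

```python
from collections import Counter


def words_needed_for_coverage(tokens, target=0.98):
    """Count how many distinct words, taken most frequent first,
    are needed to cover `target` of all word occurrences.
    (Hypothetical helper; `tokens` is an already-segmented word list.)"""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered >= target * total:
            return rank
    return len(counts)


def coverage_of_repeated_words(tokens):
    """Fraction of all word occurrences you would understand if you
    learnt every word that appears more than once in the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(freq for freq in counts.values() if freq > 1)
    return repeated / total
```

Dividing the result of `words_needed_for_coverage` by the number of unique words gives the proportion mentioned above; a text where almost every unique word is needed to reach 98% is likely to be harder going than one where a small core vocabulary gets you there.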
murrayjames Posted June 26, 2019 at 04:08 PM Report Posted June 26, 2019 at 04:08 PM A thought. Dividing the number of unique words by the total number of words gets you the percentage of unique words in a text. Dividing the other way tells you, for example, that 1 in 5 words is unique. Not sure how closely the density of unique words correlates with difficulty though. 1 Quote
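As a rough sketch of the arithmetic in this post, assuming the text is already available as a list of segmented words (the helper name `unique_word_ratio` is invented for this example):

```python
def unique_word_ratio(tokens):
    """Return (ratio, one_in_n): the share of unique words among all
    words, and its inverse ("1 in N words is unique").
    (Hypothetical helper; `tokens` is an already-segmented word list.)"""
    total = len(tokens)
    unique = len(set(tokens))
    if total == 0 or unique == 0:
        return 0.0, 0.0
    return unique / total, total / unique
```

For a text of 20 words containing 4 different words, this gives a ratio of 0.2, i.e. 1 in 5 words is unique.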
roddy Posted June 26, 2019 at 05:07 PM Report Posted June 26, 2019 at 05:07 PM So when does this integrate with one of those text-to-speech APIs? I want word lists for TV shows. 1 Quote
imron Posted June 27, 2019 at 12:41 AM Author Report Posted June 27, 2019 at 12:41 AM 7 hours ago, roddy said: text-to-speech APIs? I want word lists for TV shows. Speech to text you mean? I've actually been toying with writing an application that does this, but for any language, not just Chinese (and by toying I mean I've already written a bunch of code and done test calls with the APIs and got reasonable results back). Still not sure if I have the time to make it though and if there is any demand for this kind of thing, especially as it would need to be a paid service (because Google/Microsoft charge for each API call). Quote
imron Posted June 27, 2019 at 12:57 AM Author Report Posted June 27, 2019 at 12:57 AM 8 hours ago, murrayjames said: Not sure how closely the density of unique words correlates with difficulty though. I think you'd also need to look at the frequency of those unique words in the text as a whole. If many of those unique words only appeared once or twice in total, but when combined made up a significant percentage of total words, then that would affect difficulty, because it would mean lots of words you need to put in work to learn, but that don't really lead to increased comprehension for the rest of the text. Looking at the words it takes to get 98% comprehension (or some other reasonably high percentage) serves as a decent proxy for that. Quote
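One way to put a number on this point is to measure how many words occur only once or twice, and what share of the whole text they make up together. The sketch below is an assumption-laden illustration (pre-segmented input, an invented function name `rare_word_burden`, and a cut-off of two occurrences taken from the "once or twice" wording above), not something CTA is known to report.

```python
from collections import Counter


def rare_word_burden(tokens, max_freq=2):
    """Return (rare_word_count, rare_token_share): how many distinct
    words occur at most `max_freq` times, and what fraction of all
    word occurrences those words account for.
    (Hypothetical helper; `tokens` is an already-segmented word list.)"""
    counts = Counter(tokens)
    total = sum(counts.values())
    rare_freqs = [freq for freq in counts.values() if freq <= max_freq]
    share = sum(rare_freqs) / total if total else 0.0
    return len(rare_freqs), share
```

A high word count combined with a high share suggests a text where much of the study effort goes into words that do little to improve comprehension of the rest of the book.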
roddy Posted June 27, 2019 at 09:49 AM Report Posted June 27, 2019 at 09:49 AM 9 hours ago, imron said: Speech to text you mean? That sounds more likely. Quote
murrayjames Posted July 1, 2019 at 02:24 AM Report Posted July 1, 2019 at 02:24 AM Reinstalling CTA after a hard drive crash. I made a backup of the ChineseTextAnalyser AppData folder before the crash. After reinstalling and running CTA, do I replace the new AppData folder with the old AppData folder to get my known words back? UPDATE: I did and it worked perfectly. The license copied over too! 1 Quote
imron Posted July 1, 2019 at 01:13 PM Author Report Posted July 1, 2019 at 01:13 PM Happy to have helped Quote
drungood Posted July 12, 2019 at 07:15 PM Report Posted July 12, 2019 at 07:15 PM I'm doing the 14 day trial right now. I think it's a useful piece of software and will probably buy it, but I wish the word segmentation was better since it's a core feature of the app. Shouldn't it be possible to segment a txt file with a superior but slower segmentation library, save the segmented version, and have CTA use that? 1 Quote
imron Posted July 13, 2019 at 01:56 AM Author Report Posted July 13, 2019 at 01:56 AM 5 hours ago, drungood said: but I wish the word segmentation was better since it's a core feature of the app Segmentation is something I've always wanted to improve, and in fact I have worked on implementing a bunch of different segmenters, but the main issue is not having enough time to build something suitable in terms of speed, memory usage and correctness. As with everything, there are tradeoffs. Most of the problems can be solved, it's just that there's a large amount of work involved for only a minor increase in correctness, and so when I have time to work on CTA it usually goes towards other features, because the segmenter is ballpark-level correct, and that is sufficient for what I see as the main features of the app: 1. Finding frequently occurring unknown words in a piece of text. 2. Comparing texts to see their relative difficulty. Based on tests I've done, and on my own experience, improving the segmenter doesn't significantly improve either of those two activities. The current segmenter does mean that CTA is less useful if you want to use it for precise segmentation at a sentence-by-sentence level. 6 hours ago, drungood said: Shouldn't it be possible to segment a txt file with a superior but slower segmentation library, save the segmented version, and have CTA use that? Solving for the general case is not that bad, it's the edge cases where things fall down. E.g. what happens if the file is several GBs? Most tools lock up. CTA, on the other hand, will open (and highlight text) instantly and let you scroll anywhere through the file (though statistics take a bit longer to generate). A GB of text might seem a bit extreme, but that's only about 1000 books, which is not unreasonable if generating information for a corpus or similar. Quote
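To illustrate the large-file concern, here is a minimal sketch of counting words from a pre-segmented file without loading it all into memory. It assumes a hypothetical format in which an external segmenter has written the text with words separated by whitespace; nothing in this thread says CTA reads such files, and `count_segmented_words` is an invented name.

```python
from collections import Counter


def count_segmented_words(lines):
    """Count word frequencies from an iterable of pre-segmented lines,
    where words are separated by whitespace (an assumed format).
    Lines are consumed one at a time, so memory use grows with the
    vocabulary size rather than with the size of the file."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts
```

Because a file object opened with `open(path, encoding="utf-8")` yields its lines lazily, passing it directly to this function keeps a multi-gigabyte corpus from being read into memory at once. Fast statistics are only part of the problem, though: opening, highlighting and scrolling through a file of that size instantly is a separate and harder requirement.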