imron (Author) Posted July 24, 2017 at 09:15 AM
It should work in the wordlist view with multiple words. At least it used to. If it doesn't, then a bug has crept in. I'll investigate.
Yadang Posted July 25, 2017 at 04:50 AM
I just checked it again on Linux with a document of 5 unknown words to make sure I wasn't just getting lost in the sea of unknown words, and it didn't work. I'm pretty sure it doesn't work on Windows either.
Shalmanese (New Members) Posted August 23, 2017 at 05:47 PM
Three feature requests for the next version:
1. I'd like the ability to drop a .txt file onto the dock icon/window and have it open, and I'd like CTA to tell the OS that it can read .txt files so I can right click->open with->CTA. Right now, the File->Open menu is the only way to load a text file, and it's clunky compared to drag and drop.
2. After adding a custom word and then clicking "Show Definition", the popup just says "no definition". Instead, can it show the pinyin of the word? I add names as custom words and then I forget how to pronounce them. Before making it a custom word, I could click on each character and sound it out, but once it's turned into a custom word, I need to copy and paste it into Google Translate just to get the pinyin.
3. When using flux, the colors are really hard to tell apart. In the light scheme, the color for looked-up words looks almost identical to known words, and in dark mode, the looked-up words are almost invisible.
Looking forward to the next version!
imron (Author) Posted August 24, 2017 at 02:41 AM
1 is easy and can be in the next version.
2 I'll add to my todo list.
3 You can already change the colours, but you have to edit a text file:
windows: /Users/<username>/AppData/Local/ChineseTextAnalyser/colour-schemes/default.colours
macos: /Users/<username>/Library/Application Support/ChineseTextAnalyser/colour-schemes/default.colours
linux: /home/<username>/.local/share/ChineseTextAnalyser/colour-schemes/default.colours
The file is just a list of key=value pairs. The keys should be fairly self-explanatory, and the values are hex RGB values without the # sign at the front.
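For anyone who would rather script the change than edit the file by hand, here is a minimal Python sketch of reading and updating a key=value colour file of the kind described above. It assumes nothing about the format beyond "one key=value pair per line, hex RGB without the #". The function names are made up for illustration, and in the sample data only `unknown.foreground` is a key mentioned in this thread; `known.foreground` and both starting values are placeholders.

```python
import string


def parse_colour_scheme(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    colours = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            colours[key.strip()] = value.strip()
    return colours


def set_colour(colours, key, hex_rgb):
    """Return a copy of the scheme with one colour replaced (leading '#' is dropped)."""
    value = hex_rgb.lstrip("#")
    if len(value) != 6 or not all(ch in string.hexdigits for ch in value):
        raise ValueError(f"expected six hex digits, got {hex_rgb!r}")
    updated = dict(colours)
    updated[key] = value
    return updated


def format_colour_scheme(colours):
    """Serialise the dict back to one key=value pair per line."""
    return "".join(f"{key}={value}\n" for key, value in colours.items())


if __name__ == "__main__":
    # Placeholder content, not the real default.colours file.
    sample = "known.foreground=000000\nunknown.foreground=FF0000\n"
    scheme = set_colour(parse_colour_scheme(sample), "unknown.foreground", "#0072BC")
    print(format_colour_scheme(scheme), end="")
```

Close CTA and keep a copy of the original file before writing anything back, since this sketch drops any line it cannot parse.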
imron (Author) Posted August 24, 2017 at 02:37 PM
On 7/25/2017 at 0:50 PM, Yadang said: "I just checked it again on Linux with a document of 5 unknown words to make sure I wasn't just getting lost in the sea of unknown words, and it didn't work. I'm pretty sure it doesn't on Windows either."
@Yadang I just checked this, and it works correctly on Windows, Linux and macOS. What version of Linux are you running?
imron (Author) Posted August 26, 2017 at 05:39 AM
Moved question about scripting here.
Yadang Posted August 29, 2017 at 05:04 AM
On 8/24/2017 at 8:37 AM, imron said: "What version of Linux are you running?"
lsb_release -a gives me:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
Is that what you're looking for? Note that I'm running it on a Chromebook with crouton. A lot of things don't work the way they're supposed to. As for Windows, I was using Windows XP when I encountered the problem.
laurenth Posted September 6, 2017 at 08:28 PM
This silly question is probably answered somewhere in the depths of this thread, but... I'm changing computers. How do I transfer my current "known words" list, and other settings, to the new machine? Thanks.
imron (Author) Posted September 7, 2017 at 03:22 AM
The ChineseTextAnalyser data directory can be found here:
Windows: c:\users\<username>\AppData\Local\ChineseTextAnalyser\
macOS: /Users/<username>/Library/Application Support/ChineseTextAnalyser/
Linux: /home/<username>/.local/share/ChineseTextAnalyser/
You can just copy the whole thing to the same location on the new computer. If you are changing operating systems, some config options, such as the remembered sizes and positions of windows, will not be preserved. You can also copy specific sub-directories within that directory, e.g. wordlists or colour-schemes, to get just those things. Custom words you have specified can be found in data/words.u8
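If you want to script the copy, the following Python sketch resolves the data directory for the three locations listed above and copies it somewhere else, such as a USB stick. The function names are hypothetical, and it assumes the user's home directory is the `<username>` folder shown in the paths.

```python
import shutil
import sys
from pathlib import Path


def cta_data_dir(platform=sys.platform, home=None):
    """Return the CTA data directory for the given platform, per the paths in this thread."""
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        return home / "AppData" / "Local" / "ChineseTextAnalyser"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "ChineseTextAnalyser"
    # Anything else is treated as Linux here, which is an assumption.
    return home / ".local" / "share" / "ChineseTextAnalyser"


def backup_cta_data(destination, source=None):
    """Copy the whole data directory; shutil.copytree needs a destination that does not exist yet."""
    source = cta_data_dir() if source is None else Path(source)
    return Path(shutil.copytree(source, destination))
```

On the new machine, the same `cta_data_dir()` call gives the place to copy the backup into. Close CTA on both machines first so that no file is being written during the copy.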
Yadang Posted December 15, 2017 at 11:24 PM
Might have asked this before, but can't find it... Is there a way to import words and have them added only as custom words, without marking them as known? Or even: if a list of words is added with the import feature, are they added as custom words if there's no dictionary entry that matches? If so, could I just import them by list, then copy and paste the document and re-mark them all as unknown?
imron (Author) Posted December 16, 2017 at 12:48 AM
There's no way to do this from the user interface, but you can do it by manually editing files.
1. Close CTA if it is already open.
2. Go to the CTA data directory (macOS: ~/Library/Application Support/ChineseTextAnalyser/data/, windows: C:\Users\<username>\AppData\Local\ChineseTextAnalyser\data\, linux: ~/.local/share/ChineseTextAnalyser/data/).
3. Open the file called words.u8 (or create it if it doesn't exist; this should be a plain text file in UTF-8 format).
4. Paste custom words at the end of the file, one word per line.
5. Save the file and close it.
6. Re-open CTA and enjoy all your custom words not yet marked as known.
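Steps 3 to 5 can also be scripted. The Python sketch below appends words to a words.u8 file as UTF-8, one word per line, creating the file if it is missing. The function name is made up, and skipping words that are already in the file is a choice made for this sketch, not something CTA is known to require. Run it only while CTA is closed, as in step 1.

```python
import tempfile
from pathlib import Path


def append_custom_words(words_file, new_words):
    """Append words to a UTF-8, one-word-per-line file; return the words actually added."""
    words_file = Path(words_file)
    existing_text = words_file.read_text(encoding="utf-8") if words_file.exists() else ""
    existing = {line.strip() for line in existing_text.splitlines() if line.strip()}

    added = []
    for word in new_words:
        word = word.strip()
        if word and word not in existing:
            existing.add(word)
            added.append(word)

    if added:
        # Make sure the old content ends with a newline before appending.
        prefix = "" if existing_text == "" or existing_text.endswith("\n") else "\n"
        with words_file.open("a", encoding="utf-8", newline="\n") as handle:
            handle.write(prefix + "\n".join(added) + "\n")
    return added


if __name__ == "__main__":
    # Demo against a throwaway file rather than the real data directory.
    with tempfile.TemporaryDirectory() as tmp:
        demo_file = Path(tmp) / "words.u8"
        print(append_custom_words(demo_file, ["王小明", "李雷", "王小明"]))
```

Point `words_file` at data/words.u8 inside the data directory for your platform once you are happy with the result on a test file.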
mlescano Posted January 4, 2018 at 02:03 AM
Hi. I read somewhere that a new version was going to have a more accurate segmenter. Is this the case now? Also, what if some words I already know (and which I import as a wordlist) are not in the CEDICT dictionary? Will CTA fail to recognize them since they're not in the dictionary? Thanks!
imron (Author) Posted January 4, 2018 at 07:06 AM
You read it right in this thread, in a previous post of mine. I was actively working on it, and the results weren't as good as expected because the statistical information it relied on would overmatch words. A large number of those overmatched words didn't exist in the dictionary, so a dictionary lookup on many words would just result in a 'no definition' definition. There are a number of ways to solve that problem, but I got caught up with a bunch of other work and haven't gotten around to doing that yet. Words that you have added as custom words will still be matched by the newer segmenter; CTA will look at the words you've added and give them a statistical bias.
Arlo_ Posted January 8, 2018 at 09:22 PM
I've just started with CTA - looks great so far. Due to my dodgy colour vision, the colours for known and unknown words in the text view are almost indistinguishable to me. In your post of 24 Aug 2017 you said the colours could be changed by editing this file:
windows: /Users/<username>/AppData/Local/ChineseTextAnalyser/colour-schemes/default.colours
I can't find that file. The only folders in AppData/Local/ChineseTextAnalyser are clipboard, data, logs and wordlists. Is it still possible to change the colours in the light scheme? Some colours are much easier for me to distinguish than others. Thanks
imron (Author) Posted January 8, 2018 at 10:13 PM
What version of CTA are you using? That file should still be there, but it might not be created on disk until you run CTA for the first time and then quit the program. When you come up with a suitable set of colours, can you let me know? I'll include them in the main program for other people who face similar issues.
Arlo_ Posted January 9, 2018 at 05:31 PM
Thanks. As you said, the file appeared after I closed and reopened CTA. After some experiments I found that #0072BC (RGB 0, 114, 188) worked well for unknown.foreground. For me it is easily distinguishable from the colours for known words and hover/looked up words, but still dark enough to read easily. So far the other colours seem fine. If I have any more issues I will let you know and suggest alternatives. BTW my kind of red-green colour blindness is one of the most common types, so if you are able to cover that in the main program as you said, I'm sure that would be very helpful for others. I'm using version 0.99.16 - 64 bit, which I recently downloaded. This program is definitely going to be very helpful. Thanks again.
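Since the colour file stores hex values without the # sign while most colour pickers report separate red, green and blue numbers, a small converter can help when experimenting. This is a generic Python sketch with made-up function names, not anything shipped with CTA; it confirms that 0072BC corresponds to RGB 0, 114, 188.

```python
def hex_to_rgb(hex_colour):
    """Convert 'RRGGBB' (with or without a leading '#') to an (r, g, b) tuple."""
    value = hex_colour.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected six hex digits, got {hex_colour!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple to 'RRGGBB' without the '#', as the colour file expects."""
    return "".join(f"{channel:02X}" for channel in rgb)


if __name__ == "__main__":
    print(hex_to_rgb("#0072BC"))       # (0, 114, 188)
    print(rgb_to_hex((0, 114, 188)))   # 0072BC
```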
somethingfunny Posted March 7, 2018 at 08:43 AM
Forgive me if this has been asked before, but is there an English language equivalent of CTA?
imron (Author) Posted March 7, 2018 at 09:48 AM
It has been asked before, and I haven't made an English language version (yet), and I don't know if there is anything equivalent.
uvwxyz Posted September 11, 2018 at 03:29 AM
@imron, is there any update on improving word segmentation in CTA? I recall you said something a couple of years ago about planning to improve it from its current somewhat "hit and miss" state. I try using it about once a month and put it back with a sigh after seeing the "unknown words" it produces. An example from today: 没有最大限度地利 用已有的资源。 is segmented as 没有/最/大限/度/地利/用/已/有的/资源。 Asking as a paying customer who resorted to running a Windows VM in order to use a better-working (and free btw) software. There are free(!) public(!) segmenters on Github IIRC ready for copy pasting or at least re-implementing in the language of your choice.
imron (Author) Posted September 11, 2018 at 05:31 AM
1 hour ago, uvwxyz said: "There are free(!) public(!) segmenters on Github IIRC ready for copy pasting"
That are not as fast.
1 hour ago, uvwxyz said: "or at least re-implementing in the language of your choice."
Which I did, and I got it to a working state with acceptable performance, but found that the results were just as hit and miss because the segmenter was overbroad, meaning it would rate things as words that were really phrases. That then had an impact on looking things up in the dictionary, because many of the hits were on non-dictionary words that returned no results. There are ways to fix this, and I have investigated some of them, but then life and work got in the way and I haven't had time to get back into things.
2 hours ago, uvwxyz said: "Asking as a paying customer who resorted to running a Windows VM in order to use a better-working (and free btw) software"
If you don't mind me asking, which software?
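To illustrate why dictionary-driven segmentation can be hit and miss, here is a toy forward maximum matching segmenter in Python. It is not CTA's algorithm (the thread does not describe that), and the tiny dictionary is invented for this example. It only shows how a greedy matcher can grab 地利 and strand 用 where a reader would see 地 / 利用.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: always take the longest dictionary word available."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary, just large enough to show the problem.
TOY_DICTIONARY = {"没有", "最大", "最大限度", "限度", "大限", "地利", "利用", "已有", "有的", "资源"}

if __name__ == "__main__":
    print("/".join(forward_max_match("没有最大限度地利用已有的资源", TOY_DICTIONARY)))
    # 没有/最大限度/地利/用/已有/的/资源
```

Statistical segmenters avoid some of these greedy mistakes by scoring whole candidate segmentations, at the cost of sometimes proposing "words" that no dictionary lists, which is the overmatching problem described above.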