dougwar, posted August 26, 2017 at 02:58 PM:
Yes, that would be great.
dougwar, posted August 26, 2017 at 03:10 PM:
If I'm not asking for too much, this is my workflow; maybe you know a way to automate it:
1. First I get a text file and put one sentence per line.
2. I run the highlight-unknown.lua script.
3. In Excel I sort the rows by the number of known words.
4. I use some formulas to build these columns, then I export them to Anki, e.g.:
Sentence with blank | Unknown word | Complete sentence
那个学生昨天__了书。 | 丢 | 那个学生昨天_丢_了书。
("That student lost the book yesterday.")
imron (thread author), posted August 28, 2017 at 09:04 AM:
Yep, that can be completely automated. The script could also be made to export sentences in Anki's cloze format if you'd like.
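For reference, Anki marks a cloze deletion with {{c1::...}} markup. The sketch below is a standalone Python illustration, not part of any CTA script: it assumes the sentence has already been segmented into words (CTA does this internally), and the function name `to_cloze` is hypothetical.

```python
def to_cloze(words, index, note=1):
    """Join a pre-segmented sentence, wrapping words[index] in Anki cloze markup.

    `note` is the cloze number (c1, c2, ...); words are joined with no spaces,
    as is usual for Chinese text.
    """
    parts = list(words)  # copy so the caller's list is not modified
    parts[index] = "{{c%d::%s}}" % (note, parts[index])
    return "".join(parts)


print(to_cloze(["那个", "学生", "昨天", "丢", "了", "书", "。"], 3))
```

Working from word positions rather than a plain string replace avoids blanking the wrong occurrence when the same characters also appear inside an earlier word.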
RogerGe, posted September 1, 2017 at 12:43 PM:
Imron, is it possible to change the script so that it checks the percentage of new characters against your current known-word list instead of HSK 6? Thanks.
imron (thread author), posted September 4, 2017 at 03:01 AM:
Yes, it's very easy to do. On line 11 of the file, change
    for word in cta.hskLevel( lower, upper ):words() do
to
    for word in cta.knownWords():words() do
This will then build the list of characters from your known vocabulary rather than from the HSK 1-6 vocabulary. Attached is a copy of the script with that modification (plus a few cosmetic changes that remove references to HSK from variable names and output).
Attachment: char-coverage-known.lua
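A rough standalone Python sketch of the idea: collect every character that appears in any known word, then measure how many Chinese characters in a text fall inside that set. The function names, the restriction to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and counting character occurrences rather than unique characters are all assumptions; the attached Lua script may compute coverage differently.

```python
def known_chars(known_words):
    """Collect every character that appears in at least one known word."""
    chars = set()
    for word in known_words:
        chars.update(word)
    return chars


def char_coverage(text, known_words):
    """Percentage of CJK character occurrences in `text` found in some known word."""
    chars = known_chars(known_words)
    # Assumption: only the basic CJK Unified Ideographs block counts;
    # punctuation, digits and Latin letters are skipped.
    hanzi = [c for c in text if "\u4e00" <= c <= "\u9fff"]
    if not hanzi:
        return 0.0
    covered = sum(1 for c in hanzi if c in chars)
    return 100.0 * covered / len(hanzi)


print(char_coverage("那个学生昨天丢了书。", ["学生", "昨天"]))
```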
imron (thread author), posted September 4, 2017 at 06:17 AM:
@dougwar please find attached a script that should do what you want. It finds all the sentences in a given document that contain unknown words, then sorts those sentences by the number of unknown words, so that sentences with the fewest unknown words appear first. Then, for each unknown word in each sentence, it prints:
1. the total number of unknown words in the sentence
2. the sentence with the current unknown word replaced with __
3. the unknown word
4. the sentence with the word surrounded by underscores, e.g. _生词_ ("new word")
This means that each unknown word gets its own line in the output: if a sentence has 5 unknown words, that sentence will appear 5 times in the output with a different word replaced each time. You should then be able to save this file and import it directly into Anki.
Let me know if this does what you want, or if you need any adjustments.
Attachment: unknown-sentences.lua
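The steps above can be sketched in standalone Python as follows. This is not the attached Lua script: it assumes each sentence is already segmented into a list of words, treats any token without a CJK character (e.g. punctuation) as not a word, and the names `cloze_rows` and `is_word` are hypothetical. Each row is tab-separated, which Anki's text importer accepts.

```python
def is_word(token):
    """Treat a token as a word only if it contains at least one CJK ideograph."""
    return any("\u4e00" <= c <= "\u9fff" for c in token)


def cloze_rows(sentences, known):
    """Build one tab-separated row per unknown word, easiest sentences first.

    `sentences` is a list of pre-segmented sentences (lists of words);
    `known` is a set of known words.
    """
    candidates = []
    for words in sentences:
        unknown = [i for i, w in enumerate(words) if is_word(w) and w not in known]
        if unknown:  # keep only sentences with at least one unknown word
            candidates.append((words, unknown))

    # Fewest unknown words first; the sort is stable, so ties keep document order.
    candidates.sort(key=lambda item: len(item[1]))

    rows = []
    for words, unknown in candidates:
        for i in unknown:
            blanked = "".join(words[:i] + ["__"] + words[i + 1:])
            marked = "".join(words[:i] + ["_" + words[i] + "_"] + words[i + 1:])
            # Columns: unknown count, blanked sentence, unknown word, marked sentence.
            rows.append("\t".join([str(len(unknown)), blanked, words[i], marked]))
    return rows


known = {"那个", "学生", "昨天", "了", "书"}
for row in cloze_rows([["他", "丢", "了", "书", "。"],
                       ["那个", "学生", "昨天", "丢", "了", "书", "。"]], known):
    print(row)
```

Writing the rows to a UTF-8 text file, one per line, gives a file that can be imported with each column mapped to a note field.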
imron (thread author), posted May 2, 2018 at 08:08 AM:
Uploading a script that extracts a marked word from the first field of a tab-separated file (e.g. from cards exported by Anki).
Attachment: extract-marked-words.lua
See here for context. See here for instructions on how to run the script.
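A minimal Python sketch of the same idea, assuming the word is marked with surrounding underscores (_生词_) as in the unknown-sentences output above. The marker actually used in the linked context isn't shown in this thread, so adjust the pattern `MARKED` (a hypothetical name, like `extract_marked`) to match your cards.

```python
import re

# Assumption: the marked word is wrapped in single underscores, e.g. _丢_.
MARKED = re.compile(r"_([^_\t]+)_")


def extract_marked(lines):
    """Return the first marked word found in the first tab-separated field of each line."""
    words = []
    for line in lines:
        first_field = line.rstrip("\n").split("\t", 1)[0]
        match = MARKED.search(first_field)
        if match:
            words.append(match.group(1))
    return words


print(extract_marked(["那个学生昨天_丢_了书。\t丢\n"]))
```

Lines whose first field has no marker (including a bare "__" blank) are skipped rather than reported.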
imron (thread author), posted March 29, 2020 at 05:10 AM:
Uploading a script that finds all unknown characters in a document and prints them in order of frequency (highest to lowest). An "unknown" character is defined as a character that does not appear in any of your known words.
Attachment: unknown-chars.lua
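The same definition of "unknown" can be sketched in Python with `collections.Counter`, whose `most_common()` lists elements from highest to lowest count (ties in first-seen order). The function name and the CJK range check are assumptions, and known words are passed as a plain list instead of being read from CTA.

```python
from collections import Counter


def unknown_chars(text, known_words):
    """Count characters in `text` that appear in none of the known words, most frequent first."""
    known = set()
    for word in known_words:
        known.update(word)  # every character of every known word counts as known
    counts = Counter(c for c in text if "\u4e00" <= c <= "\u9fff" and c not in known)
    return counts.most_common()


print(unknown_chars("书书丢了书丢猫", ["了"]))
```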
Jan Finster, posted March 29, 2020 at 08:11 AM:
Thanks Imron :)) It worked once, but the second time I tried, I got this: [screenshot not preserved]
imron (thread author), posted March 29, 2020 at 08:43 AM:
You're running a different script, "subs2anki.lua", and it is likely expecting a file in a different input format.
Jan Finster, posted March 29, 2020 at 08:53 AM:
Thanks. It works perfectly.
dougwar, posted August 18, 2020 at 09:34 PM:
Hi, is it possible to make a script that reads a directory with several files and outputs the % of known words in each file?
imron (thread author), posted August 19, 2020 at 03:51 AM:
Yes, it's possible. What sort of format would you expect the output to take?
dougwar, posted August 19, 2020 at 12:40 PM:
I'm thinking something like this:
File Name - Total Words - % Known Total - Total Unique - % Known Unique - Unique Characters
I have a big database of books, and I want to generate a readability list like the one you made on your website, to guide me in choosing which book to read next. Thanks in advance.
dougwar, posted August 25, 2020 at 06:25 PM:
I wrote a script that reads all the files in a directory and shows the % of known words in each file. Example:
File Name ; Total Words ; Known Words ; % Known
1984.txt ; 96408 ; 65274 ; 67 %
1Q84.txt ; 98991 ; 64990 ; 65 %
1Q84BOOK2.txt ; 85398 ; 55843 ; 65 %
1Q84BOOK3.txt ; 113353 ; 74226 ; 65 %
The script is attached. If you have thousands of books, it's a good tool for finding what is at your reading level.
Attachment: Percent Know Words Directory.lua
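A standalone Python sketch of the same directory scan. It assumes each .txt file has already been segmented with whitespace between words (CTA does the real segmentation, so raw Chinese text would not split correctly this way) and that known words are given as a set; `directory_report` is a hypothetical name. The percentages in the table above appear to be truncated rather than rounded (65274 / 96408 is about 67.7%), so the sketch uses integer division.

```python
import os


def directory_report(directory, known):
    """Return report lines for every .txt file in `directory`, in name order."""
    rows = ["File Name ; Total Words ; Known Words ; % Known"]
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            words = f.read().split()  # assumption: words are whitespace-separated
        total = len(words)
        known_count = sum(1 for w in words if w in known)
        # Integer division truncates, e.g. 65274 * 100 // 96408 == 67.
        percent = known_count * 100 // total if total else 0
        rows.append(f"{name} ; {total} ; {known_count} ; {percent} %")
    return rows
```

Usage: `print("\n".join(directory_report("books", known)))`, where `books` is a folder of pre-segmented files.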
yaokong, posted February 20, 2022 at 09:16 AM:
Thanks, dougwar, for that script. I ran it over my library and found several potential books to read; I had no idea they were at my level.
Hi @imron, would it be possible to extend dougwar's script with the number of unique words and known unique words? Could you give me a few pointers on which functions to look at? Not that I know how to program, but I am willing to go to great lengths to find JUST the right book(s) to read.
The reason I am trying to do this is that I found "Total Percent Known" by itself is not really enough to judge whether I can read a book without frequently needing a dictionary. I can memorize some looked-up words if they appear a couple of times, but obviously not if the vast majority of the unique words are unknown to me.
To illustrate with two extremes: "Total Percent Known" might be at, say, 96% (my level for 《大智度論》, the Treatise on the Great Perfection of Wisdom, now), yet the remaining 4% might still contain an immense number of unknown words (almost 60% of the unique words in the same book are unknown to me). On the other hand, simpler texts at the same 96% total known level might have only a few hundred unknown unique words (such as 论确实性, On Certainty by Wittgenstein, where only around 20% of unique words are unknown to me).
I am aware that I cannot read 《大智度論》 without a dictionary (and a good teacher), so I would like to find easy reads for my spare time (aka not Wittgenstein). I so enjoy not checking the dictionary at all, or only a few times; it makes me feel far more immersed in the book.
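The difference between the two measures can be made concrete with a small Python sketch that reports both for a pre-segmented word list. The function name and dictionary keys are hypothetical, and this is not how CTA computes its stats internally.

```python
def reading_stats(words, known):
    """Compare known-word coverage by running total versus by distinct words."""
    total = len(words)
    unique = set(words)
    known_total = sum(1 for w in words if w in known)
    known_unique = len(unique & set(known))
    return {
        "total_words": total,
        "percent_known_total": 100.0 * known_total / total if total else 0.0,
        "unique_words": len(unique),
        "percent_known_unique": 100.0 * known_unique / len(unique) if unique else 0.0,
        "unknown_unique": len(unique) - known_unique,
    }


print(reading_stats(["的"] * 8 + ["佛", "般若"], {"的"}))
```

In that toy input, one very common known word repeated eight times pushes `percent_known_total` to 80.0, while `percent_known_unique` is only about 33.3 because two of the three distinct words are unknown.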
imron (thread author), posted February 20, 2022 at 11:14 AM:
Can you remind me in a week if I haven't gotten back to you by then? Thanks.
yaokong, posted March 29, 2022 at 01:34 PM:
Apologies for the late reply. I tried to use a workaround and had an almost-working AutoHotkey script, then Windows died irreparably, so here I am (this time on Linux) without a good solution for this. Could you please help me get started?
imron (thread author), posted March 30, 2022 at 06:59 AM:
Here you go. As a bonus, it should also run considerably faster than the previous script, because it calls out to the CTA engine to get the stats rather than totaling them manually inside the Lua script.
Attachment: known-words.lua
yaokong, posted March 31, 2022 at 03:32 PM:
Amazing stuff, thank you so much! I recommend you include it with the program; this is immensely useful. Previously I had an AHK script that went through my Calibre book catalog, opened the books one by one in CTA, waited for each document to load by checking the status bar, clicked through the stats, copied and saved them, went back to Calibre, opened the custom metadata view, and pasted the stats one by one; not very fast, as you can imagine. After a lot of refinement I could get that script to run at around 10 books per minute (depending on file size). With your Lua script I just processed 161 books in 11 seconds (almost 100 times faster than my method). Now I just have to find a way to incorporate the info back into Calibre.