Chinese-Forums

Introducing Chinese Text Analyser


imron


I've just uploaded a new version that should take care of the boxes appearing under Wine, and that also allows appending to the clipboard.

@realmayo, you can use appending to the clipboard to 'merge' two documents A and B like so:
Open Document A
Ctrl+A to select all text
Ctrl+C to copy it to the clipboard

Open Document B
Ctrl+A to select all text
Ctrl+Alt+C to append it to the clipboard

Ctrl+V to paste A + B into a new window.


How to best deal with names?  They don't segment properly.

I guess over time I will learn each character and mark it as known.

 

Actually I'd prefer to mark them as a name and have them segmented so as to say I know them as a group (but not necessarily each character).


Ctrl+Alt+C append it to the clipboard

 

That's excellent, works great, thanks!

 

I plan to have name detection in the segmenter and also the ability for users to mark something as a name.

 

As well as names, I reckon there will always be words which don't segment properly, at least until we get near-perfect machine translation. For instance, I just saw 我二十一岁时,... and was told this includes the word 岁时, which it clearly doesn't.

 

I don't see this as a flaw with the software, just an inevitable consequence of Chinese script.

 

Where I've shoved a whole book into the software and seen words like this come up multiple times, I add them to a "fuzzy" list elsewhere, and mark them 'known': the alternative would be having to look at and then dismiss these words every time I run an individual chapter of that book through the software.
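
To illustrate how a spurious word like 岁时 can appear, here is a toy sketch of greedy longest-match segmentation plus a "fuzzy" list filter. It is not a description of how Chinese Text Analyser's segmenter works; the dictionary and the `segment` and `unknown_words` functions are all made up for the example.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching: always take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def unknown_words(words, known, fuzzy):
    """Drop words already marked known, plus mis-segmentations kept on a fuzzy list."""
    return [w for w in words if w not in known and w not in fuzzy]


# Toy dictionary, invented for this example only.
TOY_DICT = {"我", "二十一", "岁", "时", "岁时"}

words = segment("我二十一岁时", TOY_DICT)
# 岁时 is in the dictionary, so the greedy match prefers it over 岁 + 时.
remaining = unknown_words(words, known={"我", "二十一"}, fuzzy={"岁时"})
```

With this toy dictionary `words` comes out as 我 / 二十一 / 岁时, and once 岁时 is on the fuzzy list it no longer shows up in `remaining`.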


My hosting provider is currently having problems with their database and the site has been down for a few hours.  I've logged a ticket with support and they're looking into it, but they don't have an exact timeframe for when it will be fixed.  I'll update here once it's working again.


Undo is on my todo list.

 

In the meantime, if you accidentally mark all words as known then there's still a way to get your old list back because the program doesn't save the list of known words to disk until it closes.

 

So, without closing the program, just make a copy of the file:

 

c:\users\<username>\AppData\Local\ChineseTextAnalyser\wordlists\known.txt


Then close the program.  Once it has been closed, you can copy your backup over this file, restart, and everything should be back to normal.
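
If you'd rather script the backup and restore, something like the following should do it. This is only a sketch: it assumes your home directory is `c:\users\<username>` so that `Path.home()` leads to the path above, and the `backup_wordlist` and `restore_wordlist` names and the backup folder are made up.

```python
import shutil
from datetime import datetime
from pathlib import Path

# The path from the post above, built from the current user's home directory
# (assumption: the home directory is c:\users\<username>).
KNOWN_WORDS = (Path.home() / "AppData" / "Local" / "ChineseTextAnalyser"
               / "wordlists" / "known.txt")


def backup_wordlist(backup_dir, source=KNOWN_WORDS):
    """Copy the word list to a timestamped file and return the backup's path.

    Run this while the program is still open, before it saves on exit.
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)
    return target


def restore_wordlist(backup, target=KNOWN_WORDS):
    """Copy a backup over the live list. Only do this after the program has closed."""
    shutil.copy2(backup, target)
```

Call `backup_wordlist(some_folder)` before closing the program, close it, then call `restore_wordlist(...)` with the path that was returned.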

 

It's a bit of a headache, but it will be addressed before (or as part of) version 1.0.0.


Would there be any way to run it off a USB stick? I downloaded it and used it with Wine, then copied all of the files onto a USB stick, and it opened fine on another computer. However, even though I still have more than a week left on the trial on my own computer, when I opened it from the USB stick on a different computer it told me I had no days left. I don't really mind this - I expected something like this would happen - but if I did buy a license, could I run it off a USB stick? Would I need to enter my license number each time I ran it, or only the first time?


@Yadang, a licenced copy of the program should run fine off a USB stick because it doesn't put anything in the Windows registry except an uninstall entry (which is needed for it to appear in the Add/Remove Programs list).

 

That being said, there are still some caveats.  All the user data, including the list of known words and, if you purchase it, the licence file, is stored in the user's home directory.  So that information wouldn't currently be stored on the USB stick.

 

What this means is that although you wouldn't need to register the licence every time you ran the program, you would need to register it once for each computer you were using it on.

 

It wouldn't be too difficult to make it possible for users to create a completely self-contained app, but well-behaved apps should be using the user directory for storing user data.  If you can provide a compelling use case for why the app should be completely self-contained, I can certainly consider it.


I was considering using it from a cloud drive (OneDrive) so I can read stuff at work with it and share the known word lists.  The app can run from anywhere and installing it several times is fine, but I have four PCs I use regularly...


Allowing the user to configure a custom directory for wordlists sounds like quite a decent idea then.

 

You'd have to be careful though, because the app only writes the list when the program is closed.  If you had the program open on computer A and made some changes, then opened it on computer B before closing A, B wouldn't have those changes yet.  Worse, if you then made different changes on B, closed B, and then closed A, the list would be reverted to the contents of A.
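
The scenario can be played out in a few lines. This is a toy model only: the `Instance` class and the `shared` dictionary are invented stand-ins for a running copy of the program and for a word list file in a shared folder, not the program's real code.

```python
class Instance:
    """Toy stand-in for one running copy of the program."""

    def __init__(self, shared_file):
        self.shared_file = shared_file
        # The list is read once, at startup.
        self.known = set(shared_file["words"])

    def mark_known(self, word):
        # Changes are held in memory only until the program closes.
        self.known.add(word)

    def close(self):
        # The whole list is written back on exit, replacing whatever is there.
        self.shared_file["words"] = set(self.known)


shared = {"words": {"我"}}   # stands in for the word list in a shared folder

a = Instance(shared)
a.mark_known("你")
b = Instance(shared)          # opened before A closes, so it never sees 你
b.mark_known("他")
b.close()                     # shared list is now 我, 他
a.close()                     # A's list overwrites B's

final_words = shared["words"]  # 我 and 你 only; B's 他 has been lost
```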

 

If that sounds acceptable as an interim solution, I can probably try and upload a version tomorrow that supports this.

 

Long-term it's probably more robust to have some sort of syncing, but that's also quite a bit more work.

 

Both of these are also different from having a fully self-contained application.


For folks wanting to use it only on a memory stick, you could always try http://0install.net/. It might be worth imron doing it, I doubt it's very much work. But I suppose it would depend a bit on how many people want to use it like that. Theoretically it should just work, as the program doesn't seem to do much with the profile directory.

 

Personally, I'm not sure how much use I'd get out of it. I mostly use it to export data sets to my flashcard program, so I'm not sure it would be something that I'd use.


It might be worth imron doing it, I doubt it's very much work.

The problem is that it's not what most users are used to, so it adds a barrier for non-technical users and increases the support effort needed to address the problems people will encounter.  Many of my target users are not that computer literate, and anything that differs greatly from their standard expectations causes problems.  My experience with other software such as Pinyinput and Hanzi Grids has taught me to always try to make things as simple as possible.  That's why Chinese Text Analyser has a single-click install, automatic detection and installation of the 64-bit vs the 32-bit version, and so on: things that most users will hopefully never even notice because it all just works smoothly.

 

If I make a self-contained app, it will just be a flag in a config file, and then making sure it stores the local user data relative to the base location rather than the user's local directory.  It's not a huge amount of work, but I also have other higher priority features to work on at the moment so just want to hear a few use cases to see what sort of priority I should allocate to it.
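
Roughly, the idea could look like the sketch below. Nothing here is the actual implementation: the `[app]` section, the `portable` option, the `data` folder name and both function names are hypothetical, and the per-user location is simply the one mentioned earlier in the thread.

```python
import configparser
from pathlib import Path


def is_portable(config_text):
    """Read a hypothetical 'portable' flag from INI-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Default to the normal per-user behaviour when the flag is absent.
    return parser.getboolean("app", "portable", fallback=False)


def user_data_dir(app_dir, portable, home=None):
    """Return the directory for word lists and the licence file.

    portable=True  -> keep data relative to the program (e.g. on a USB stick)
    portable=False -> use the per-user location under the home directory
    """
    if portable:
        return Path(app_dir) / "data"   # "data" is a made-up folder name
    home = Path(home) if home else Path.home()
    return home / "AppData" / "Local" / "ChineseTextAnalyser"
```

With a design like this, everything else in the program keeps asking for "the data directory" and never needs to know which mode it is running in.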


Yeah, I guess there is a versioning issue.  Sync is a pain.  I have the same issue with Anki when I forget to sync.

 

I guess a script on startup/exit would achieve the same effect (copy file back and forth), and I could make it create backup files as well.
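
A wrapper along these lines is one way to sketch that. Everything in it is an assumption: `run_with_sync` is a made-up name, the paths are whatever you pass in, and `launch` is any callable that starts the program and blocks until it exits. It does nothing to solve the two-computers-open-at-once problem described above.

```python
import shutil
from datetime import datetime
from pathlib import Path


def run_with_sync(local_list, shared_list, backup_dir, launch):
    """Pull the shared list before launch and push the local list back after exit."""
    local_list, shared_list = Path(local_list), Path(shared_list)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    local_list.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")

    # Startup: take the shared copy, keeping a backup of the local one first.
    if shared_list.exists():
        if local_list.exists():
            shutil.copy2(local_list, backup_dir / f"local-{stamp}.txt")
        shutil.copy2(shared_list, local_list)

    launch()  # must block until the program has closed and saved its list

    # Exit: back up the old shared copy, then replace it with the local list.
    if shared_list.exists():
        shutil.copy2(shared_list, backup_dir / f"shared-{stamp}.txt")
    shutil.copy2(local_list, shared_list)
```

For `launch` you could pass something like `lambda: subprocess.run([path_to_program])`, with `path_to_program` filled in for your own install.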


Sync is a pain.  I have the same issue with Anki when I forget to sync.

I'm not sure how Anki syncs, but when I get around to doing sync I'd probably do it as a list of additions and deletions since last sync.  This way you wouldn't be overwriting entire lists, just updating a set of changes.  The likelihood of syncing to an out of date file is therefore greatly reduced.
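
As a rough sketch of the idea (not a design that exists yet), each computer would work out what it added and removed since the last sync, and only those changes would be applied. The `diff_since` and `apply_changes` names and the sample words are made up.

```python
def diff_since(base, current):
    """Return (added, removed) relative to the list as of the last sync."""
    return current - base, base - current


def apply_changes(words, added, removed):
    """Apply one computer's changes without touching anything else in the list."""
    return (words - removed) | added


base = {"我", "你"}            # list as of the last sync
on_a = {"我", "你", "他"}      # computer A added 他
on_b = {"我", "好"}            # computer B removed 你 and added 好

merged = base
for current in (on_a, on_b):
    added, removed = diff_since(base, current)
    merged = apply_changes(merged, added, removed)

# merged now holds 我, 他 and 好: both computers' changes survive.
```

Compare this with overwriting the whole file, where whichever computer closed last would win.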

