
Free graded reader resource


Posted

Hello everyone,

Inspired by the Inkstone app, which I found very useful (and free), I have dedicated some of my time here in China to creating a graded reader website, IronMandarin.

 

What you can do with this website:

  • read a text
  • choose a text by category or by HSK level
  • see how closely a text really matches a given HSK level
  • save words in your 'known' list, to give you an idea of where YOU stand relative to a given text (this is your personal profile)
  • save words in a 'to learn' list; I plan to add SRS (Spaced Repetition Software) functionality, or maybe just allow export in a format convenient for Anki import
  • see the most frequent unknown words in a text (so once you add 的, 一, 个... to your personal list, they no longer appear in the frequency listing)
  • switch the character set from simplified to traditional
  • analyze your own text:
    • without logging in, you can analyze a text, for example an email or an article; it will serve you better than Google Translate
    • after logging in, you can publish a text so it is saved on the website. It can be public or private (so personal emails, say, are not shared)
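
For anyone curious how the "unknown word frequency" and "how closely does this text match a level" features could work, here is a minimal sketch, not the site's actual code. It assumes the text has already been segmented into tokens; the function names, the sample sentence, and the tiny known-word set are all made up for illustration.

```python
from collections import Counter

def unknown_word_frequencies(tokens, known_words):
    """Count tokens the reader has not marked as known, most frequent first."""
    counts = Counter(t for t in tokens if t not in known_words)
    return counts.most_common()

def coverage(tokens, word_list):
    """Fraction of tokens (counted with repetition) that appear in word_list."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in word_list) / len(tokens)

# Hypothetical, already-segmented text and a hypothetical personal word list.
tokens = ["我", "的", "朋友", "喜欢", "茶", "我", "也", "喜欢", "茶", "的", "味道"]
known = {"我", "的", "也"}

print(unknown_word_frequencies(tokens, known))
# [('喜欢', 2), ('茶', 2), ('朋友', 1), ('味道', 1)]
print(round(coverage(tokens, known), 2))
# 0.45
```

Passing an HSK word list instead of a personal list to `coverage` would give a rough measure of how well a text fits that level.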

 

It is not fully developed, as I also study and work, but more features are planned, such as a frequency list over one category or a set of articles, so you can focus on specific vocabulary, which I find especially useful for HSK 6.

 

I publish texts from different sources, and a few Chinese teachers I know here in Chengdu help me. You can also participate by publishing some texts.

 

The segmentation is automated, based on Jieba, but it makes a lot of mistakes, and I currently spend quite some time reviewing its output. Maybe I'll have a look at the code to patch a few common mistakes (for example, it doesn't split numbers, or a number and its measure word).
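
One way to patch the number and measure word case without touching the segmenter itself would be a post-processing pass over its output. The sketch below is only an illustration under assumptions: it takes a plain list of tokens (however they were produced), and both `MEASURE_WORDS` and `NUMERALS` are hypothetical, deliberately tiny lists that a real version would need to extend.

```python
import re

# Hypothetical, deliberately short lists; real ones would be much longer.
MEASURE_WORDS = "个本只张条位次天年"
NUMERALS = "0-9零一二两三四五六七八九十百千万"

# A token made only of numerals followed by exactly one measure word.
NUM_MEASURE = re.compile(rf"([{NUMERALS}]+)([{MEASURE_WORDS}])")

def split_number_measure(tokens):
    """Split tokens such as 三本 into 三 and 本; leave everything else alone."""
    out = []
    for tok in tokens:
        m = NUM_MEASURE.fullmatch(tok)
        if m:
            out.extend(m.groups())
        else:
            out.append(tok)
    return out

print(split_number_measure(["我", "有", "三本", "书"]))
# ['我', '有', '三', '本', '书']
```

A rule like this will also split the occasional token that should stay whole (a fixed expression that happens to look like numeral plus measure word), so an exception list would probably be needed.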

 

I have had some questions about monetization before. I dedicate quite some time to this project, and I will need to pay the writers more if they spend more time on it.

I will set up a Patreon page pretty soon; hopefully it will be enough to make this project sustainable.

The website also offers the possibility of tailored analysis and advice to help a reader progress, based on their current word list. This, along with some premium content, could form a premium membership in the future if Patreon is not enough.

The website is not free merely to gain traction before charging a full price higher than a real newspaper's. The core functionalities are intended to remain free, on a donation or freemium model.

 

Hopefully this will help all of us Chinese learners :)

Let me know if you have any suggestions!

Posted
5 hours ago, IronMandarin said:

maybe I'll have a look at the code to patch a few common mistakes

Jieba is a statistics-based segmenter. It's not so much the code you need to patch as the probabilities used in the statistical model. You could probably hard-code a bunch of different exceptions, but the whole point of using a statistical model is to avoid the need to hard-code exceptions in the first place.
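
To illustrate the point about probabilities, here is a toy sketch of one simple statistical approach: pick the split whose words have the highest combined probability under a word-frequency table. This is not Jieba's actual code, and its model may well differ in the details; the function name and every frequency below are invented for the example.

```python
import math

def segment(text, freq):
    """Choose the split that maximises the sum of log word probabilities."""
    total = sum(freq.values())
    max_len = max(len(w) for w in freq)
    n = len(text)
    # best[i] = (score, end) for the best segmentation of text[i:]
    best = [None] * (n + 1)
    best[n] = (0.0, n)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            # Unknown single characters get a pseudo-count of 1,
            # so at least one path always exists.
            count = freq.get(word, 1 if j == i + 1 else 0)
            if count:
                candidates.append((math.log(count / total) + best[j][0], j))
        best[i] = max(candidates)
    # Follow the stored end positions to read off the words.
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words

# Made-up frequencies, for illustration only.
freq = {"研究": 500, "研究生": 20, "生命": 300, "命": 50, "起源": 100}
print(segment("研究生命起源", freq))
# ['研究', '生命', '起源']

# Same code, different numbers: the segmentation changes.
skewed = {**freq, "研究生": 100000, "命": 100000}
print(segment("研究生命起源", skewed))
# ['研究生', '命', '起源']
```

Nothing in `segment` was edited between the two calls; only the frequency table changed, which is why tuning the data tends to be the more natural fix than adding special cases to the code.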

 

For what it's worth, I'm currently working on a statistical segmenter for Chinese Text Analyser that uses the Jieba data files for probabilities (the current version of CTA uses a first longest match algorithm, which is fast but even more inaccurate than Jieba).
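
For comparison, a "first longest match" segmenter can be sketched as a greedy forward scan: at each position, take the longest dictionary word that starts there. This is my reading of the term and not CTA's implementation; the function name and the tiny dictionary are assumptions for illustration.

```python
def longest_match_segment(text, dictionary):
    """Greedy forward scan: always take the longest dictionary word available."""
    max_len = max(len(w) for w in dictionary)
    words, i = [], 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical dictionary, for illustration only.
dictionary = {"研究", "研究生", "生命", "起源"}
print(longest_match_segment("研究生命起源", dictionary))
# ['研究生', '命', '起源']
```

The scan never looks ahead, so it commits to 研究生 and cannot recover 研究 / 生命, which shows how this approach is fast but error-prone.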

Posted

OK, thanks for the information, I'll look into that. It was not my priority, but it could be useful to dig into the technique a bit.

Posted

Let me know if you have any questions about it, or can't figure out why it does something in a given way.  I've been going over it in detail recently so have a good idea of how most of it works.
