Chinese-Forums

Transcrobes: Free + Open Source language learning platform (for Mandarin)



Posted

Update: https://www.youtube.com/watch?v=UodaDI4XVf0 is an 11-minute video introduction and demo of the main functions. The system and interfaces have improved quite a bit since filming, but everything in the video is still relevant.

 

I am an experienced software engineer who took a year off to learn Chinese, found it was much harder than it should be, and had an idea for a platform to make learning Chinese (and later any language) hopefully much easier and more fun.

 

Rather than start a company, I decided to do a PhD (at the City University of Hong Kong) and make everything available as free, open source software. The information site is https://transcrob.es and the open source project is on GitHub at https://github.com/transcrobes/transcrobes.

 

The basic idea is that learners should start doing real stuff with real language as soon as possible, and that if we tie all the available tools together (readers, video players, spaced repetition, browser plugins, data analytics, etc.), we can get a lot of insight, so every learner can have a deeply personalised language learning journey. Learning happens best when you care about what you are learning (with), so content should be chosen by learners, not teachers (or the government!). Computers allow us to automatically make virtually any content accessible to anyone at an advanced beginner/intermediate level and above (say 600-700+ characters). From there we just pile on the science (and engineering) to make the learning as efficient as possible! Beginners can use the spaced repetition and dictionary functionality to learn vocab and then move to the content side when they start feeling more comfortable.

 

The software works in the browser (Chrome-compatible browsers for the moment, others to come), so it is not something you download and install separately. You need to initialise some data into the browser, so a lot of the functionality is available offline (PWA) after "install".

 

As part of my PhD research, learners can sign up to participate and use the software at https://am.transcrob.es. The current service will be available for at least another 2 years.

 

Anyone 18 or over (university research restrictions...) can sign up and use the service for learning (simplified) Mandarin Chinese, as much or little as they like (for their own personal learning). I will ask participants to answer some surveys/questionnaires to understand how they use/like the software (but you can ignore me and it will only make me sad :-< ).

 

I am continuing to develop new functionalities, and will be increasingly using more sophisticated machine learning as time goes by. Learner input into what they like or want is very important, and any new ideas will be considered for implementation! I am using the system myself for learning Chinese, so I am one learner I have been listening to a lot so far :-).

 

Let me know here (or via direct email anton _at_ transcrob.es) if you have any questions or comments, and you are more than welcome to sign up and start using it now!

 

Here are some (quite long-winded) introductions for various levels of learner:

 

Getting started instructions for beginners: https://transcrob.es/page/software/learn/getting-started-beginner/

Getting started instructions for advanced beginners/intermediate: https://transcrob.es/page/software/learn/getting-started-intermediate/

Posted

Scanning your post and having a quick look at your website (and I mean quick), it looks like you might be doing something cool, and something I'd be interested in being involved in.

 

I will say that the amount of reading required to have any real idea of what you're doing is putting me off. I love reading but I don't want to have to read thousands of words just to know if it's worth my time to even look into. 

 

A few images and maybe a bullet list or two might make this more appealing.

 

 

Posted

An update for anyone else interested:

 

I signed up for an account (without reading anything in detail). The "initialising" process for the web app takes a few minutes and is somehow exciting because websites almost never need to initialise. 

 

Going into the site (web app, but I'll call it a site), the UI is clean, seems to have been thoughtfully done, and has the potential to be something good. I guess I will have to do some reading and playing around to figure out what to do next.

Posted

After signing up, reading a little bit, and going through the site, here's what I've got:

 

This looks like it's intended to be an open source LingQ:

  • Read texts and look up words with a single click.
  • It has a built in Anki (SRS flashcards) too, with stroke order for characters
  • It seems like it's intended to automatically suggest texts based on what you know, right now only based on vocab but later based on grammatical complexity too.

One thing is that it doesn't seem to have any content, so I couldn't see what the actual reader is like, or if it even exists yet. Mostly what there is is a menu, with names obscured by an "obes" suffix that seems to have been added to every menu item.

 

Since I'm not a fan of LingQ, I think this is a nice idea if it's executed well.

Posted
On 1/13/2022 at 5:02 PM, Moshen said:

And best if you can say the above in language that doesn't sound like a bunch of hot air or like everyone else's language-learning pitch.  This is hard - but essential.

 

I would love to do that but no matter how hard I try, it is either long, hot air or both! The problem is that the software already does lots because it should follow you everywhere. It should provide both content and active learning (exercises). Most people seem to want to push single-use stuff they can market and monetise - my interest is something people can use over several years, evolving with them as their language competencies develop in many different contexts. That means not just one exam, but all exams and then further study, work or life in a language. So how to pitch that? Tools for living in a language? That already sounds like hot air!

Posted
On 1/13/2022 at 5:07 PM, markhavemann said:

One thing is that it doesn't seem to have any content, so I couldn't see what the actual reader is like, or if it even exists yet. Mostly what there is is a menu, with names obscured by an "obes" suffix that seems to have been added to every menu item.

 

It is currently BYOC (bring your own content). The browser plugin (which unfortunately also requires a 10-minute "install"...) allows you to read most webpages though. I was working with the Language Reactor/Language Learning with Netflix guys for a while but the pandemic made that hard and Netflix doesn't have lots of Chinese content yet.

 

Any epub (so novels, etc.) should work (it's a bug otherwise), so content from any of the sites where you can get DRM-free epubs (or epubs you make yourself...) should work. I am a little hamstrung by being associated with a university: some of the sites that provide free content might not only be providing freely downloadable content, so unfortunately I can't include those...

 

I hope (it's all open source) to be able to integrate into content platforms later but that will only ever be possible if there are lots of users.

Posted
On 1/13/2022 at 5:07 PM, markhavemann said:

This looks like it's intended to be an open source LingQ:

 

There are a few sites/apps that do *some* similar stuff - the thing is that I want to have an entire *open* ecosystem that is for *real* learners but also has a research focus. There should never be any lock-in - I think that is the absolute opposite of what education should be (I'm not nearly so fussed about it elsewhere, just in education...), and I didn't see any other way to do this the way it should be done.

 

One aspect is that the system builds a learner model (a representation of the language knowledge of the user/learner), and then makes that model available to the learner, so they can see where they are, how they got there and how they are evolving. The learner model also serves for creating enrichment/comprehension aids, and also for planning and doing vocab (and later grammar) work.

 

There are a LOT of complex theoretical aspects to the project (technical and "waffly, social science stuff"). For example, one major driver is (a variant of) Zoltan Dornyei's "Imagined Future L2 Self" for allowing rich, learner-driven L2 identity development. I'm pretty sure the LingQ guys aren't on that level though!

Posted

Sounds like the core function is to evaluate texts to see if they're suitable. I skimmed your "get started as intermediate" page and it's far too verbose. I know you don't have unlimited resources to hire copywriters and graphic designers though.

 

But a picture really is worth 1000 words. I'm imagining 2 panels:

Left: the text has a ton of red unknown words, and the person is confused, shaking their head

Right: the text only has a few unknown words, and the person begins reading the text as it's at a suitable level

 

Also, I can't find a list of the software's features. There should probably be one somewhere on the site.

 

 

Posted

So, I read all the posts in this thread and still have no clue what the programme does.

Is it similar to LingQ or Chinese Text Analyser, both or neither of them?

 

@AntonOfTheWoods could you please give us 4-5 bullet points on what it does and how it compares to existing programmes or apps? Ph.D. or not, you still need to be able to elevator-pitch your idea if you want to create a following/user community.

 

(Since I was never able to successfully install anything from GitHub (shame on my ignorance), this is likely going to be a non-starter for me anyway.)

Posted
Quote

I would love to do that but no matter how hard I try,

 

Let's see you try!  If you do your best to answer my four questions above *briefly*, it would go a long way to provide some clarity.  Don't just complain - try.

 

If you find yourself going on too long, look at what you wrote and ask yourself, which is the essential point? and delete everything else.

Posted
  • What does the program allow one to do?

Learn vocabulary through personally optimised spaced repetition. Consume any authentic Chinese content from an advanced beginner level and above, optimised for learning *and* easy understanding. Closely monitor and evaluate one's own level and development through setting goals.

  • Who is it for?

Literate, teenage and above learners of Chinese who want to develop language skills to let them function (study, work, live) with Chinese as the main form of communication.

  • How is it different from other learning paths?

There is a strong focus on autonomous learning, allowing learners to discover a language/culture on their own terms but using the latest technologies to scaffold/structure learning so learners have the support to explore without getting lost.

  • What is the experience of going through your program like?

As a beginner, starting out is like with many spaced repetition programs. When you have a certain amount of vocab in the system you can start consuming *any* authentic content you are personally interested in, in a way optimised for learning. If you already know lots of words, there are many ways to quickly teach the system what you know.

 

How is that?

Posted

@Jan Finster

The main point is that learners should start consuming authentic content they are interested in as early as possible, but until now there was no easy way to do that. It becomes possible if the system knows what you know and gives you help in place, so you can consume content much like in your native language. But there is no reason to stop there. There are lots of other tools and techniques (like spaced repetition, cloze, drills, text selection, etc.) that are valuable for many learners. When you put them all together, and let learners mix and match as they see best, all connected to a single data model, you can do a lot of cool stuff.

 

GitHub is for the developers. Everything is browser-based: you just go to the site, sign up and click (and wait a bit). There is also a Chrome extension for reading content on other websites, using exactly the same help you get on the main site (https://am.transcrob.es). That is also a click to install, but you also need to configure the extension with your login details and the main URL (https://am.transcrob.es).

Posted

One thing that is obviously not clear to most people (probably because it is new) is that you shouldn't care whether some system or teacher thinks a text is the right level for you. We are now at a point where computers can give enough help for most advanced beginners+ to understand almost *any* content (that you would understand in your native language!). That means you can always choose content you are interested in. The less you know the harder it gets for the system, so the system can certainly give you an indication of that. Transcrobes currently gives percentages of known words and characters.

 

But at the end of the day, completely "language external" factors (hunger, ambient noise/lighting, tiredness, topic knowledge,...) may well have a much more significant effect on whether the learner gets significant value from a text in a given situation than the things the system has access to. There is *zero* convincing science on this, and I have looked high and low.

 

So you should choose texts first and foremost on whether you are interested in the contents, not on some (likely inaccurate) analysis of text complexity, particularly if the system doesn't have a *very* accurate picture of the current state of your knowledge (so that is 100% of existing software)!
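
For anyone wondering what "percentages of known words and characters" could look like in practice, here is a minimal Python sketch. It is only an illustration, not the Transcrobes implementation: the function name `known_percentages`, the already-segmented `tokens` list and the two sets of known items are all assumptions made up for the example (real Chinese text would first need a word segmenter).

```python
def known_percentages(tokens, known_words, known_chars):
    """Return (percent of known word tokens, percent of known character tokens).

    Hypothetical helper: `tokens` is assumed to be a text that has already
    been split into words by some segmenter.
    """
    words = [t for t in tokens if t.strip()]
    chars = [c for w in words for c in w]
    word_pct = 100.0 * sum(w in known_words for w in words) / len(words) if words else 0.0
    char_pct = 100.0 * sum(c in known_chars for c in chars) / len(chars) if chars else 0.0
    return word_pct, char_pct


# Made-up sample data: the learner knows three of the four words
# and six of the seven characters in the sentence.
tokens = ["我", "喜欢", "学习", "中文"]
known_words = {"我", "喜欢", "中文"}
known_chars = {"我", "喜", "欢", "学", "中", "文"}

word_pct, char_pct = known_percentages(tokens, known_words, known_chars)
print(f"known words: {word_pct:.1f}%, known characters: {char_pct:.1f}%")
```

Numbers like these are only an indication of how much help a reader will need, which is the point made above: they say nothing about whether the text is interesting to you.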

Posted
Quote

We are now at a point where computers can give enough help for most advanced beginners+ to understand almost *any* content (that you would understand in your native language!). That means you can always choose content you are interested in.

 

I strongly disagree.  The other day I was really interested to read an article in Chinese Wikipedia on a certain topic, and I started it, but I had to look up so many words (each with just one click) that it turned into an unpleasant chore, regardless of how motivated I was to learn about that topic.  It was just too hard, and I don't believe any fancy tech could have erased that fact for me.   (I am at HSK 4.5, if that helps put this in context.)  I find it hard to believe that you can develop a magic bullet to overcome this learning obstacle.  Your claim simply makes no sense whatsoever.  And if I'm wrong about that, then please go ahead and explain why my thinking is wrong.

Posted
Quote

Learn vocabulary through personally optimised spaced repetition. Consume any authentic Chinese content from an advanced beginner level and above, optimised for learning *and* easy understanding. Closely monitor and evaluate one's own level and development through setting goals.

 

This is moderately understandable, though an advanced beginner probably would not know what you mean by "personally optimised spaced repetition."

 

Quote

Literate, teenage and above learners of Chinese ...

 

I don't understand what this means.  (Grammatically it's a bit garbled.)

 

Quote

...who want to develop language skills to let them function (study, work, live) with Chinese as the main form of communication.

 

This surprises me very much.  Your thing would not be useful for those who simply want to learn the language but not "as the main form of communication"?

 

Quote

There is a strong focus on autonomous learning, allowing learners to discover a language/culture on their own terms but using the latest technologies to scaffold/structure learning so learners have the support to explore without getting lost.

 

I understand this, but I'm skeptical, as explained in the previous post.

 

Quote

As a beginner, starting out is like with many spaced repetition programs.

 

You must keep in mind that many potential users will not have encountered ANY spaced repetition programs yet.  I learned four other languages without any spaced repetition anything, not even flash cards.

 

Posted

So it's something like an open source LingQ and Chinese Text Analyser put together?

 

 

By the way, what research do you base this on?

On 1/13/2022 at 10:30 AM, AntonOfTheWoods said:

The basic idea is that learners should start doing real stuff with real language as soon as possible, and that if we tie all the available tools in together (readers, video players, spaced repetition, browser plugins, data analytics, etc.), we can get a lot of insight, so every learner can have a deeply personalised language learning journey. Learning happens best when you care about what you are learning (with), so content should be chosen by learners not teachers (or the government!).

 

You basically make three big claims here:

 

1. Learners should start doing real stuff with real language as soon as possible
2. Learning happens best when you care about what you are learning, therefore content should be chosen by learners

3. With your tool every learner can have a deeply personalised language learning journey

 

This being PhD research, are the first two premises for a conclusion in the third, or is there a research question in the third? What are the research questions?

These statements must be based on some previous research? Are there references to these or a reading list somewhere? I'm just asking because I got curious if there are some kinds of studies that have compared "doing real stuff" soon versus later etc.

Posted

OK, I bit.

 

 

Here is the user interface. I am not sure what this is all about, but I was happy to play along with it.

OK, the interface is dull and there is no easy introduction.

 

 

I cannot enter any search terms here. Why?

 

[screenshot]

 

So, I tried to import content. Word document does not work. So, OK I tried .txt. Still does not work:

[screenshot]

 

Listrobes: whatever this is... I guess I can choose whether I know a word. Do I really need to click through the ~8000 words that I know manually? Why is there no "mark all as known" button?

 

[screenshot]

 

I still have not found the Chrome extension. Not on the website, not on the Play Store, not on Google, not on the Chrome Web Store...

 

 

I have not found how I can uninstall all this (whatever took 20 minutes to install).

 


 

Posted
On 1/14/2022 at 12:43 AM, alantin said:

This being PhD research, are the first two premises for a conclusion in the third, or is there a research question in the third? What are the research questions?

These statements must be based on some previous research? Are there references to these or a reading list somewhere? I'm just asking because I got curious if there are some kinds of studies that have compared "doing real stuff" soon versus later etc.

Here is an earlier form of the research proposal https://transcrob.es/docs/lt8805_melser.pdf 

 

A lot of the implementation details and what I hoped would be possible for the experiments have changed but the theory is outlined there. Actually virtually none of the stuff I claim is either new or particularly controversial, though there are schools of thought that disagree. Unfortunately, very few scholars (none basically, except maybe Detmar Meurers) think it worth actually testing what they claim in meaningful learning contexts. They think that doing a pre-test, post-test and 4-week post-test after an hour-long experiment is supposed to give us some insight as to how languages get learnt.

 

The "real stuff soon" is an elaboration of the work of Stephen Krashen, who very deeply affected the discipline from the 70s onwards and gave rise to several entire branches of scholarship in applied linguistics. A synthesis of the "novel" theoretical contribution I make can be found here: https://transcrob.es/page/meaningful-io/intro/ 

Posted

@Jan Finster

It can take a second or two for the system to be ready to accept input. I thought about a spinner but it is usually pretty instant. If not that is a bug and I'm interested!

 

Actually there are links to documentation specific to each screen in the top right corner of almost all screens. Those docs have lots of screenshots and should get directly to the point (otherwise it's a bug).

 

As for imports, there is again specific documentation linked from that screen that describes what can be imported now. Importing different formats is a Herculean task - it can take weeks just to get a single format right. I want to support lots of different formats but today there are only a few (epub, SRT/WebVTT, txt, csv), hopefully clearly laid out in the docs.

 

In terms of the 8000 words that you already know, how you can quickly get that into the system is in the "getting started for advanced", which is obviously an epic fail on my part! Basically, if you have all of those in list format (like a CSV export from Anki, etc.) you can "train" the system in a few minutes by importing the list. There are links to all these sorts of configuration tasks here: https://transcrob.es/page/software/configure/home/
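
To give a rough idea of what "importing the list" amounts to, here is a small Python sketch that turns a CSV export into a set of known words. It is an illustration under stated assumptions, not the actual Transcrobes importer: the function name `load_known_words`, the sample data and the idea that the word sits in the first column are all hypothetical, so check the import docs for the format the system really expects.

```python
import csv
import io


def load_known_words(csv_text, column=0):
    """Collect the distinct, non-empty values of one CSV column.

    Hypothetical helper: assumes each row holds a word in `column`
    (for example the first field of a flashcard export).
    """
    known = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > column and row[column].strip():
            known.add(row[column].strip())
    return known


# Made-up sample export: word, then an English gloss. Duplicates collapse.
sample = "你好,hello\n谢谢,thank you\n你好,hi\n"
known = load_known_words(sample)
print(len(known), "distinct known words")
```

A list like this is what lets the system skip the one-by-one clicking: every word in it can be treated as known from the start.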

 

It really looks like I'm going to have to rewrite those getting started guides - most of the rest of the docs are much more direct and to the point, and should be easy to navigate and understand (otherwise it's a bug). 
