Chinese-Forums

Transcribing Mandarin as a learning method


Flickserve


15 hours ago, Publius said:

I did it again in 2007-08, this time on a language-learning website built around the idea of "crowdsourced transcription."

Which website is that?

I am very interested in taking a look at this.

If you can remember the name of it, I may be able to check it out using the Wayback Machine

even though it may no longer be in service.


It's http://forum.putclub.com/ (A Chinese site. Must register to read -- it's to cope with censors shutting down sites based on undesirable content found through search engines)

The site is dying, if not already dead.

It works (or worked) like this: Moderators prepare the material, often without an official transcript/subtitle, e.g. NPR Hourly News Summary. Members submit their transcriptions as "homework" or as a revision based on the last person's revision, with the differences marked in red and the part they are unsure of marked in blue. Later on the moderators produce a 整理稿 as a kind of answer key based on the collective efforts of the community. But starting from I don't know when, people stopped participating. They just paste their homework, or worse, copy-paste the answer key and call it a day. The quality of 整理稿 went into a downward spiral. It also didn't help that the site owner seemed insouciant about technology advances, especially the advent of smartphones.
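For anyone wanting to replicate the "revision with differences marked" step, here is a minimal sketch using Python's standard `difflib`. The function name `mark_changes` and the `[[...]]` markers (standing in for the red highlighting) are my own assumptions for illustration, not anything the site actually used.

```python
import difflib


def mark_changes(previous: str, revised: str) -> str:
    """Return the revised text with words that differ from the previous
    revision wrapped in [[...]] (a stand-in for red highlighting)."""
    old_words = previous.split()
    new_words = revised.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = " ".join(new_words[j1:j2])
        if tag == "equal":
            out.append(chunk)
        elif chunk:
            # "replace" or "insert": words that are new in this revision
            out.append(f"[[{chunk}]]")
        # "delete": the revision dropped these words, so there is nothing to show
    return " ".join(out)


print(mark_changes("the president said he will visit",
                   "the president said he would visit"))
# the president said he [[would]] visit
```

This compares word by word, which suits English material like the NPR summaries; Chinese text would need a character-level comparison instead.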

I know some of our forum members are English teachers. If you think transcribing is a good idea and you want to point your students to somewhere so they can practice it, I recommend http://www.hujiang.com/. It has a module called 聽力酷.


Would there be any possibility to replicate such a tool on chinese-forums? We could use publicly available videos (Youtube...) and sound files (Ximalaya, podcasts...), use a file sharing platform (Drive?) to share documents and try to collectively produce good transcripts for learners of all levels.

Would there be copyright issues?

 

 


7 hours ago, laurenth said:

Would there be any possibility to replicate such a tool on chinese-forums? We could use publicly available videos (Youtube...) and sound files (Ximalaya, podcasts...), use a file sharing platform (Drive?) to share documents and try to collectively produce good transcripts for learners of all levels.

Would there be copyright issues?

 

Interesting idea, although Zoho or Microsoft Office online would be better than Google Docs if you wanted people in mainland China to participate...

 

Regarding the legality question.


Well, there's also the problem of YouTube not being accessible in the Mainland, while mainland video services work very poorly outside the mainland.

 

I say: Get the low-hanging fruit first. Take advantage of audio and video materials already accessible to you with full script/soft subtitles instead of chasing difficult-to-find stuff.

 

For audio, your first stop could be the Slow Chinese podcast together with WorkAudioBook player, and graded readers with mp3.

 

For video, you can browse FluentU or Amara.org for videos already with soft Chinese subtitles. You can also contribute your own subtitles to Amara.org, and other volunteers can revise them.

 

I'm also trying to compile a list of YouTube videos with soft subtitles here: https://www.chinese-forums.com/forums/topic/53829-list-of-tv-shows-on-youtube-with-chinese-srt-subtitles/#comment-412770

 

In that last thread, someone recommended Viki. I see you can even sign up as a volunteer to help with subtitling/translation, and this could also be a good exercise:

https://www.viki.com/community


  • 4 months later...

I've been inspired by this thread to start transcribing episodes of 《锵锵三人行》. I found a website that lets you slow down YouTube videos and replay segments of them in a loop.

 

Transcription has a few interesting performance statistics that can be tracked over time:

 

time efficiency

 

$$\textrm{time efficiency} = \frac{\textrm{time length of transcribed audio}}{\textrm{time it took to transcribe that audio}}$$

 

Like imron suggested above, one performance statistic you could measure is how long the audio is versus how long it took you to transcribe that audio. If you don't need to slow down, pause, or replay segments of the audio, then your "time efficiency" would be 100%. If it took you 30 minutes to transcribe 15 minutes of audio, then your time efficiency would be 50%.
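As a rough sketch, the calculation could look like this in Python. The helper names `parse_clock` and `time_efficiency` are just ones I made up, and I'm assuming times are written as `mm:ss` or `hh:mm:ss`.

```python
def parse_clock(text: str) -> int:
    """Convert 'mm:ss' or 'hh:mm:ss' into a number of seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def time_efficiency(audio_length: str, transcription_time: str) -> float:
    """Length of the transcribed audio divided by the time spent transcribing it."""
    return parse_clock(audio_length) / parse_clock(transcription_time)


# 15 minutes of audio transcribed in 30 minutes
print(f"{time_efficiency('15:00', '30:00'):.0%}")  # 50%
```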

 

comprehension

 

$$\textrm{comprehension} = \frac{\textrm{number of comprehensible syllables}}{\textrm{all transcribed syllables}}$$

 

When transcribing audio, I'll sometimes encounter patches where I don't understand anything being said. When this happens, I use the following syntax in my transcription:

Quote

?[number of incomprehensible syllables]?

 

"all transcribed syllables" in the formula above is a sum that includes the total number of characters and the total number of incomprehensible syllables. 
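A minimal sketch of how this could be counted automatically, assuming each Chinese character counts as one syllable and that punctuation, digits and Latin letters are ignored. The names `UNKNOWN`, `HANZI` and `comprehension` are hypothetical, and the character range only covers the basic CJK Unified Ideographs block.

```python
import re

# ?2? means "two syllables I could not understand"; a full-width ？ is also
# accepted here as an assumption, in case an input method produces it.
UNKNOWN = re.compile(r"[?？](\d+)[?？]")
# Assumption: one character in the basic CJK block counts as one syllable.
HANZI = re.compile(r"[\u4e00-\u9fff]")


def comprehension(transcription: str) -> float:
    """Comprehensible syllables divided by all transcribed syllables."""
    unknown = sum(int(n) for n in UNKNOWN.findall(transcription))
    # Strip the markers first so they cannot interfere with the character count.
    known = len(HANZI.findall(UNKNOWN.sub("", transcription)))
    total = known + unknown
    if total == 0:
        raise ValueError("nothing was transcribed")
    return known / total


# 11 characters understood, 2 syllables not understood
print(f"{comprehension('耶鲁大学与中国的?2?在美国'):.1%}")  # 84.6%
```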

 

accuracy

 

$$\textrm{accuracy} = \frac{\textrm{number of correctly transcribed characters}}{\textrm{total number of transcribed characters}}$$

 

If you hear the audio and mistakenly think they said "A" when they really said "B", then that error would decrease your "accuracy" performance statistic. For shows like 《锵锵三人行》 that have official transcripts, it would be an interesting programming exercise to automate the calculation of this statistic.
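One possible way to automate this, sketched with Python's standard `difflib`: align my characters against the official transcript and count the ones that line up. This is only an approximation of "correctly transcribed" (it is one alignment heuristic, not an exact edit distance), the function name `accuracy` is hypothetical, and the sample sentences are made up for illustration.

```python
import difflib
import re

# Assumption: only characters in the basic CJK block are compared, so
# punctuation and ?n? markers are ignored.
HANZI = re.compile(r"[\u4e00-\u9fff]")


def accuracy(mine: str, reference: str) -> float:
    """Share of my transcribed characters that align with the reference."""
    mine_chars = HANZI.findall(mine)
    ref_chars = HANZI.findall(reference)
    if not mine_chars:
        raise ValueError("nothing was transcribed")
    matcher = difflib.SequenceMatcher(None, ref_chars, mine_chars, autojunk=False)
    # Each matching block is a run of characters that appear in both texts
    # in the same order.
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct / len(mine_chars)


# Made-up example: one wrong character (汽 instead of 气) out of six
print(f"{accuracy('今天天汽很好。', '今天天气很好。'):.1%}")  # 83.3%
```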


  • 10 months later...

@大块头, based on this comment from another thread, you mentioned you've been transcribing now for several months, what do you feel your progress has been like?

 

What statistics have you been tracking and what if any improvement do you see - both from a data perspective and from your own general impression when encountering new material?


3 hours ago, imron said:

@大块头, based on this comment from another thread, you mentioned you've been transcribing now for several months, what do you feel your progress has been like?

 

What statistics have you been tracking and what if any improvement do you see - both from a data perspective and from your own general impression when encountering new material?

 

I stated earlier in this thread my intention to transcribe audio from《锵锵三人行》, but I decided to transcribe HSK audio instead because my current overall goal is to improve my HSK 6 test score.


After a few false starts, I started spending approximately 100 minutes a week transcribing audio from HSK practice tests, specifically the audio from BLCU's 精讲精练HSK六级 book. To get the most bang for my buck, I've been taking the practice tests and then transcribing the audio files a few months later.

 

For each ~50 minute audio file, I create one text file with entries for each day of practice. Here is my entry for today:

Quote

date - 2018/07/04
segment start - 24:48
segment end - 26:05
transcription time - 24:14

 

私立大学则不能。不过,耶鲁大学在排名上也有吃亏的时候。比如,英国金融时报和中国上海交通大学都推出过世界大学排行榜。耶鲁的排名均比较靠后。这两个排行榜过分强调大学在学术期刊上发表论文的数量,而不重视人文学科和职业学院。所以,世界上没有十全十美的排名。

 

耶鲁大学与中国的?2?在美国所有大学中最为久远。早在一八五四年,清朝人容宏就从耶鲁大学毕业。成为获得美国大学学位的第一位中国人。一八八一年,中国铁路?2?詹天佑也成为耶鲁大学的毕业生。一九零一年,耶鲁大学还组建了雅礼协会,专门从事于有关中国的教育事业。另外我知道,耶鲁曾两次为中国十四所顶尖大学的校长提供培训。中美国情衡不相同,您认为这些培训有用吗?

 

当然。这些培训非常有价值。以大学校长培训项目为例,中国的大学校长在耶鲁培训期间我们向他们介绍了耶鲁的一些做法。

 

I've been using the excellent transcription audio player Parlatype (Linux only). If needed, I use Pleco or the internet to look up words I don't understand (such as proper nouns in the entry above).

 

By the end of this year I hope to put together some Python code that takes my transcription files (as well as the official audio transcription) as input and performs statistical analysis regarding improvement in my listening ability over time. Specifically, this code will analyze the following statistics: transcription time efficiency, comprehension, and accuracy (see my post above for definitions of these statistics). At this point in time my general impression is that transcribing these audio files has gotten easier and easier.
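As a starting point, here is a rough sketch of reading one entry in the format shown above and computing its time efficiency. It assumes the `transcription time` field is a duration written as `mm:ss` (so 24:14 means 24 minutes 14 seconds); the names `FIELD`, `to_seconds`, `parse_entry` and `SAMPLE` are hypothetical, and the transcription text in the sample is shortened.

```python
import re

# Header lines look like "segment start - 24:48".
FIELD = re.compile(r"^(date|segment start|segment end|transcription time) - (.+)$")

SAMPLE = """\
date - 2018/07/04
segment start - 24:48
segment end - 26:05
transcription time - 24:14

私立大学则不能。不过,耶鲁大学在排名上也有吃亏的时候。
"""


def to_seconds(clock: str) -> int:
    """Convert 'mm:ss' or 'hh:mm:ss' into seconds."""
    seconds = 0
    for part in clock.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_entry(text: str) -> dict:
    """Split one entry into header fields and transcription text,
    then derive the time efficiency for that day's segment."""
    fields = {}
    body = []
    for line in text.splitlines():
        line = line.strip()
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
        elif line:
            body.append(line)
    audio = to_seconds(fields["segment end"]) - to_seconds(fields["segment start"])
    spent = to_seconds(fields["transcription time"])
    return {
        "date": fields["date"],
        "audio_seconds": audio,
        "transcription_seconds": spent,
        "time_efficiency": audio / spent,
        "text": "\n".join(body),
    }


entry = parse_entry(SAMPLE)
print(entry["date"], entry["audio_seconds"], entry["transcription_seconds"])
# 2018/07/04 77 1454
print(f"{entry['time_efficiency']:.1%}")  # 5.3%
```

Collecting one such dictionary per day would give a time series that can be plotted to see whether efficiency is trending up.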
