
Transcription Project


Posted
12 hours ago, markhavemann said:

I find it a constant frustration that there are basically no transcriptions available for Chinese TV series, so I have decided to start my own transcription project.

 

But what do you use those transcriptions for?

Posted

Here is a folder on Google Drive with full statistics for 家有儿女 S01 E01-10 along with the transcripts that vellocet provided. There may be a few mistakes in the statistics because of the way that I'm breaking up words, but as I go through it myself these should be corrected, and it's probably at least 90% accurate right now anyway.

 

https://drive.google.com/drive/folders/1w2BPsbMmuruTONmr4xy6s1CwG7IJ5CTd?usp=sharing
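For anyone who wants to reproduce rough statistics from a transcript themselves, here is a minimal sketch that counts characters only. It is not the method used for the files in the folder (those are word-based and need a word segmenter); the function name `char_frequencies` and the sample sentence are made up for illustration, and only the basic CJK Unified Ideographs range is counted.

```python
from collections import Counter


def char_frequencies(text):
    """Count Chinese characters in text, ignoring punctuation and Latin letters.

    Assumption: only the basic CJK Unified Ideographs block
    (U+4E00 to U+9FFF) is treated as a Chinese character.
    """
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


if __name__ == "__main__":
    # Hypothetical sample line; replace with the contents of a transcript file.
    sample = "你好，你好吗？我很好。"
    for ch, count in char_frequencies(sample).most_common(3):
        print(ch, count)
```

Character counts sidestep the word-splitting problem entirely, which is why they are easier to get right than word counts, but they will not tell you which multi-character words appear.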

  • Like 3
  • Helpful 1
Posted

If I wanted to fund a show of my own choosing, would you be open to telling me how to reach out to this service? I like your strategy! :)

Posted

Of course. It's a store on Taobao:

 

【靡尚网拍摄影旗舰店】,復·制这段描述¥yqGDbLNbAjz¥后咑閞手机淘宝或者用浏览器咑閞https://m.tb.cn/h.3qtPhk7?sm=9ce409查看

 

(In case you aren't sure: just copy that link on your phone and open Taobao, it should detect it and ask if you want to visit the store.)

 

But just to explain the process: 

 

I imagine that transcribing directly from the episode would be much more expensive, since even with subtitles it's a lot more labour-intensive to keep playing and pausing the video. So instead I extract the subtitles as images and make them into a PDF document, which I send to the store for "录入" (typing up). Basically the process is:

 

1. VideoSubFinder to extract the subs as images. https://sourceforge.net/projects/videosubfinder/

2. PowerShell script and ImageMagick to crop out the white space. 

3. Python script to join them up into A4 sized images. 

4. Adobe Acrobat to turn the images into a PDF file. 
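To give an idea of what step 3 involves, here is a rough sketch of the layout part only: deciding which cropped subtitle strip goes on which A4 page and where. It is not the actual script mentioned above. The page size (A4 at about 150 DPI), the margin and gap values, and the function name `layout_pages` are all assumptions chosen for illustration.

```python
# Assumed page size: A4 at roughly 150 DPI.
A4_WIDTH, A4_HEIGHT = 1240, 1754


def layout_pages(sizes, page_w=A4_WIDTH, page_h=A4_HEIGHT, margin=40, gap=10):
    """Stack subtitle strips top to bottom, starting a new page when full.

    sizes is a list of (width, height) tuples in pixels, one per strip.
    Returns a list of pages; each page is a list of
    (strip_index, x, y, width, height) placements.
    """
    pages = [[]]
    y = margin
    usable_w = page_w - 2 * margin
    for index, (w, h) in enumerate(sizes):
        # Shrink strips that are wider than the printable area.
        if w > usable_w:
            h = max(1, round(h * usable_w / w))
            w = usable_w
        # Start a new page if this strip would run past the bottom margin.
        # An empty page always accepts the strip, even an oversized one.
        if y + h > page_h - margin and pages[-1]:
            pages.append([])
            y = margin
        pages[-1].append((index, margin, y, w, h))
        y += h + gap
    return pages


if __name__ == "__main__":
    # Thirty hypothetical strips, each 800 x 60 pixels.
    pages = layout_pages([(800, 60)] * 30)
    print([len(page) for page in pages])
```

With the placements in hand, an imaging library such as Pillow could create a blank white page with `Image.new` and copy each strip onto it with `Image.paste`. That part is left out here so the sketch runs without any extra installs.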

 

PM me and I'll send you the scripts. 

 

I realise that not everybody may be inclined to go through all of that so I'd be happy to do it for you if you tell me what series you want. I've got the process kind of streamlined but I want to automate it even further.

 

Also, if you are not based in China or can't use Taobao for whatever reason I'd be happy to facilitate that too if you reimburse me over WeChat/Paypal. 

 

Since I would like transcripts for almost any show, I'm happy to put in a bit of work to make it happen. 

 

 

 

  • Like 4
Posted

@markhavemann

 

I had a quick look at the Excel file on frequencies of characters. Thanks for that. It was pretty interesting to note that I know most of the words. However, when it comes to actually listening, I can’t really understand very much due to the speed of delivery and unfamiliarity.

 

Another series with partial subs on this forum is 奋斗. From memory, it goes up to about the 5th episode. 

 

 

Are you based in China?

Posted

Yes I am based in China. 

 

Thanks for pointing out the 奋斗 transcripts, I've found a few which I'll add to the project. 

 

I'm glad you found the word frequencies interesting. I've also found that "knowing" words and actually following them in sentences can be two completely different things. Recently I've tried to make my studying a little more sentence focused. When I find a sentence that I know all the words in but actually don't understand in context, I try to extract a clip of it which I put on my phone. Now I have a little library of sentences that I shuffle through whenever I'm on the bus or have some time to kill. (I'm trying to automate the extraction process too and having transcripts should make that a little easier.) I've found this really helps and also gives me some patterns to call on when I'm talking myself. I think that Glossika may be based on this principle, but I much prefer to have sentences that I've chosen for myself and find interesting. 
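As a rough idea of how the clip extraction could be automated, here is a sketch that assumes you have an SRT subtitle file with timestamps, which plain transcripts do not provide. It turns one SRT timing line into an ffmpeg command for an audio-only clip. The function names, the file names, and the half-second padding are all made up for illustration, and the command is only built, not run.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_timing(line):
    """Return (start_seconds, end_seconds) for one SRT timing line."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError(f"not an SRT timing line: {line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
    start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
    end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
    return start, end


def clip_command(video, timing_line, out_path, pad=0.5):
    """Build an ffmpeg argument list that cuts one sentence as audio.

    pad adds a little room on both sides so the sentence is not clipped.
    -ss seeks to the start, -t sets the duration, -vn drops the video.
    """
    start, end = parse_timing(timing_line)
    start = max(0.0, start - pad)
    duration = (end + pad) - start
    return ["ffmpeg", "-ss", f"{start:.3f}", "-i", video,
            "-t", f"{duration:.3f}", "-vn", out_path]


if __name__ == "__main__":
    # Hypothetical file names.
    print(clip_command("ep01.mp4", "00:01:02,500 --> 00:01:05,000", "clip.mp3"))
```

The resulting list can be passed to `subprocess.run` once ffmpeg is installed. Returning a list rather than a single string avoids quoting problems with file names that contain spaces or Chinese characters.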

  • Like 3
Posted
2 hours ago, markhavemann said:

When I find a sentence that I know all the words in but actually don't understand in context, I try to extract a clip of it which I put on my phone. Now I have a little library of sentences that I shuffle through whenever I'm on the bus or have some time to kill. (I'm trying to automate the extraction process too and having transcripts should make that a little easier.) I've found this really helps and also gives me some patterns to call on when I'm talking myself. I think that Glossika may be based on this principle, but I much prefer to have sentences that I've chosen for myself and find interesting. 

 

You can do this with Anki. I set up cards for listening: each card repeats the sentence about ten times, shows me the answer, and then repeats it another 5 times so that I can try to shadow. Just copy the cards you want with the sentence into another Anki deck, which can go onto your phone. Anki will do your shuffling for you. If you use an iPhone, the Anki app costs money. There is, however, a free Android version which works quite well if you have an Android phone or an old one lying around.

  • Like 1
Posted

Excited to see a project like this underway! Incidentally I ran the transcript for 奋斗 through CTA the other day and thought to myself "I wish there were more subtitles and transcripts available". 

While studying Japanese, having subtitle files helped me break into watching Japanese TV more comfortably, so I'm eager to try that with Chinese.

  • Like 2
Posted
On 12/29/2018 at 1:33 AM, Flickserve said:

You can do this with Anki. I set up cards for listening: each card repeats the sentence about ten times, shows me the answer, and then repeats it another 5 times so that I can try to shadow. Just copy the cards you want with the sentence into another Anki deck, which can go onto your phone. Anki will do your shuffling for you. If you use an iPhone, the Anki app costs money. There is, however, a free Android version which works quite well if you have an Android phone or an old one lying around.

 

Unfortunately I have an iPhone, but I bought Anki a while back since I use it almost every day. I like your idea of having the sentence repeat a bunch of times and of using Anki to do it in a structured way, and I may start doing that too. 

 

The reason that I put them into the music player is so that I can use them as passive listening practice. I found that I wanted something to listen to on the bus/subway or while exercising or doing the dishes/laundry, whatever. At first I tried radio and things like extracted audio from TV shows, but if I got distracted or there were words that I didn't know then it was just noise in my ears and not really learning. I decided on sentences that I'm already familiar with but that are a little challenging. This way it doesn't matter if I'm not listening all the time, and I (should) know all the words in all the sentences, or at least I could probably remember the source of the sentences and then remember the words if I've forgotten them.

 

This way I really only need at most a 15 second stretch of concentration to get a little practice in and it doesn't matter if something else interrupts me or distracts me because it's easy to get back on track when the next sentence starts. 

  • Like 2
Posted

北京爱情故事

E01

 

Has been added to the project. 

  • Like 1
  • 2 weeks later...
Posted

爱情公寓

S01E01

 

Has been added to the project. 

  • Like 1
Posted

鬼吹灯之精绝古城

E01

 

 

Has been added to the project. 

  • Like 1
Posted

鬼吹灯之精绝古城

E02

 

Has been added to the project. 

Posted

Thanks to a post that I found by Roddy, I've gotten hold of what looks like most or all of 爱情公寓 season 1 and 2 transcripts. Once I've had a chance to go through them and process out all the extra junk I'll add them to the drive. If you want them sooner as they are, please let me know! 

  • Like 2
Posted

红楼梦 (1987)

Full Series (36 episodes)

 

三国 (2010)

E01

 

Added to the project.

 

---------

Not sure if I will get a chance to do an episode of 鬼吹灯之精绝古城 today. Will do my best. 

  • Like 4
