WorkAudioBook – a tool for listening practice (and subtitle creation)

June 6, 2014 at 07:45 PM

When learning Chinese, one of the most useful things you can do to improve your listening ability is drilling sentences. This means you repeat the same sentences, spoken by native speakers, over and over. Your goal is to understand every word that is spoken in the sentence. Typically, one does this by finding an audio file with the text spoken (e.g. an audiobook or podcast), then finding the transcript. You would then load the audio file into an application such as Audacity, select each sentence manually, match it up with the transcript, and repeat it.

While this is OK, it requires a lot of mousing around to select audio fragments and match them up with the text. And if you want to go back to an earlier sentence and drill it some more, you need to find it again and re-select it again. And then if you would like to export these sentences to another tool (e.g. Anki) with the audio, it requires selecting, then extracting the files for each segment of text, and copy pasting the sentences into your tool. It’s pretty slow going and you spend a lot of time not listening to Chinese but playing with tools.

Recently I found the application “WorkAudioBook” at http://workaudiobook.com/ by a developer called Sergey Povalyaev which is designed as an audio player for language learners. I was pretty excited to find it because it makes it very easy to do listening practice, and can additionally be used as a tool to create subtitles that match the timing of audio very easily. This application is free for PC (and I think Android but I don’t have an Android device so can’t confirm). I have no relationship with the developer but I am very impressed by him!

WorkAudioBook will load an MP3 and automatically segment it up into sentences based on short silences that occur between sentences. You can then load a text document with the transcript and simply highlight the text that corresponds to each spoken sentence, and press a button. WorkAudioBook will then mark that timing of the audio sentence with that text (recording the start and end time), just like subtitles on a movie (if you have ever looked at an SRT subtitle file it’s just start timing plus end timing + text).
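To make the idea of splitting on short silences concrete, here is a minimal sketch in Python. It is not WorkAudioBook's actual algorithm (the source does not describe it). It assumes the audio has already been reduced to a list of loudness values, one per fixed-length frame, and the function name, threshold and frame length are all made up for illustration.

```python
def split_on_silence(levels, frame_seconds, threshold, min_silence_frames):
    """Return (start, end) times in seconds of the loud runs in `levels`.

    A run is closed once at least `min_silence_frames` consecutive frames
    fall below `threshold`; shorter dips stay inside the current run.
    All parameters are hypothetical tuning knobs, not WorkAudioBook settings.
    """
    segments = []
    start = None      # index of the first loud frame of the current run
    quiet_run = 0     # consecutive quiet frames seen inside the current run
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_frames:
                end = i - quiet_run + 1  # one past the last loud frame
                segments.append((start * frame_seconds, end * frame_seconds))
                start = None
                quiet_run = 0
    if start is not None:
        # Audio ended while a run was still open; drop any trailing quiet frames.
        end = len(levels) - quiet_run
        segments.append((start * frame_seconds, end * frame_seconds))
    return segments


# Toy loudness values: two bursts of speech separated by a pause.
print(split_on_silence([0, 0, 5, 6, 5, 0, 0, 0, 7, 8, 0], 0.5, 1, 2))
```

Raising the silence length gives longer fragments, and lowering it gives shorter ones, which is the kind of trade-off you make when you trim or expand the detected segments by hand.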

So, first time round you can listen to a sentence, mark the text that corresponds to that sentence. Go through the audio until you’ve matched up and studied a bunch of sentences.

Then after that, you can drill yourself by listening to the sentences. If you think you understand the sentence, you can check your answer by revealing the “subtitles” that correspond to that sentence. You can mark sentences as easy, medium and hard depending on how easy you find it to listen and understand. New vocabulary can be marked and exported to Anki.

Even more interesting for me, if you go through a whole audio file and mark all the sentences, you can export all your timings as an SRT file.
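For anyone who has not looked inside an SRT file, the sketch below writes one from a list of (start, end, text) timings. It illustrates the file format only, not how WorkAudioBook does its export. The function names are hypothetical, and the second timing in the example is invented for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Each cue is a numbered block: index, "start --> end", then the text,
    with a blank line between blocks.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


cues = [
    (25.0, 27.0, "端午节"),
    (27.5, 30.0, "中国农历的五月五日"),  # made-up timing, for illustration only
]
print(to_srt(cues))
```

Save the result as UTF-8 so the Chinese text survives the round trip.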

An Example Walkthrough

Here’s an example. Let’s consider how to use this tool to practice listening by using an example passage from Slow Chinese. There is very good documentation for WorkAudioBook on the website (http://workaudiobook.com/) so I’ll just explain the steps and you can check the website for details.

First download the WorkAudioBook application on PC. Let's try the very first podcast from Slow Chinese series, about the Dragonboat festival:

http://www.slow-chinese.com/podcast/1-duan-wu-jie/

Download the MP3 file and load it into WorkAudioBook. Open the subtitle editor and “edit” it. Insert the title “端午节” and then the transcript below it, which starts with “中国农历的五月五日是一个重要的节日”… Turn off edit mode so you can start marking the sentences.

Press space to play the first segment of sound. It’s intro music so we want to skip through for a bit until we get to the good stuff. Press the fast forward button (Alt-Right Arrow key also works) a few times until you get to the first sentence.

The first sentence is 端午节. Because of the intro music it doesn’t quite detect this sentence starting neatly at 25-27 seconds, so you might want to just select it yourself (doesn’t really matter, it’s just a bit neater). You should stop the auto playback (press the stop button). Then select 端午节 in the subtitles text, and press “N” (or press the Play Next button). This will mark the audio sequence for 端午节 as the 25-27 seconds mark.

The next audio segment selected is “中国农历的五月五日”, so select that text, press stop, and press N (Play Next) again. The next segment is “是一个重要的节日，叫做端午节”, so again select the text, stop, and press N.

Continue like this – trim or expand the audio a little if it doesn’t quite detect the sentences well or if you would prefer longer/shorter audio fragments. It’s often useful to use the next red line in the audio as the likely next best stopping point. If you make a mistake just delete the subtitle using the Del button (this won’t delete the text, just the marking of start and end timings). There are handy shortcuts for advancing through the text to the next punctuation mark (press H).

Once finished marking all the text (takes a few minutes) you are ready to drill. Go back to the start and you can either walk through the audio (fast forward, reverse) or you can select sentences in the text and it will jump to the right audio for that sentence. Drill away – try to understand the sentences, look up words you don’t know, even try shadowing the sentence by saying it exactly as said.

I’ve also used this tool to transcribe text (if I don’t have a transcript, I make one) and I’ve used it to match up transcripts with TV show dialogs. Note that for TV/movies it’s a bit harder, as the sentence endings are harder for the software to find because there is a lot more background noise.

There isn’t currently a record feature (would be even better for shadowing), although the developer says he’s considering developing one.

Advanced: Make an SRT file and use Subs2SRS to make Anki sentence cards

Now, what I like to do with sentences is put them into Anki, and “cloze” words. So I get the audio + hanzi into a card, and then I mark particular words I want to learn, and drill them in SRS (this is often called MCD – massive/micro cloze deletion). Actually what I really really like is having a sentence “bank” of hundreds (indeed, thousands) of cards that are ready made, and then select which cards I want to learn next based on what vocabulary I am prioritizing.

So this tool is really useful for this purpose because I can take an audio file, mark the sentences according to the transcript, then export an SRT (subtitle) file. To do this press the Import button (I know, a bit strange to press import in order to export but that’s how it’s done). After exporting an SRT I can load both the MP3 and the SRT into Subs2SRS (you just need to tell Subs2SRS to look for All Files to find an MP3 file as it’s usually looking for a video file type but it’s perfectly happy with MP3 once you select it).

Turn off the “video snapshots” option and you are ready to make Anki cards (check out the Subs2SRS documentation for details). Using this I made 100 Anki cards for Glossika’s Business Chinese audio files in about 15 minutes. I plan to make all the rest over the weekend. If you have English translations you could also match up the English with the sentences and make a second SRT file, then put that into Subs2SRS to make bilingual cards. To get pinyin I think it’s easier to use one of the Chinese plugins for Anki that auto-generates pinyin.

Summary

In summary, WorkAudioBook is a really cool tool for drilling audio. It probably works best for audio books (you need the audio + the full book) or podcasts (you need the MP3 and the transcript). Even if you don’t have a transcript you could just use it to repeat sentences from any source with sentence breaks. The developer really seems to be focused on learning English via audio books but it works perfectly fine with Chinese text too.

You can also use it for movies/TV (you’d have to strip out the MP3 audio from the file) but the sentence detection might not be optimal given background noise, so it might take a bit longer. If you already have an SRT file that matches the audio it could be super quick (but in my experience it’s hard to get a good match, so you might need to mess around with timings using other tools).

For your pleasure, I've attached the SRT file for the Slow Chinese article (I wanted to attach the Anki file but the internet connection is not cooperating today so I'll leave it as an exercise for the reader).

Hope you find this information useful, happy studying!

Slow_Chinese_1.srt

June 6, 2014 at 11:13 PM

This sounds awesome, I've got spare time in the middle of the day and always doing the Anki cards tends to be rather dull. Something like this to add to the mix would be wonderful.

June 7, 2014 at 01:53 PM

@hedwards - hope you find it helpful.

I find you can tackle two birds with one stone - listen through an MP3 and study against the transcript, and mark it up with timing information very easily so you can turn sentences into cards quickly afterwards.

June 7, 2014 at 03:06 PM

A big thank you for sharing this - I've been looking for a program like WorkAudioBook for a long time. I haven't tried it yet, but from your (very thorough) description it looks like a huge time saver.

June 7, 2014 at 03:59 PM

Dropped the developer a note to point him towards this topic - he sounds happy to hear people are using it for other languages and says he'll drop in and say hello once he's back from holiday at the end of the month.

June 7, 2014 at 05:24 PM

I think the main challenge I'm having is that the subtitles on the video I'm using aren't well synced, and editing in the program is a bit of a challenge. Hopefully he'll be willing to display the time code for the selection as it's rather challenging right now to edit the subtitles if they don't match up with the selections that the program is making. And better yet, let people use the subtitle time codes for the default phrase lengths.

I do see a ton of potential here though.

June 7, 2014 at 07:27 PM

If you already have subtitles and just want to make Anki cards with Subs2SRS, it might be a matter of adjusting the timing. Usually the problems are an offset (e.g. the film starts 6 seconds later than the subtitles think it does) and PAL/NTSC framerate differences (which lead to a 4% different runtime for the film so everything gets progressively more out of sync). If those are your problems, better to fix the subtitles first. Let me know and I'll dig out the names of the tools that can do this.

June 7, 2014 at 08:26 PM

Tysond, I'll take another look. The subtitles aren't from the original source, they're custom, I'm not really sure if they're based on a PAL or NTSC source.

I'm hoping that you're right. The part of the DVD I'm looking at has some fairly long pauses and it's hard to say if the subtitles are imprecise or if there's something else going on.

June 8, 2014 at 01:20 PM

Would be good to know what tools you are using to modify the SRT files based on the criteria you mentioned, so we can get the timings right.

June 9, 2014 at 05:47 AM

One "helpful" button isn't enough for your review and instructions!

June 9, 2014 at 10:58 AM

OK, for subtitles here's my thinking and toolset. You have a subtitle file, such as an SRT file. Generally this is from a film or TV show. And you have audio, an MP3 file.

The Simple Way - Use Subtitles as a Transcript

First, you can just import it as a text file into WorkAudioBook. This will *ignore* the subtitle timings and will just present the text to you (the timings are there in the text but they are not used, they are just text). So you are really just using the SRT as a transcript. You can select the spoken dialog in the text and match it up manually using the autodetected phrase lengths.

This may or may not match the subtitle phrase lengths, which have been decided by a human to best match the video. (note that you can play with the settings in WorkAudioBook to tune the lengths of the phrases and stuff). You'd probably have to do a bit of selection to match.

Actually you don't have to do *all* the subtitles, you could just do the ones you are interested in, or the first 5 minutes, or whatever, and you can just ignore the rest. For listening/shadowing practice this is probably a fine technique as you are going to spend a lot of time listening and shadowing and not much time matching up the audio.
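If the timing lines get in the way when the SRT is used as a plain transcript, they can be stripped out beforehand. The Python sketch below does this. It is an optional extra step that is not part of WorkAudioBook, the function name is hypothetical, and it assumes a conventional SRT layout (index line, timing line, then text).

```python
import re

# Matches an SRT timing line such as "00:00:25,000 --> 00:00:27,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_transcript(srt_text):
    """Return only the subtitle text, one cue per line, without timings."""
    texts = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = block.splitlines()
        for i, row in enumerate(rows):
            if TIMING.match(row.strip()):
                # Everything after the timing line is the cue text.
                texts.append(" ".join(r.strip() for r in rows[i + 1:]))
                break
    return "\n".join(texts)


sample = (
    "1\n00:00:25,000 --> 00:00:27,000\n端午节\n\n"
    "2\n00:00:27,500 --> 00:00:30,000\n中国农历的五月五日\n"  # made-up timing
)
print(srt_to_transcript(sample))
```

Taking everything after the timing line, rather than filtering out lines that look like numbers, keeps subtitles whose text happens to be just a number.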

The "Fast" way, use Subtitle Timings

Second, you can import it into WorkAudioBook using the import button, and import the SRT. This will *use* the subtitle timings as the basis for matching up the text with the audio. If the timings match up with the audio, great, you can listen to them easily.

What if the timing isn't right? Now the fast way is not so fast...

Frequently there are mistimings, of two sorts. One is a constant offset (audio starts earlier or later than the subtitles). Two is PAL/NTSC mismatch for film audio -- because actually these video formats play films at different rates (runtime of a film can be different by even as much as 10 minutes!). This happens to me all the time.

So I use a tool called Aegisub. http://www.aegisub.org/ which can fix both of these.

First, I figure out what the difference is. Use the first and last subtitles - write down the start time of the audio (e.g. from video/mp3) and the subtitles (the time is right there in the SRT file). Ignore any extra comments at the start or end (sometimes they add extra credits or translate some of the end credits). Just use the first and final line of the dialog.

The difference between the first line of the audio and the first line of the subtitles is the constant offset.

The difference between the (first line of audio - last line of audio) and (first line of subtitles - last line of subtitles) will tell you if you have a framerate difference (e.g. a percentage difference that runs throughout the file). It's probably 4%.
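The two measurements above can be written as a small calculation. This is a sketch in Python with invented example timings (all values in seconds). The function name is hypothetical, and the "first minus last" difference is taken as a positive duration.

```python
def diagnose(audio_first, audio_last, subs_first, subs_last):
    """Return (offset, ratio) from the first and last dialog lines.

    offset: how much later the audio starts than the subtitles say.
    ratio:  audio duration divided by subtitle duration; 1.0 means no
            framerate problem, roughly 0.96 or 1.04 suggests PAL/NTSC.
    """
    offset = audio_first - subs_first
    ratio = (audio_last - audio_first) / (subs_last - subs_first)
    return offset, ratio


# Invented example: the audio starts 6 s late and runs about 4% faster.
offset, ratio = diagnose(
    audio_first=66.0, audio_last=5826.0,
    subs_first=60.0, subs_last=6066.0,
)
print(f"offset: {offset:+.1f} s, ratio: {ratio:.4f}")
```

A ratio very close to 1 means only the constant offset needs fixing.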

Open the SRT file in Aegisub. Select Timing, Shift Times... and then enter the constant offset to move the subtitles start point to the correct time.

Then select File, Export Subtitles, and click on "Transform Framerate". The Input Framerate will be 23.976 frames, and then select Constant Framerate and enter 25 frames. This will speed up by 4%. Or click "Reverse Transformation" to slow down by 4%.
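Where the 4% comes from can be checked with a little arithmetic. The sketch below assumes the usual rates of 24000/1001 (about 23.976) frames per second for NTSC film and 25 for PAL. The 120-minute runtime is just an example.

```python
from fractions import Fraction

NTSC_FILM = Fraction(24000, 1001)  # about 23.976 frames per second
PAL = Fraction(25)

# Playing the same frames at 25 fps instead of ~23.976 fps is this much faster.
speedup = PAL / NTSC_FILM  # exactly 1001/960, about 1.0427


def pal_runtime(ntsc_minutes):
    """Runtime in minutes when an NTSC-rate film is played at the PAL rate."""
    return float(ntsc_minutes / speedup)


print(f"speed-up: {float(speedup - 1) * 100:.2f}%")
print(f"a 120 minute film runs {pal_runtime(120):.1f} minutes at 25 fps")
```

Under these assumptions a two-hour film ends up about five minutes shorter, so subtitles timed for one version drift steadily against the other.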

Check the final subtitle timing to see that it matches what you wrote down (maybe you sped up instead of slowed down for example). You can also check a few points in between to make sure they are nice. Export your subtitles as a new SRT file and you are ready to use WorkAudioBook or can also use Subs2SRS to make flashcards.
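The same correction can also be done in a few lines of Python once the first and last timings are written down. This sketch is not what Aegisub does internally. It rescales around the first subtitle (rather than around time zero) so that both the first and last lines land exactly on the measured audio times. The function name and the example timings are hypothetical.

```python
def retime(cues, subs_first, subs_last, audio_first, audio_last):
    """Linearly remap cue times so the first and last lines match the audio.

    cues: list of (start_seconds, end_seconds, text).
    The other arguments are the start times of the first and last dialog
    lines, as written in the subtitles and as heard in the audio.
    """
    ratio = (audio_last - audio_first) / (subs_last - subs_first)

    def fix(t):
        # Stretch or shrink around the first subtitle, then move to the audio.
        return audio_first + (t - subs_first) * ratio

    return [(fix(start), fix(end), text) for start, end, text in cues]


# Invented example timings, in seconds.
cues = [(60.0, 62.0, "first line"), (6066.0, 6068.0, "last line")]
fixed = retime(cues, subs_first=60.0, subs_last=6066.0,
               audio_first=66.0, audio_last=5826.0)
for start, end, text in fixed:
    print(f"{start:9.3f} {end:9.3f} {text}")
```

As with the Aegisub route, spot-check a few subtitles in the middle afterwards; if they drift, the film has probably been edited and no linear fix will work.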

What if it's not a 4% difference?

Finally, sometimes the subtitles are not just 4% different. Basically, you are in trouble here because it means the film itself has been changed (edited, censored, director's cut, local version, whatever) versus wherever the subtitles came from. You can still use the Simple Way above to use the subtitles as a transcript, although you may find some scenes are missing/extra and you can't easily use them (or you have to add them yourself).

What if the subtitles are timed almost right but a bit short?

You can use Aegisub's Timing, Timing Post Processor to add lead-in and lead-out to each subtitle to make it a bit longer. But you may get some overlap. Subs2SRS also lets you add buffers to the audio.
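To show what lead-in and lead-out padding does, and one way to deal with the overlap it can cause, here is a small Python sketch. It is not Aegisub's or Subs2SRS's implementation. The function name is hypothetical, and splitting the original gap at its midpoint is just one possible choice.

```python
def pad_cues(cues, lead_in, lead_out):
    """Lengthen each cue by lead_in before and lead_out after, in seconds.

    cues: list of (start, end, text), sorted by time and not overlapping.
    Where padding would make neighbours overlap, both meet at the midpoint
    of the original gap between them.
    """
    padded = [[max(0.0, start - lead_in), end + lead_out, text]
              for start, end, text in cues]
    pairs = zip(padded, padded[1:], cues, cues[1:])
    for prev, nxt, (_, prev_end, _), (next_start, _, _) in pairs:
        if prev[1] > nxt[0]:
            midpoint = (prev_end + next_start) / 2
            prev[1] = midpoint
            nxt[0] = midpoint
    return [tuple(cue) for cue in padded]


# Invented timings: the first two cues are only half a second apart.
cues = [(1.0, 2.0, "a"), (2.5, 3.5, "b"), (6.0, 7.0, "c")]
print(pad_cues(cues, lead_in=0.5, lead_out=0.5))
```

In the example the first two cues meet at 2.25 seconds instead of overlapping, while the third gets its full padding.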

If you want true perfection, load it into WorkAudioBook and adjust every subtitle to your liking by hand... but see my warning below.

Knowing When to Give Up

That's about all I've learned on this topic. Sometimes it's a five minute job, sometimes thirty minutes to get a movie all matched up.

Sometimes an hour in I just give up on a particular movie and move on.

WorkAudioBook gives more options to get it right if you put the time in, because you can listen and adjust the subtitles to your liking. I think this is worthwhile if you are listening to every sentence while you are manually processing them because you are getting lots of listening practice anyway.

But you can find yourself messing with the technology longer than you spend on learning Chinese. So be mindful of your time.

[edited for mistake in timing calculation]

June 9, 2014 at 03:30 PM

Thanks Tyson, I'll try that. Most likely I'll just go scene by scene through the movie, assuming that it doesn't turn out to be a constant amount or that 4% you reference.

And now that I think about it, I think the DVD did have 2 different running lengths. I'll look into that, but I think one was off by about 10 minutes from the other without any obvious reason. I wonder if just switching to the other option would solve most of these problems.

Either way I'll let you know. This tool looks very useful, but mostly if I can find adequate materials to run through it.

June 11, 2014 at 04:10 PM

I wound up buying the Glossika sentences and they seem to work quite well with the WorkAudioBook, it took a little bit of adjustment to get the phrasing correct, but it's going to be really convenient for when I do my transcriptions.

June 20, 2014 at 07:41 AM

Many thanks tysond for alerting us all to WorkAudioBook.

I was creating key vocab and key sentence Anki notes manually using Audacity. By combining the tracks for the last six lessons of my current university course into one audio file with Audacity, I was then able to use WorkAudioBook to produce all the Anki notes for my last six lessons in one go, taking only about a quarter of the time it would normally take.

I have posted a review of WorkAudioBook, including a link to this thread, on the internal closed forum for my Open University Beginners Course. I have reproduced my review in slightly edited form below in case it also helps any Chinese Forums users who are beginners, like me:

A new tool for listening to and repeating audio phrases, optionally alongside transcripts

Listening to an individual audio track on CDs, MP3 players or on a PC is fine but I've found this difficult because:

the audio is usually too quick for me to follow, especially when I first study a Lesson,
there is not enough of a pause on the track, in between phrases, for me to repeat the phrase to myself
it is not easy to break the audio down into short phrases and repeatedly practice (listen and repeat) an individual phrase
going back and repeating short phrases is difficult on a CD, or even when playing MP3s on an MP3 player or PC.

Has anyone else had the same difficulties?

Thanks to this thread on Chinese-Forums, I was alerted to a free tool, WorkAudioBook, which can help with the problems above. WorkAudioBook will:

automatically segment an audio file into short phrases
make it easy to repeatedly practice a phrase
allow you to link each audio segment to the corresponding segment of text in a transcript file.

WorkAudioBook is very easy to use for just listening and repeating phrases, to improve fluency in speaking.

Optionally, you can link each audio segment to the corresponding segment of text in a transcript file. I've tried to write a step-by-step guide:

In WorkAudioBook, click the Open button and then select an audio file from one of the CDs
Open the transcript for the audio file in any text editor and copy the required transcript to the clipboard
In WorkAudioBook, select the SubTitles Editor tab, if not already selected, and then click Edit to select edit mode,
then paste the transcript text into the large blank SubTitles Editor window
and finally click the Edit button to return to normal mode
Now you can step through the audio using the large 'radio' buttons at the bottom of the screen or press the 'N' key
and play individual phrases, again using one of the 'radio' buttons or by pressing the '[SPACE]' key
Having selected an audio segment, you can link it to the corresponding segment of text in the transcript by using the mouse cursor to select the text and then clicking the +Add button
Once you have finished linking each audio segment to its corresponding text segment, click the Import button on the left and then click the SRT File button at the bottom left.
This will save your transcript and timing links to the audio file in the same folder as the audio file, as an SRT file.
Now, every time you open the audio file in WorkAudioBook, it will be linked to the transcript.
You can step through your selected phrases using the methods above or by using the blue radio buttons in the Subtitles Editor tab's tool bar
You can turn the display of the transcript on or off easily by checking the Show Always checkbox at the right, or just reveal it one phrase at a time after you have listened to the phrase by pressing the s key.

July 2, 2014 at 04:48 PM

Hello everyone!

I am the developer of this tool, and roddy invited me to join this discussion.

First, big thanks to tysond for this great explanation of how to work with WAB and creating this topic! And thanks philwhite for sharing the info.

tysond did a great job of describing how to use the app, so I have just a few additional notes:

-- Note that once you have subtitles (downloaded or created yourself) you can hide them. Sounds illogical? But from my experience of learning English it is useful to listen without seeing the text, and only when you are lost, click to peek at it.

-- If you have subtitles whose timing is approximately right but imperfect, you may prefer to choose the "Subtitles don't affect phrase selection" option in Settings, Subtitles tab.

-- Subtitles Editor has "Shift Time" button on its toolbar to shift timing of current and all subs after it. For more editing options use Aegisub, as tysond recommended.

-- There is some basic functionality for repetition inside WAB -- see the "Bookmarks and Grades" section of the User Guide: http://www.povalyaev.com/WorkAudioBook/How-to-use--Bookmarks(Windows).aspx

For more options, I think Anki is better.

Feel free to ask questions. I am glad that more people use the app to learn more languages!

July 3, 2014 at 06:01 AM

Welcome! Awesome tool you've made - I've only just started using it, but I'm finding it really useful already. Thanks also to you, Tysond, for telling us all about it!

July 3, 2014 at 11:22 AM

Sergey, great to see you joining in! Must be fun to see so many people using the app...

July 5, 2014 at 07:42 AM

I see that you are creating subs for MP3s. If some of these MP3s are without copyright / in the public domain (maybe audiobooks from LibriVox or some other source) -- I will be grateful if you share the link to the source of these MP3s + the subs that you've made. Then I would put your work on my web site as an example of one more language that you can learn using WAB. Your fellow forum members will thank you too!

P.S. Thanks tysond for sharing subs for the podcast about the Dragonboat festival from "Slow Chinese" -- I see that according to http://www.slow-chinese.com/about/ it is free to distribute for non-commercial purposes.

July 5, 2014 at 03:50 PM

Regarding Slow Chinese, they do say NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.

I don't know if subtitles count as a derivative work. I sent an email to the Slow Chinese team to ask their advice.

July 6, 2014 at 07:26 PM

Hi Sergey,

Many thanks for developing and sharing this awesome tool. I'm only just beginning to appreciate how useful the tags are for skipping over the unmarked and easier phrases, to concentrate on those phrases which I still find difficult to pronounce.

I'm sure you have your own plans and list of improvements you intend to make. I am only a beginner here, so others would be in a better position to comment on how useful and feasible this would be, but if I could make one suggestion:

I suspect WorkAudioBook would be an even more awesome tool for improving Chinese pronunciation if it were feasible to link it to Praat to display the pitch contour of the currently selected phrase in WAB and the pitch contour of the user's most recent voice input from a microphone.

(Praat is GPL'd and seems to be unmanaged C++, whilst WAB seems to be written in managed C++ using .NET WinForms, so I guess you'd need to write a managed wrapper for the unmanaged Praat libraries. Alternatively, you might use Praat scripting, as does SpeakGoodChinese (a GTK application, not WinForms) to draw pitch contours from microphone input.)

tysond

hedwards

tysond

etm001

roddy

hedwards

tysond

hedwards

Touchstone57

Ruben von Zwack

tysond

hedwards

hedwards

philwhite

SergeyP

Yadang

roddy

SergeyP

tysond

philwhite