glu Posted October 15, 2021 at 03:32 PM

Hi All! Long-time lurker and occasional poster here. Over the past 2+ years I have built up a collection of tools that I use on a daily basis in my own Chinese learning practice. When I saw the recent topic about transcribing audio with online services I realized it might be time to share some of the things I have been working on.

By "opinionated" I mean some guiding principles that I feel strongly about, and which seem to work well for me. These include de-emphasizing Chinese characters and focusing on building vocab first instead of spending excessive effort on the script, and working with authentic, real-life content even if it's still a bit above my level.

I see there are topics about existing tools, including commercial ones, on here. Is a topic like this an acceptable way to share the things I have been building? I am unsure where these are going; at this stage they are not even remotely close to anything resembling a "product." I essentially want to understand if others see value in them, get feedback from a broader group of fellow learners and teachers, and see if it makes sense to invest more effort and make them usable for others besides me.

~Gábor

PS: There's a lot of lively discussion in this topic, so I'm adding links here to the posts that are about the tools themselves.
(1) Interactive audio player for close listening
(2) Original content, spontaneous speech + Achieving comprehension through close listening
(3) Preparing new audio, Part 1 * Part 2
(4) Integrated dictionary

Moshen Posted October 15, 2021 at 04:46 PM

Quote: and working with authentic, real-life content even if it's still a bit above my level.

Could you say more about what this involves and how that works for you? I'm more interested in your approach than in any "tools."

roddy Posted October 19, 2021 at 04:05 PM

On 10/15/2021 at 4:32 PM, glu said: Is a topic like this an acceptable way to share the things I have been building?

Post away!

glu (Author) Posted October 19, 2021 at 04:34 PM

On 10/19/2021 at 6:05 PM, roddy said: Post away!

Thank you, will do!

glu (Author) Posted October 20, 2021 at 03:42 PM

Interactive audio player for close listening

As promised, here is the first tool I'd like to share with you. Without further ado, you can just go and try it here with a sample episode: https://jealousmarkup.xyz/off/tingshuo/index.html?ep=WY53

I'll share a bit of the thinking behind it, and how I developed the software and the content, in separate posts. A few quick things up front:
The audio is episode 53 of The Unemployable (无业游民) podcast: https://theue.firstory.io/
The dictionary annotations are from CC-CEDICT.
You need a full screen to use the player; this form is not mobile-optimized.

Don't hesitate to throw any first impressions or questions at me! I see there is already a lively discussion here with strong feelings about the best way to approach Chinese. Let's keep up this vibrant and friendly exchange and learn from each other.

~Gábor

abcdefg Posted October 20, 2021 at 04:58 PM

On 10/20/2021 at 10:42 AM, glu said: Don't hesitate to throw any first impressions or questions at me!

The reader has a natural-conversation style, like one hears in the real world, on the street. Learners can benefit from paying attention to her phrasing and also to the small "non-words" that she vocalizes during transitions. These are things that some instructional podcasts and videos omit, perhaps because they are striving for "academic perfection."

Why the eerie electronic "space-voyage" music? Doesn't contribute anything in my opinion. Kind of a strange accompaniment to the topic being discussed. Is that something you added?

Don't know what to say about the player per se. Seems to me the same content is already abundant elsewhere. I must have missed the whole point. Why not just watch something on YouTube or listen to it on one of a zillion podcasts?

Maybe what you have done is add the transcription along with Pinyin. If so, that is helpful. Thank you for that.

glu (Author) Posted October 20, 2021 at 06:59 PM

Thanks, these are great questions! I'll unpack things a little by responding to what you wrote.

On 10/20/2021 at 6:58 PM, abcdefg said: Why the eerie electronic "space-voyage" music? [...] Is that something you added? Seems to me the same content is already abundant elsewhere. [...] Why not just watch something on YouTube or listen to it on one of a zillion podcasts?

Original content, spontaneous speech

The player that I linked to is the last step in a process that I use to make original audio content accessible (to myself). The starting point is exactly what you write: a podcast, or maybe a video on YouTube. The text you hear in the player was not written with an educational purpose in mind. It is something I grabbed while looking for authentic audio to listen to. I.e., you need to ask the podcast's producers about the background music : )

The key thing is exactly what you write here:

On 10/20/2021 at 6:58 PM, abcdefg said: The reader has a natural-conversation style, like one hears in the real world, on the street. Learners can benefit from paying attention to her phrasing and also to the small "non-words" that she vocalizes during transitions. These are things that some instructional podcasts and videos omit [...]

She is not a reader : ) She sounds natural because this is original content that she recorded spontaneously. This is precisely the benefit of working with authentic audio, and part of the reason why I started building out this tool in the first place. I share the experience that anything produced didactically, for educational purposes, typically lacks exactly the kind of nuance that you need to sound natural in a language. It's all about intonation on the phrase and sentence level, the exact nature of false starts and fillers, etc.

Achieving comprehension through close listening

On 10/20/2021 at 6:58 PM, abcdefg said: Maybe what you have done is add the transcription along with Pinyin

That's essentially it, but the process is actually a lot more magical than that. I'll go into how I can get from raw audio that I almost entirely do not understand to the outcome that you can read and hear in the linked player. My point here is that having the audio in exactly this format, with the transcript, allows me to gain access to the text in ways that are otherwise not possible.

I can navigate the audio sentence by sentence. The usual "rewind 15 seconds" of general-purpose players doesn't allow me to repeat exactly one sentence as many times as I need, and it also doesn't allow me to pause conveniently at sentence boundaries.

The combination of Pinyin and characters allows me to "clear up" my understanding of what's being said. Initially, most of the audio is obscure. After a few passes, I can understand most of what's being said even without looking at the transcript.

The integrated dictionary lets me get a pretty good understanding of what's being said even in the presence of many unknown words.

Even if you have subtitles, e.g. on YouTube, those will be characters, with no word boundaries shown. In tools like Du Chinese, you get both characters and Pinyin, but again no word segmentation: it's all separate syllables. The transcript I have in here shows words, which are often pretty crucial to making sense of the text.

So, to sum up, I start by grabbing a piece of authentic, real-life audio like a podcast episode. Initially my level of understanding is "well, that sounds cool, but I get maybe 5% of it." From there I can get to a sentence-aligned transcript with Pinyin and words in the kind of player that I linked to, complete with dictionary annotations. In there, I can listen to each individual sentence until it all begins to clear up and I am able to follow along.

~Gábor

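Sentence-by-sentence navigation of the kind described in this post can be sketched in a few lines. This is only an illustration under stated assumptions, not the player's actual code: the `Segment` class and the function names are hypothetical, and segments are assumed to be sorted by start time.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    """One sentence-like chunk of the audio (hypothetical structure)."""
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def segment_index_at(segments, position):
    """Index of the segment that contains `position` (in seconds)."""
    starts = [s.start for s in segments]
    # bisect_right counts the segments that start at or before `position`.
    return max(bisect_right(starts, position) - 1, 0)


def replay_range(segments, position):
    """Start and end time to loop in order to repeat the current sentence."""
    seg = segments[segment_index_at(segments, position)]
    return seg.start, seg.end


def step(segments, position, delta):
    """Start time of the segment `delta` sentences away, clamped to the ends."""
    i = segment_index_at(segments, position) + delta
    i = min(max(i, 0), len(segments) - 1)
    return segments[i].start
```

Unlike a fixed "rewind 15 seconds" button, `replay_range` always returns the exact boundaries of the sentence under the playhead, and `step(..., -1)` or `step(..., +1)` jumps to a sentence start rather than to an arbitrary offset.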
Guest realmayo Posted October 20, 2021 at 07:54 PM

On 10/20/2021 at 7:59 PM, glu said: Initially, most of the audio is obscure. After a few passes, I can understand most of what's being said even without looking at the transcript.

I'm a huge fan of this kind of close listening, relentlessly repeated/looped in segments ideally until comprehension, with a text to check whenever needed. I tend to use a player like Audacity for the audio, so you can see the waveforms, making it easy to click back with a mouse to the start of a sentence or clause.

I've used textbook texts, listening comprehension tests, and transcribed TV chat shows. Each is more or less fiddly and frustrating in terms of looping the audio and consulting the text, which can be really tiresome and distracting.

This tool appears to offer a smoother way to achieve the same thing - eliminating that frustration? And with a wider range of material? In which case it looks great!

Is the segmenting of the audio, and the transcribing and pinyinning, done automatically?

杰.克 Posted October 20, 2021 at 09:16 PM

On 10/20/2021 at 5:15 PM, Moshen said: Wow. That sure is at odds with how I live my life. Most days I speak no more than 10-20 minutes all told.

*shrug

malazann Posted October 21, 2021 at 09:16 PM

@glu while there are other ways of doing similar things (like using the Learning With Languages extension on Netflix/Youtube), the UI and how easily I can use this "Ting Shuo Listener" stands out. Might also be useful to make shadowing a little easier.

look forward to seeing more

glu (Author) Posted October 22, 2021 at 11:20 AM

On 10/21/2021 at 11:16 PM, malazann said: @glu while there are other ways of doing similar things (like using the Learning With Languages extension on Netflix/Youtube), the UI and how easily I can use this "Ting Shuo Listener" stands out. Might also be useful to make shadowing a little easier. look forward to seeing more

Thanks, that's very positive feedback... It is pretty much exactly what drove me in developing this!

What is shadowing?

~Gábor

glu (Author) Posted October 22, 2021 at 03:17 PM

Preparing new audio / Part 1

I've just started to delve into a new text, so I thought I'd give you a live update and explain the key steps. I expect the whole process will take a couple of days to a week.

(1) The first and very important step is to pick something that both sounds interesting, and also holds the promise that you will be able to roughly understand it : ) This time I picked a recent episode of the Loud Murmurs podcast, where they discuss Shang Chi. It's difficult because it's a rapid-fire, intellectual conversation. But on the pro side, it's very standard and well-articulated Mandarin, and being about American pop culture, there are lots of references that make the content accessible to me. This is the episode: https://loudmurmursfm.com/episodes/shangchi

(2) You need to download the audio file. If there's a download button it's easy; but even if there's only an embedded player, there's usually an actual MP3 in the background, which you can identify from the browser's developer tools.

(3) Magical step: Get an automatic transcription of the audio. This is one of the key enablers of the entire method. Currently I am using Microsoft's speech-to-text API, which is part of Azure's "Cognitive Services." It's a mess to create an Azure account and figure out how to use the API, but nobody promised this bit would be easy. You also need to fiddle with the audio format, because why would MP3 be supported. The transcription usually needs maybe half as much time as the audio's length, and it costs less than a dollar per hour.

(4) The output of the previous step is a JSON file: a semi-human-readable text that contains random, long-ish chunks of the transcribed audio with timestamps for words. What counts as a "word" for Mandarin is quite arbitrary, but it's usually 1 to 4 characters. The tool I wrote takes care of reading this file and extracting the information I need.

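As a rough illustration of step (4), reading chunks of words with timestamps out of a JSON file could look like the sketch below. The field names (`chunks`, `words`, `text`, `start`, `duration`) are made up for this example and are not the actual schema returned by the Azure service; a real file needs its own field names and time units checked first.

```python
import json

# Hypothetical, simplified transcription output: NOT the real service schema.
SAMPLE = """
{"chunks": [
  {"words": [
    {"text": "我们", "start": 0.50, "duration": 0.40},
    {"text": "其实", "start": 0.90, "duration": 0.45}
  ]},
  {"words": [
    {"text": "在", "start": 1.40, "duration": 0.15}
  ]}
]}
"""


def extract_words(raw_json):
    """Flatten all chunks into one list of (text, start, end) tuples."""
    data = json.loads(raw_json)
    words = []
    for chunk in data["chunks"]:
        for w in chunk["words"]:
            start = float(w["start"])
            end = start + float(w["duration"])
            words.append((w["text"], start, end))
    return words


if __name__ == "__main__":
    for text, start, end in extract_words(SAMPLE):
        print(f"{start:6.2f} {end:6.2f}  {text}")
```

Flattening the chunks first makes sense here because, as the post explains, the chunk boundaries chosen by the service are arbitrary and get redone by hand later anyway.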
(5) Magical step: Get the Pinyin for the transcribed audio, after saving the text in a plain text format. For this I use Wenlin, a venerable tool that pre-dates Unicode and looks very strange, but is hands-down one of the best tools out there. After copy-pasting the Hanzi, the relevant functions are in the Edit menu under "Make transformed copy". First you need Pinyin transcription; then, for my purposes, I need to "Remove tone change notation".

(6) Split audio into meaningful chunks. The segments returned by the speech recognition service are odd: They are typically much longer than sentences, but often they stop arbitrarily in the middle of a phrase. Also, there is no punctuation included, so you need to play it by ear at this stage - literally.

For this phase I use a variation of the final player, in a "segmentation" mode. The image below shows this stage: the Hanzi is the actual transcription result; "words" are whatever the transcription considered as words; the Pinyin for each syllable comes from Wenlin. Although Wenlin's words are much better, I cannot use them here because the timestamps belong to the original words, not to Wenlin's output.

In the image above I hover to the right of "chuánqí" around the middle. Clicking after words splits the segment; a different shortcut joins the current segment with the next one. When I'm done, the same bit looks like this:

Although strictly speaking at this stage I'm not yet "using" the tool, just preparing new audio, to me this is just as useful as digging into the final annotated result. You can see that the automatic transcription still has a lot of errors, and of course a LOT of the vocab is unknown to me at this stage. It's a thrilling exercise to figure out what the meaningful units are and where the boundaries are, and it's a great way to get attuned to sentence intonation patterns and those so-called "filler words."

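The split and join operations from step (6) can be sketched over a simple data model. This is an assumption-laden illustration, not the tool's real implementation: a segment is taken to be a list of `(text, start, end)` word tuples, and the function names are hypothetical. Because each word carries its own timestamps, a segment's time span follows directly from its first and last word.

```python
def split_segment(segments, seg_idx, word_idx):
    """Split segment `seg_idx` right after its word `word_idx` (0-based)."""
    if not 0 <= seg_idx < len(segments):
        raise ValueError("no such segment")
    seg = segments[seg_idx]
    if not 0 <= word_idx < len(seg) - 1:
        raise ValueError("can only split between two words of a segment")
    first, second = seg[:word_idx + 1], seg[word_idx + 1:]
    return segments[:seg_idx] + [first, second] + segments[seg_idx + 1:]


def join_with_next(segments, seg_idx):
    """Merge segment `seg_idx` with the segment that follows it."""
    if not 0 <= seg_idx < len(segments) - 1:
        raise ValueError("no next segment to join with")
    merged = segments[seg_idx] + segments[seg_idx + 1]
    return segments[:seg_idx] + [merged] + segments[seg_idx + 2:]


def time_span(segment):
    """Start of the first word and end of the last word, in seconds."""
    return segment[0][1], segment[-1][2]
```

Both functions return a new list instead of modifying the input, which would make an undo feature straightforward; whether the actual tool works this way is not stated in the post.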
(Also, if you look at spontaneous speech closely like this, it's striking to realize that most often it's absolutely not clear where one sentence ends and the next one begins. This is not Chinese-specific at all; I had the exact same experience when processing English and German speech. Makes you question a lot of assumptions about both grammar and writing.)

--- To be continued ---

alantin Posted October 22, 2021 at 05:39 PM

On 10/22/2021 at 2:20 PM, glu said: What is shadowing?

You listen to a piece of audio repeatedly, attempting to speak along with it and matching the rhythm, intonation, etc. as closely as you can.

https://www.fluentu.com/blog/language-shadowing/
https://howtogetfluent.com/shadowing-for-language-learning/
https://www.chinese-forums.com/forums/topic/60363-shadowing-and-recording-most-effective-method/?tab=comments#comment-472074

glu (Author) Posted October 29, 2021 at 09:16 AM

On 10/22/2021 at 10:02 PM, realmayo said: You might consider asking one of the moderators to split out the two posts about your audio player and put them into a new topic.

Can you help me with who the mods are? I messaged @roddy on Oct 23, but my message is as-yet unread. Thanks!

Guest realmayo Posted October 29, 2021 at 09:27 AM

On 10/29/2021 at 10:16 AM, glu said: Can you help me with who the mods are?

You could also try @imron or @Lu.

roddy Posted October 29, 2021 at 09:49 AM

Apologies, I'd read that on mobile and forgot to look again when back at my desk. There's now a new topic of (overwrought, imo) discussion on skills here, would be appreciated if we could keep that discussion there and this discussion here.

Apologies if the splitting is messy - it's often hard to tell what best goes where.

Lu Posted October 29, 2021 at 05:36 PM

I see it's been resolved, good.

A good way of reaching all of us mods/whoever first has time is to 'Report' a post. Nothing bad happens if you click that button -- you can even report your own post. It simply gives a message to us mods that something needs our attention, and you can write in your report what kind of attention. Can be bad spam, can be a topic that warrants splitting, and everything in between.

glu (Author) Posted October 29, 2021 at 06:15 PM

Thank you, @roddy and @Lu! Good to know how to approach you if needed.

There has been progress with the new podcast episode I'm working on, so I hope I'll be able to share an update to my last detailed post tomorrow.

glu (Author) Posted October 30, 2021 at 05:00 PM

Preparing new audio / Part 2

Here's the promised update on my journey to understand a super interesting real-life podcast in Mandarin!

In the last post I left off at the part where I went through the entire audio and split it into what sound reasonably close to "sentences," or at least coherent chunks that are of a manageable length. I typically end up with larger chunks than what you normally see in subtitles. Unlike in, say, YouTube, in my player it's super easy to go segment by segment and listen as many times as you like, so there's no pressure to keep text very short.

Fixing the segmentation was my first pass through the audio. It took about 3 times as long as the full recording because of frequent repeats, re-listening to what's going on, and joining/splitting by clicking around.

(7) It's time to eliminate all the errors from the automatic transcription! This is the part that I absolutely cannot do on my own. You need a native speaker for this task. What I do is go to a freelancer portal and hire someone to edit the transcription. I have had good experiences with Upwork, and I normally hire the same person, with whom I've built a very pleasant working relationship. For simpler, shorter texts (say, 20 minutes), I'd typically pay around $15. For this hour-long rapid-fire episode I paid $45. That's the only actual expense I have in the process. Considering that this episode will easily keep me busy for over a month, I find this a very tolerable cost.

The tool I'm developing presents this UI for the reviewer:

This is very similar to the interface I'm looking at myself, but (a) it's Chinese-text-only, without Pinyin annotations, and (b) the text is editable.

(8) When the reviewer is finished, I re-annotate the text as in step (5), i.e., I get the Pinyin version in one go and infuse it back into my software. Now I can listen to the audio in a format that is almost (almost!) perfect:

What remains now is one more pass through the audio where I need to fix some word-level errors:

(a) Wenlin's Pinyin conversion is also just an automatic tool, and it often gets word segmentation wrong. I included just such an example in the screenshot above: it needs to be "Women qishi zai kan de shihou", and not "Women qi shizai kan de shihou". In my tool I can fix this by simply clicking around.

(b) Occasionally the Pinyin has the wrong reading of a character. This is particularly ironic because we first convert the audio to characters, and have this reviewed by a human; then another automatic step that has no idea about the audio converts the characters into phonetic Pinyin and introduces its own errors. Joke's on me! To fix these errors I can edit each word in place, as shown below.

These problems are easy to spot and fix even for me, because seeing the wrong Pinyin and hearing the audio at the same time makes it very clear what's wrong, and what should be there instead.

Once this pass is over, the text is ready for a couple of real close listens. But by that time I will have done two passes over the entire audio and looked at the transcription the whole time, so I'm getting really intimately familiar with the material, even if there's still a lot that I don't fully understand.

--- To be continued ---

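The word-boundary fix described in (a) can be pictured as toggling boundaries between syllables, one click per boundary. The following is a minimal sketch under that assumption; the data model (a word is a list of syllables) and the function names are hypothetical, not taken from the actual tool.

```python
def toggle_boundary(words, pos):
    """Add or remove the word boundary after the first `pos` syllables.

    `words` is a list of words, each a list of syllables.
    """
    syllables = [s for w in words for s in w]
    if not 0 < pos < len(syllables):
        raise ValueError("boundary must lie strictly inside the text")
    # Collect the current boundaries as cumulative syllable counts.
    boundaries, count = set(), 0
    for w in words:
        count += len(w)
        boundaries.add(count)
    boundaries ^= {pos}  # flip: remove if present, add if absent
    # Rebuild the words from the updated boundary set.
    result, current = [], []
    for i, s in enumerate(syllables, start=1):
        current.append(s)
        if i in boundaries:
            result.append(current)
            current = []
    return result


def render(words):
    """Join syllables into words and words into a space-separated line."""
    return " ".join("".join(w) for w in words)


wrong = [["wo", "men"], ["qi"], ["shi", "zai"], ["kan"], ["de"], ["shi", "hou"]]
joined = toggle_boundary(wrong, 3)   # remove the boundary after "qi"
fixed = toggle_boundary(joined, 4)   # add a boundary after "qishi"
print(render(fixed))  # women qishi zai kan de shihou
```

Two toggles are enough to turn "qi shizai" into "qishi zai", which matches the "clicking around" described in the post.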
glu (Author) Posted October 31, 2021 at 02:41 PM

Integrated dictionary

If you looked carefully at the screenshots I've been posting you will have noticed the panel on the right that says "Short help" and mentions dictionary entries. While I'm still working my way through the Shang Chi podcast episode, I thought it's time I explained a bit more about this embedded dictionary.

Here's the essence of it. When my program prepares this content, it does a number of things. Obviously it matches up the Pinyin to the characters for the annotated text, and it matches up the timestamps to the audio so you can navigate and listen sentence by sentence. But it also reads a downloaded version of the CC-CEDICT dictionary and looks up every word in the text. (CC-CEDICT is the large open-source dictionary that you're all familiar with; it is available online at mdbg.net.)

Some details about this:

In the interactive player, you can click on any Hanzi to see what dictionary entries there are that match the text. For the screenshot above, I clicked on 失 in the first row of the active segment.

Longest matches are shown first; here that's the two-syllable word 失败. But to get a good grasp of multi-syllable words, it's super useful to look at the meanings of individual syllables, so those are listed too.

One example where this is very helpful is with two-syllable verbs. They come in flavors like Verb+Object, Verb+Complement, or simply two syllables with no inner structure. AFAIK only the ABC dictionary shows this info explicitly, but looking at the meaning of the verb's parts you can get a fairly clear idea on your own. And you really want to understand this about your verbs, because VO, VC and VV behave differently when you put them in sentences.

You'll see there are two entries for everything in the screenshot above. That's just my little idiosyncrasy because I'm also including CHDICT, a Chinese-Hungarian dictionary similar to CEDICT. Unless you speak Hungarian you'd probably not want those alternatives in there.

You can pin individual entries so they always show up automatically, without clicking on a character, when you return to the same segment. That's a little creature comfort that's useful for key new vocabulary in a text.

Finally, each entry can be added straight to my Anki deck by a single click. To be precise, that click brings up the embedded editor as seen in the next screenshot.

There are some interesting things going on with these Anki cards, but this is already getting long here, and they deserve their own post anyway. So let's get back to those next time!

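The longest-match-first lookup described above can be sketched as follows. This is a guess at one reasonable approach, not the tool's actual code. The parser relies on CC-CEDICT's line format (`Traditional Simplified [pin1 yin1] /gloss/gloss/`); the three sample lines only imitate that format with shortened glosses, the sentence 他失败了 is a made-up example, and listing the remaining characters of the longest match last is an assumption.

```python
import re

# One CC-CEDICT line: "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/"
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")

SAMPLE_LINES = [
    "# illustrative lines in CC-CEDICT's format; glosses shortened",
    "失敗 失败 [shi1 bai4] /to be defeated/to fail/",
    "失 失 [shi1] /to lose/to miss/",
    "敗 败 [bai4] /to defeat/to be defeated/",
]


def load_cedict(lines):
    """Map each simplified headword to a list of (pinyin, glosses) entries."""
    entries = {}
    for line in lines:
        if line.startswith("#"):
            continue  # comment and header lines
        m = ENTRY_RE.match(line)
        if not m:
            continue
        _trad, simp, pinyin, glosses = m.groups()
        entries.setdefault(simp, []).append((pinyin, glosses.split("/")))
    return entries


def matches_at(text, pos, entries, max_len=4):
    """Headwords covering the clicked character text[pos], longest first."""
    found = []
    for length in range(max_len, 0, -1):
        for start in range(pos - length + 1, pos + 1):
            end = start + length
            if start < 0 or end > len(text):
                continue
            word = text[start:end]
            if word in entries and word not in found:
                found.append(word)
    # Also list the single characters of the longest match (assumed behavior).
    if found:
        for ch in found[0]:
            if ch in entries and ch not in found:
                found.append(ch)
    return found


if __name__ == "__main__":
    cedict = load_cedict(SAMPLE_LINES)
    for word in matches_at("他失败了", 1, cedict):
        print(word, cedict[word])
```

Keying the dictionary by simplified headword keeps each lookup a constant-time dictionary access, so checking every substring of up to four characters around the clicked position stays cheap even with the full CC-CEDICT file loaded.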