markhavemann Posted October 31, 2023 at 12:28 PM I was having a look at the vocabulary for different shows and noticed that things like 狂飙 and 隐秘的角落 were relatively easy lexically (the vocabulary wasn't very difficult, even compared to something like 家有儿女). Despite this, I found that I had to work reasonably hard to follow them, and I started to wonder what other metrics could be included when analyzing difficulty, especially for something that has an audio aspect. So I followed the rabbit hole and analyzed the speaking speed of a bunch of shows and other media to compare. I thought the results were pretty interesting and wanted to share. I found that 狂飙 (The Knockout) and 谁是被害者 (Victim's Game) were way up at the top, which matched my experience of finding them hard work to follow compared to other shows. I also calculated language density, which is how much speaking there is relative to the full length of the audio (this is why a sitcom is more useful to watch than an action movie: there is far more speaking). I'd like to include this here too, as well as a breakdown and comparison of vocabulary difficulty and HSK density. I'll update this post as I go. Hopefully it's interesting to others.
becky82 Posted November 1, 2023 at 02:35 AM Indeed, speaking speed is important too! I've been using "characters per minute" as my unit (which is "characters per second" multiplied by 60). I noticed CCTV speaking at 250 characters per minute. At that speed, you simply don't have time to actually think about individual words; you need to reflexively recognize and understand groups of words. If I recall correctly, the HSK6 speaking speed is something like 200 characters per minute. It'd be interesting to make the plot 2-dimensional: have the vocabulary size (or vocabulary size per unit time) on one axis, and speaking speed on the other.
markhavemann Posted November 1, 2023 at 06:57 AM Author

On 11/1/2023 at 10:35 AM, becky82 said: It'd be interesting to make the plot 2-dimensional: have the vocabulary size (or vocabulary size per unit time) on one axis, and speaking speed on the other.

I was thinking of putting language density on the other axis, but that sounds much more interesting. I'll have to think a bit about the best way to go about it.

On 11/1/2023 at 10:35 AM, becky82 said: I noticed CCTV speaking at 250 characters per minute

That puts it right up there at the top of the graph, especially since I didn't count the time in between utterances, only characters per second while they were actually speaking. Counting the pauses as well would probably move everything else down even further compared to the news. I would really like to add news in here, but to compare it in the same way I would either need to change the way I'm analyzing things or find news subtitles with timings. I'll see if I can find some.

On 11/1/2023 at 10:35 AM, becky82 said: At that speed, you simply don't have time to actually think about individual words; you need to reflexively recognize and understand groups of words.

This! You said it perfectly. There is almost no time at all to take a breath and recall the meaning of something or let things sink in.
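For anyone who wants to try the two measurements described above on their own subtitle files, here is a minimal sketch. It is not the script used for the graph; it assumes the subtitles have already been parsed into timed cues, that cues do not overlap, and that only Chinese characters in the basic CJK block count towards speed. The names `Cue`, `count_hanzi`, `speaking_speed_cps` and `language_density` are made up for illustration.

```python
import re
from dataclasses import dataclass

# Assumption: only characters in the basic CJK Unified Ideographs block count.
# Punctuation, Latin letters and digits are ignored.
HANZI = re.compile(r"[\u4e00-\u9fff]")


@dataclass
class Cue:
    start: float  # cue start time in seconds
    end: float    # cue end time in seconds
    text: str     # subtitle text shown during the cue


def count_hanzi(text: str) -> int:
    """Count the Chinese characters in a subtitle line."""
    return len(HANZI.findall(text))


def spoken_seconds(cues) -> float:
    """Total time covered by cues (assumes cues do not overlap)."""
    return sum(c.end - c.start for c in cues)


def speaking_speed_cps(cues) -> float:
    """Characters per second, counting only time inside cues, not the gaps."""
    spoken = spoken_seconds(cues)
    chars = sum(count_hanzi(c.text) for c in cues)
    return chars / spoken if spoken > 0 else 0.0


def language_density(cues, total_seconds: float) -> float:
    """Fraction of the full runtime during which someone is speaking."""
    return spoken_seconds(cues) / total_seconds if total_seconds > 0 else 0.0


# Toy data: two cues with a three-second pause between them.
cues = [
    Cue(0.0, 2.0, "你好,我是警察。"),
    Cue(5.0, 8.0, "你昨天晚上在哪里?"),
]

cps = speaking_speed_cps(cues)
print("characters per second:", cps)
print("characters per minute:", cps * 60)  # the unit becky82 uses
print("language density:", language_density(cues, total_seconds=10.0))
```

Dividing by the full runtime instead of `spoken_seconds` would give a speed that includes pauses, which is closer to how a figure like "250 characters per minute" for a news broadcast would usually be counted.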
wibr Posted November 1, 2023 at 09:56 AM I considered including the speaking speed on my Graded Watching website, and I do calculate this and some more statistics internally, but in the end I decided against using it since the variation is not that large, at least compared to the required vocabulary. So my main indicator for difficulty is unique words per hour; a low language density or slow speaking speed would implicitly lead to a lower number of unique words per hour. Also, I don't want to add too many columns, since other factors are also important for choosing something to watch, e.g. rating, genre, accent and so on.
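As a rough illustration of the "unique words per hour" indicator, here is a small sketch. It is only a guess at the definition (distinct words divided by runtime in hours), not how the Graded Watching site actually computes it. It assumes the dialogue has already been segmented into words by some other tool, and `unique_words_per_hour` is a hypothetical name.

```python
def unique_words_per_hour(tokens, runtime_seconds: float) -> float:
    """Distinct words in the dialogue divided by the runtime in hours.

    Assumption: `tokens` is the dialogue already split into words.
    Chinese word segmentation is a separate step and is not done here.
    """
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return len(set(tokens)) / (runtime_seconds / 3600)


# Toy data: 9 tokens, 6 of them distinct, in a 30-minute episode.
tokens = ["我", "喜欢", "看", "电视剧", "我", "也", "喜欢", "看", "电影"]
print(unique_words_per_hour(tokens, runtime_seconds=1800))
```

Because the divisor is the full runtime, long silences (low language density) and slow speech both reduce the result, which is the implicit effect described in the post.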