Jan Finster, June 25, 2019 at 04:57 PM: I wonder whether there are any good AI audio transcription services (audio to text) out there. For example, I realised that the old transcripts for the Ximalaya podcasts (https://www.ximalaya.com/) are actually AI transcriptions. So far, the only one I have found is https://sonix.ai/languages/transcribe-chinese-mandarin-audio. Does anyone have experience with it?
imron, June 25, 2019 at 05:28 PM: Google and Microsoft both have products that can do this (and Amazon is working on it too), but they require programming skills to use.
Jan Finster (thread author), June 25, 2019 at 06:12 PM: 43 minutes earlier, imron said: "but require programming skills in order to make use of them." So I guess I am looking for user-friendly ones, for dummies without programming skills?
pon00050, June 26, 2019 at 12:13 AM: 6 hours earlier, Jan Finster said: "So, I guess I am looking for user-friendly ones for Dummies without programming skills?" I think I can recommend something just like that for you! Have no fear: I am no programmer, but I was able to pull it off after reading the steps outlined in this blog post: https://auphonic.com/blog/2016/12/02/make-podcasts-searchable-speech-to-text/
simc, June 26, 2019 at 10:57 AM: I've tried Sonix and Google. Sonix was user-friendly, but the pricing seemed quite expensive. The Google API was cheaper but requires programming skills. Both produced transcripts that contained most of the words in the audio, but with enough errors to make the text incomprehensible unless you already know what it is supposed to say. The output is more of a starting point that a human has to fix up. A bad transcript can still be quite useful, though: it makes it easier to look up words you don't know in Pleco's clip reader. The program I wrote for the Google API inserts a timecode every 10 seconds, so I can find my place in the transcript based on how far through the audio I am.
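simc's program isn't shown in the thread, but the timecode idea is easy to sketch. The Python function below is a minimal sketch under stated assumptions, not simc's actual code: it assumes you already have the recognised words as (word, start time in seconds) pairs sorted by time. Google's Speech-to-Text API can return per-word start times when the `enable_word_time_offsets` option is set, but converting its response into these pairs is left out here, and the name `insert_timecodes` is hypothetical.

```python
from typing import Iterable, List, Tuple


def insert_timecodes(words: Iterable[Tuple[str, float]], interval: float = 10.0) -> str:
    """Hypothetical helper: join (word, start_seconds) pairs into one transcript,
    starting a new line with an [mm:ss] marker whenever a word falls into a new
    `interval`-second window. Assumes the pairs are sorted by start time."""
    parts: List[str] = []
    next_mark = 0.0  # start of the next window that still needs a marker
    for word, start in words:
        if start >= next_mark:
            # Marker for the window this word falls in; windows with no words are skipped.
            window_start = int(start // interval) * interval
            minutes, seconds = divmod(int(window_start), 60)
            parts.append(f"\n[{minutes:02d}:{seconds:02d}] ")
            next_mark = window_start + interval
        # Chinese words are joined without spaces.
        parts.append(word)
    return "".join(parts).lstrip("\n")


print(insert_timecodes([("今天", 0.4), ("我们", 1.1), ("讨论", 9.8), ("中医", 10.2), ("诊断", 25.0)]))
```

With the sample input this prints three lines, `[00:00] 今天我们讨论`, `[00:10] 中医` and `[00:20] 诊断`, so you can jump to roughly the right spot in the transcript from the audio player's position.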
Publius, June 26, 2019 at 02:55 PM: Has anyone tried the Xunfei (iFlytek) API? They're supposed to be the leader in this field.
mikelove, June 26, 2019 at 04:37 PM: If you have access to a reasonably up-to-date iPhone or iPad, Apple's SFSpeechRecognizer API does a decent job with Chinese, works offline, is quite easy to use, and is completely free. (It does require coding to use, but somebody may have written a free transcriber app using it by now.)
imron, June 27, 2019 at 12:43 AM: 8 hours earlier, mikelove said: "works offline". Are you sure? The docs you linked to say: "Be prepared to handle failures caused by speech recognition limits. Because speech recognition is a network-based service, limits are enforced so that the service can remain freely available to all apps." They also mention that some languages require an Internet connection (implying that perhaps some don't). Is there a way to tell which ones do and which don't?
mikelove, June 27, 2019 at 01:01 AM: It depends on the device and the language, but I know from first-hand testing that a newish iPhone will do Chinese offline.
imron, June 27, 2019 at 01:12 AM: What language do you have your UI set to? I wonder if there's any connection with that.
mikelove, June 27, 2019 at 02:46 AM: US English, but I don't believe there's a connection. In fact, Apple added an API in iOS 13 (the supportsOnDeviceRecognition property) to detect whether a recognizer initialized with a specific locale supports on-device recognition.
Jan Finster (thread author), April 9, 2020 at 09:40 AM: I recently read about the idea of using a virtual audio cable. If I am not mistaken, this would route your computer's audio output (e.g. from YouTube) directly into the input of a listening application (e.g. Google Translate), as if it came from a microphone. Does anyone here have the technical skills to set up something like this? I found this product online, but the full version costs $49: https://www.vb-audio.com/Cable/
roddy, April 9, 2020 at 10:21 AM: On 6/26/2019 at 5:37 PM, mikelove said: "(does require coding to use, but somebody may have written a free transcriber app using it by now)". A quick look on the App Store turns up at least three transcription apps that appear to have launched in the last year or so. I haven't tried any, but they are presumably worth a look.
Jan Finster (thread author), April 17, 2020 at 02:48 PM: Today I tested this automatic transcription service: https://www.happyscribe.co/. Once you register, you get one 30-minute transcription for free. I uploaded a recording from a medical seminar. The audio quality was OK but not great, and the speaker (a non-professional translator) was from Sichuan (四川). Still, the result was surprisingly decent. Since it is an automatic service, there are obvious limitations; for example, it sometimes used the wrong character, such as 再 instead of 在. So far, my impression is that 90-95% is correct. The transcription took about 10 minutes. I wonder whether a professional human transcription service would be much better for technical texts, given the price: 1 hour here costs $12, which seems fair to me, especially since the human transcription services I checked charge $2/min.
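The price gap is easier to see once both quotes are in the same unit. A small sketch using only the figures quoted above ($12 per hour for the automatic service, $2 per minute for the human services); `cost_per_hour` is just an illustrative helper, not part of any service's API:

```python
def cost_per_hour(price: float, per_minutes: float) -> float:
    """Illustrative helper: convert a price quoted per `per_minutes` minutes into a price per hour."""
    return price * 60.0 / per_minutes


automatic = cost_per_hour(12.0, 60)  # automatic service as quoted: $12 per hour
human = cost_per_hour(2.0, 1)        # human services as quoted: $2 per minute
print(f"automatic: ${automatic:.2f}/h, human: ${human:.2f}/h, ratio: {human / automatic:.0f}x")
# prints: automatic: $12.00/h, human: $120.00/h, ratio: 10x
```

At these quoted rates, human transcription costs about ten times as much per hour of audio.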