PandaEye Posted September 24, 2018 at 05:14 PM
I'm interested in creating a tool that can instantly extract the Chinese subtitles physically embedded (hard-coded) in any Chinese video and output them to a text file with timestamps. That would unlock an endless supply of the highest-quality learning resources (native content, audio, visuals, transcriptions). Transcriptions can be imported into Pleco's Screen Reader for immediate translation of the script, without having to look up word definitions manually, and can be followed along while watching the video. Other tools such as Chrome's Zhongwen Popup Dictionary, Hanping's Chinese Popup, or any other method can also be used for faster and richer learning. To make this tool a reality, it would have to be sold as a service to compensate those who build and maintain it. The service would be a site/app where you provide a video or URL, pay an inexpensive price per video hour (or a subscription), and the hosted software immediately delivers the transcription file. The software that extracts the subtitles would need to be built on deep/machine learning (artificial intelligence). I've begun asking ML engineers for their estimates of the cost of creating this tool (Chinese engineers could potentially be leveraged too), and I intend to create a Kickstarter/funding campaign with the goal of building the software and website/app, and of maintaining and improving the service if funding is met. Link to my inquiry to ML engineers: https://www.reddit.com/r/deeplearning/comments/9iicq7/request_for_quotation_of_a_use_case_from_the_dl/ What do you think is the size of the market for this service, including language learners and other use cases? There would need to be enough interested people to meet the funding goal; otherwise it couldn't happen. What are your thoughts?
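A minimal sketch of the "text file with timestamps" part of the proposal, assuming some OCR step (not shown, and hypothetical here) has already turned video frames sampled at a fixed interval into `(time_in_seconds, text)` pairs. Consecutive frames carrying identical text are merged into one subtitle event with a start and end time. Real OCR output tends to flicker between frames, so a working tool would likely need fuzzy matching instead of the exact string comparison used here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SubtitleEvent:
    start: float  # seconds
    end: float    # seconds
    text: str


def collapse_frames(frames: List[Tuple[float, str]], frame_interval: float) -> List[SubtitleEvent]:
    """Merge consecutive sampled frames that carry the same OCR'd text.

    `frames` is a hypothetical list of (timestamp_seconds, recognised_text)
    pairs in time order, one per sampled frame; an empty string means no
    subtitle was detected in that frame.
    """
    events: List[SubtitleEvent] = []
    current: Optional[SubtitleEvent] = None
    for t, text in frames:
        text = text.strip()
        if current is not None and text == current.text:
            # Same line still on screen: extend the open event.
            current.end = t + frame_interval
            continue
        if current is not None:
            # The text changed or disappeared: close the open event.
            events.append(current)
            current = None
        if text:
            # A new subtitle line appears on this frame.
            current = SubtitleEvent(start=t, end=t + frame_interval, text=text)
    if current is not None:
        events.append(current)
    return events
```

For example, frames sampled every 0.5 s as `[(0.0, ""), (0.5, "你好"), (1.0, "你好"), (1.5, ""), (2.0, "再见")]` collapse into two events: 你好 from 0.5 s to 1.5 s and 再见 from 2.0 s to 2.5 s.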
大块头 Posted September 24, 2018 at 10:30 PM
Welcome to the forum! I'm always glad to see people innovating in this space, so I'm sorry to burst your bubble... I haven't used it personally, but it looks like this open-source Chrome extension (Copyfish) does what you're describing? https://chrome.google.com/webstore/detail/copyfish-?-free-ocr-soft/eenjdnjldapjajjofmldgmkjaienebbj?hl=en
大块头 Posted September 24, 2018 at 10:49 PM
It seems to work OK...
Flickserve Posted September 25, 2018 at 05:00 AM
I tried it out. It works better if the background is dark. If the background is light and the font has a shadow, it doesn't work well at all.
XiaoXi Posted September 25, 2018 at 05:52 AM
I tried it on two random sentences. It didn't get the first one at all, and the second one came out as shown in the attached image. Not sure why, but not only did it fail to recognise the captured area, for some reason the last few characters were not part of the area I selected. The full sentence should have been 他肯定不会回来的 ("He definitely won't come back"). Maybe the background is not super dark, but it's hardly super light either. So it does seem to exist, but it's bad verging on awful. Personally I'd be more interested in software that could analyse a downloaded video file, OCR all the subtitles and produce a .srt file. In fact, to say I'd be interested in that is the understatement of the century.
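For reference, the .srt format is plain text: each cue is a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the subtitle text, and a blank line. A short sketch of writing timed lines into that format, assuming they arrive as hypothetical `(start_seconds, end_seconds, text)` tuples from whatever OCR step produced them:

```python
from typing import List, Tuple


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(events: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(events, start=1):
        # Sequence number, timing line, text; cues are separated by a blank line.
        blocks.append(f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n")
    return "\n".join(blocks)
```

Writing the result with `open("episode.srt", "w", encoding="utf-8")` (file name hypothetical) gives a file most players and tools such as subs2srs can load; UTF-8 matters because of the Chinese text.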
Flickserve Posted September 25, 2018 at 07:04 AM
56 minutes ago, XiaoXi said: Personally I'd be more interested in software that could analyse a downloaded video file and OCR all the subtitles and produce a .srt file.
This. Not sure of the size of the market, though. It is a lot of work. Most people end up just watching a video.
XiaoXi Posted September 25, 2018 at 08:38 AM
1 hour ago, Flickserve said: Not sure of the size of the market though. It is a lot of work. Most people end up just watching a video.
Probably not that big, unfortunately, because most people wouldn't know what to do with an srt file to get the most out of it. But personally I know that an srt file is the holy grail of language learning.
yaokong Posted October 1, 2018 at 01:18 AM
XiaoXi, could you please explain in 2-3 sentences how you use it?
XiaoXi Posted October 1, 2018 at 03:30 AM
2 hours ago, yaokong said: XiaoXi, could you please explain in 2-3 sentences how you use it?
I'd be interested to know how Flickserve uses it too. Well, let me ask you: when you watch a Chinese TV series with hard-coded subs and come across a word you don't know, what do you do?
Flickserve Posted October 1, 2018 at 03:07 PM
Me? I am interested in hard subtitles, because from them you can make an srt file. An srt file lets you generate Anki cards with the sentences, audio (and pictures) as an automated process.
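A rough sketch of the text side of that automation, assuming the goal is a tab-separated file that Anki's text importer can read. The column layout (sentence, start time, end time) is a hypothetical choice; tools like subs2srs also cut the matching audio clips and snapshots, which this sketch does not attempt.

```python
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:01:02,250 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> List[Tuple[float, float, str]]:
    """Parse .srt contents into (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # skip malformed cues (no timing or no text)
        match = TIMING.search(lines[1])
        if match is None:
            continue
        g = match.groups()
        # Multi-line cues are joined with a space.
        text = " ".join(line.strip() for line in lines[2:])
        cues.append((_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
    return cues


def srt_to_anki_tsv(srt_text: str) -> str:
    """One tab-separated row per cue: sentence, start time, end time."""
    rows = [f"{text}\t{start:.3f}\t{end:.3f}" for start, end, text in parse_srt(srt_text)]
    return "\n".join(rows) + "\n"
```

When importing the resulting file, Anki lets you map each column to a field of the note type, so the timing columns can be kept for reference or ignored.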
XiaoXi Posted October 2, 2018 at 03:16 AM
12 hours ago, Flickserve said: I am interested in hard subtitles. And then, you can make a srt file
Yes, that was my suggestion. We were interested in the same thing, so I also wondered how you use an srt file.
12 hours ago, Flickserve said: A srt file lets you generate anki cards with the sentences, audio (and pictures) as an automated process.
Oh OK, yes, it's useful for that. An srt file really is so much more useful than hard-coded subs. So many possibilities. Is there software for that now? I remember there was software for Japanese a long time ago, but not for other languages.
Flickserve Posted October 2, 2018 at 03:51 AM
Oh, definitely. I turned some films into Anki cards. We have a thread for that on the forum. I used subs2srs; the whole process was documented by TysonD on the forum.
艾墨本 Posted October 2, 2018 at 09:18 AM
On 9/25/2018 at 1:52 PM, XiaoXi said: Personally I'd be more interested in software that could analyse a downloaded video file and OCR all the subtitles and produce a .srt file.
This. I want to read the subtitles like a book and practice performing them, as a way to learn language that fits specific situations, along with the tone of voice that goes with it. I want to imitate the actors and do some dubbing (配音).
XiaoXi Posted October 3, 2018 at 03:47 AM
23 hours ago, Flickserve said: I used subs2srs. The whole process was documented by TysonD on the forum.
Oh right, that's the exact software I was referring to, the one originally made for Japanese.
18 hours ago, 艾墨本 said: This. I want to read the subtitles like a book and practice performing them as a way to learn language that fits specific situations and the tone of voice that goes along with it. I want to imitate the actors and do some 配音 (dubbing)
Yes, hopefully the OP hasn't gone to a better place and can make this software, since there seems to be more demand for it. Not to mention that what he is proposing seems to exist already, even though it doesn't work that well. If you read the SRT file as a book, how will you actually hear the voices to imitate them? By the way, you can already do this with movies, since SRT subtitles are normally available for the more popular Chinese movies.
Flickserve Posted October 3, 2018 at 04:23 AM
There is plenty of video content out there. I just wonder how accurate it can be. Greater than 95%? One incorrect word out of twenty?
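To put the 95% figure in perspective: one wrong word in twenty is an error rate of 1/20 = 5%. For Chinese OCR it is often simpler to count characters, using the edit distance between the true line and the OCR output divided by the length of the true line (the character error rate). The sketch below uses XiaoXi's example sentence and a hypothetical misread of 来 as 夹; a single wrong character in an eight-character line already gives 1 - 1/8 = 87.5% accuracy, below the 95% mark.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """1 - character error rate, with the error rate capped at 100%."""
    if not reference:
        raise ValueError("reference must be non-empty")
    cer = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)


# Hypothetical OCR output with one misread character (来 read as 夹).
print(character_accuracy("他肯定不会回来的", "他肯定不会回夹的"))  # 0.875
```

Measured this way, "greater than 95%" means fewer than one wrong character per twenty, i.e. less than one error per two or three typical subtitle lines.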
艾墨本 Posted October 3, 2018 at 01:01 PM
9 hours ago, XiaoXi said: Yes hopefully the OP hasn't gone to a better place and maybe he can make this software since there seems to be more demand for it. Not to mention that what he is proposing seems to already exist, even though it doesn't work that well. If you read the SRT file as a book how will you be able to actually hear the voices to imitate them? Btw you can already do this with movies since SRT subtitles are normally available for the more popular Chinese movies.
Pretty simply, actually. I'd read them slowly, looking up words and practicing at my own pace away from my computer (an important part of good studying for me), and then go back to my computer and watch the show. As for finding SRT files for popular movies, I challenge you to find SRT files for 人民的名义 (In the Name of the People) or for the dubbed version of Avatar: The Last Airbender. Both would be great for studying!
yaokong Posted October 4, 2018 at 11:53 AM
Quote: I challenge you to try finding some SRT files for 人民的名义
Here you go, as found on http://subhd.com/ar0/378232. I couldn't find the latter even after a rigorous search that illuminated the darkest corners of the interwebs.
XiaoXi Posted October 5, 2018 at 03:36 AM
On 10/3/2018 at 9:01 PM, 艾墨本 said: Pretty simply, actually. I'd read them slowly, looking up words and practicing at my own pace away from my computer (an important part of good studying for me) and then go back to my computer and watch the show.
How do you look up the words?
On 10/3/2018 at 9:01 PM, 艾墨本 said: As far as finding SRT files for popular movies. I challenge you to try finding some SRT files for 人民的名义 or how about the dubbed version of Avatar the last airbender. Both would be great for studying!
The subtitles for a foreign-language movie are normally made by someone completely independent of the people who do the dubbing, so they are unlikely to match. It depends on the movie: sometimes they match quite well, sometimes they don't match at all. With some movies the subtitles do indeed appear to be based on the Mandarin dub, like 猛龙过江 (The Way of the Dragon), but unfortunately that's not the norm. With French, I found that fans make transcripts of TV shows dubbed in French that match perfectly, since they're transcripts of the dub itself. It might be worth checking whether Chinese fans ever make transcripts like this.
Flickserve Posted October 5, 2018 at 03:42 AM
6 minutes ago, XiaoXi said: I found with French that fans make transcripts of tv shows dubbed in French that match perfectly since they're transcripts of the dub itself. Might be worth looking to see if Chinese fans ever make transcripts like this.
Isn't that what viki.com does? I haven't investigated it fully.
XiaoXi Posted October 5, 2018 at 06:22 AM
2 hours ago, Flickserve said: Isn't that what viki.com does? I haven't investigated it fully.
Maybe. I wasn't looking for that myself, so I don't know; I was just making a suggestion to 艾墨本.