wulfgar, posted December 3, 2015 at 01:44 AM
Is there a pop-up dictionary that works on hard subs yet? I'm about to start a big reading spurt, and this would be the perfect tool.
wulfgar (thread author), posted December 4, 2015 at 04:09 AM
Wow, I get a negative vote for asking a question? Nice.
wibr, posted December 4, 2015 at 05:04 AM
Not sure why you got downvoted. However, looking up characters in hard subs requires computer vision, including OCR, and is therefore much more complicated than the usual pop-up dictionary. A solution for this would likely preprocess the whole video and extract all the text, which could then be used to look up characters.
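To make the lookup half of that idea concrete: once the subtitle text has been extracted, one common approach for pop-up dictionaries is to match the longest dictionary entry that starts at the character under the cursor. This is a minimal sketch, assuming the extracted text is already a Python string; `lookup_at`, `max_len` and the four-entry `DICT` (with illustrative glosses) are hypothetical, and a real tool would load a full dictionary file instead.

```python
from __future__ import annotations

# Tiny hypothetical dictionary with illustrative glosses only.
DICT: dict[str, str] = {
    "霸王": "overlord; tyrant",
    "霸王别姬": "Farewell My Concubine (opera and film)",
    "别": "to leave; don't",
    "姬": "concubine (archaic); woman",
}


def lookup_at(text: str, pos: int, dictionary: dict[str, str],
              max_len: int = 8) -> tuple[str, str] | None:
    """Return (word, gloss) for the longest entry starting at text[pos], or None."""
    # Try the longest candidate first, then shrink one character at a time,
    # so multi-character words win over their single-character parts.
    for end in range(min(len(text), pos + max_len), pos, -1):
        word = text[pos:end]
        if word in dictionary:
            return word, dictionary[word]
    return None
```

Hovering over 霸 in 霸王别姬 would show the four-character title rather than just 霸王, while hovering over 别 falls back to the single character, because nothing longer starting there is in the dictionary.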
roddy, posted December 4, 2015 at 10:30 AM
We have one very occasional poster who makes a habit of downvoting posts for no obvious reason. I shall have a word.
wulfgar (thread author), posted December 4, 2015 at 04:47 PM
Quoting wibr: "Looking up characters in hard subs requires computer vision, including OCR, and is therefore much more complicated than the usual pop-up dictionary. A solution for this would likely preprocess the whole video and extract all the text, which could then be used to look up characters."
That's what I was afraid of. My method right now is to pay someone to type the subtitles into a text document; then I go back and forth between that document and the show. It works, but I often lose my place in the document. I remember seeing a system where the texts (soft subs?) were indexed to the show, and I could mouse over the words to look them up if necessary. But each show had to be specially prepared to do that, and the material was limited. I'd like to be able to load my own.
wibr, posted December 4, 2015 at 06:16 PM
Do you mean http://www.fluentu.com/? Maybe also relevant: http://subs2srs.sourceforge.net/. Extracting soft subs from hard subs is an interesting programming problem, and I hope I can spend some time on it soon. The first step would be to cut the video into parts, one for each subtitle, which could be used as a photo story for self-paced studying. That would be similar to subs2srs, but not limited to soft or DVD subs. Then it would be necessary to separate the text from the background, and finally use OCR to get text from the pixels.
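The first step described above, cutting the video into one part per subtitle, can be sketched by watching only the band of the frame where subtitles appear and closing a segment whenever that band changes noticeably or goes blank. This is a minimal sketch under stated assumptions: frames are already decoded into rows of grayscale pixel values (in practice a video library such as OpenCV would supply them), subtitles are bright text in the rows from `band_top` downward, and the thresholds are arbitrary placeholders. `segment_subtitles`, `Segment` and their parameters are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical frame representation: rows of grayscale pixel values (0-255).
Frame = List[List[int]]


@dataclass
class Segment:
    start_frame: int  # first frame showing this subtitle
    end_frame: int    # last frame showing it (inclusive)


def band_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized bands."""
    total = count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def has_text(band: Frame, bright: int = 200, min_pixels: int = 1) -> bool:
    """Assume subtitle glyphs are bright; count pixels at or above `bright`."""
    return sum(1 for row in band for p in row if p >= bright) >= min_pixels


def segment_subtitles(frames: List[Frame], band_top: int,
                      diff_threshold: float = 20.0) -> List[Segment]:
    """Split frames into runs where the same subtitle band stays on screen."""
    segments: List[Segment] = []
    start: Optional[int] = None
    prev: Optional[Frame] = None
    for i, frame in enumerate(frames):
        band = frame[band_top:]  # only look at the subtitle area
        text = has_text(band)
        changed = prev is not None and band_diff(prev, band) > diff_threshold
        # Close the open segment when the subtitle changes or disappears.
        if start is not None and (changed or not text):
            segments.append(Segment(start, i - 1))
            start = None
        # Open a new segment whenever text is visible and none is open.
        if start is None and text:
            start = i
        prev = band
    if start is not None:
        segments.append(Segment(start, len(frames) - 1))
    return segments
```

Dividing each segment's frame numbers by the video's frame rate gives its start and end times; one representative frame per segment would then be the "photo" for the photo story and the input to the later text-extraction and OCR steps.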
iand, posted December 5, 2015 at 05:37 AM
The accuracy is not perfect, but there's http://blog.a9t9.com/p/chinese-subtitles-translator.html
wulfgar (thread author), posted December 5, 2015 at 05:28 PM
Yeah, I think that was it. I'll check it out a little more carefully. Good luck with designing that; I look forward to using it some day.
eslang, posted December 7, 2015 at 02:34 AM
Quoting wulfgar: "I remember seeing a system where the texts (soft subs?) were indexed to the show, and I could mouse over the words to look them up if necessary. But each show had to be specially prepared to do that, and the material was limited."
@wulfgar Basically, subtitles come in three forms: hard subs burned into the picture, soft subs embedded as a separate track in a container such as MKV, and separate subtitle files. Some Chinese films have had their subtitles re-encoded as text (interleaved soft subs) inside the video file, but such material is very limited; 霸王别姬 Farewell My Concubine is one of the few examples. I think most Mac users have difficulty extracting hard subs from TV rips or streaming online videos, because most Chinese developers of subtitle-extraction programs target Microsoft Windows, and their tutorials and installation instructions are all written in Chinese. Technically speaking, it is very easy to extract them using IdxSubOcr or Sub2Srt v3.31, provided the user is running MS Windows XP, is tech-savvy, and can read the Chinese tutorials.
wibr, posted December 7, 2015 at 03:40 PM
@eslang Thanks for pointing out IdxSubOcr, can you tell us more about it? What do you need as input, what do you get as output, and what is the quality? I have a couple of ripped DVDs which I think are suitable, but I would need to invest quite a bit of time to set up Windows and go through the tutorials (I would probably need some help to understand them).
wulfgar (thread author), posted December 7, 2015 at 06:47 PM
Quoting iand: "The accuracy is not perfect, but there's http://blog.a9t9.com...translator.html"
Thanks, I missed your post before. That looks quite acceptable; unfortunately, I have a Mac. I'm not a tech-savvy guy, but I wonder: if I have the subs extracted into a text file, is there a way to make the text scroll along with the movie? My main problem with my current system is losing my place.
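On making the text scroll with the movie: that is essentially what a soft-subtitle file such as .srt provides, since every line carries a start and end timestamp, and players that support soft subs show the right line automatically. To illustrate the mechanism, here is a minimal sketch that parses plain SRT text and finds the line on screen at a given playback time. It assumes well-formed SRT without positioning extras on the timing line; `parse_srt`, `cue_at` and `Cue` are hypothetical names.

```python
from __future__ import annotations

import bisect
import re
from dataclasses import dataclass

# SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_time(stamp: str) -> float:
    m = TIME_RE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"unexpected SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, m.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse blank-line-separated blocks of: cue number, timing line, text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue  # skip malformed fragments
        start, end = lines[1].split("-->")
        cues.append(Cue(parse_time(start), parse_time(end), "\n".join(lines[2:])))
    cues.sort(key=lambda c: c.start)
    return cues


def cue_at(cues: list[Cue], t: float) -> Cue | None:
    """Return the cue on screen at time t (seconds), or None between lines."""
    starts = [c.start for c in cues]
    i = bisect.bisect_right(starts, t) - 1  # last cue starting at or before t
    if i >= 0 and t < cues[i].end:
        return cues[i]
    return None
```

Calling `cue_at` with the player's current position is all a "scrolling" transcript needs: highlight the returned cue and you never lose your place.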
wibr, posted December 7, 2015 at 07:08 PM
@wulfgar If you look at the comments on that page, the author mentions a plugin for Chrome: https://chrome.google.com/webstore/detail/copyfish-%F0%9F%90%9F-free-ocr-soft/eenjdnjldapjajjofmldgmkjaienebbj
Edit: I just gave it a try on some traditional-character subtitles in a video; it didn't work that well...
eslang, posted December 8, 2015 at 04:13 AM
@wibr The key is still to get your hands on the software and try it; what holds many people back is simply not daring to take that first step.

There's a tool named IdxSubOcr that uses the MS Office MODI engine to OCR English, Chinese (Traditional and Simplified), Japanese, Korean and many other languages, provided the MODI module for each language has been installed. Only your MS Office default language is installed, so you have to install the other languages manually if you want to OCR them. Although the GUI is in Chinese, it is very simple to use.

http://forum.doom9.org/showthread.php?t=154536 (post dated 22nd May 2010, 18:20)
http://forum.videohelp.com/threads/362255-IdxSubOcr-OCR-on-chinese-traditional-character-program?s=ff493798847ad8033a17fc51b9433808&p=2341947#post2341947 (post dated 24th Aug 2014, 11:36)

Description: The software's Chinese and Japanese recognition results are GBK characters, so it can only be used in an environment that supports GBK encoding. Windows 2000/XP generally has no problem, Windows Me depends on luck, and Windows 98 most likely won't work. For how to configure MODI to support Simplified Chinese, Traditional Chinese and English, see "OCR of Traditional Chinese, Japanese and Korean under Simplified Chinese Office 2003" (《在简体中文Office 2003下OCR繁体中文、日文、韩文》).

Although the links from the Chinese developer "stronghorse" (老马) are no longer available, it is possible to find the software by searching Chinese websites. Further reading, all in Chinese:
"沒有更好,這便最好" ("If there's nothing better, this is the best"), a blogger comparing different Chinese/Japanese OCR programs: http://dvbsub.blogspot.jp/2013/06/blog-post.html (post dated 4th June 2013)
"圖片式字幕轉文字格式字幕問題" ("Question about converting image-based subtitles to text-based subtitles"): http://www.hkepc.com/forum/viewthread.php?tid=1765293 (post dated 2012-3-13 16:50)
"[更新:2013-06-01]繁體中文版 IdxSubOcr(圖片式字幕轉文字式字幕)" ("[Updated 2013-06-01] Traditional Chinese version of IdxSubOcr (image-based subtitles to text-based subtitles)"): http://www.hkepc.com/forum/viewthread.php?tid=1778422 (post dated 2012-4-7 17:56)
"IdxSubOcr将sub+idx图像字幕OCR转换为srt文本字幕 只需两分钟" ("IdxSubOcr: OCR sub+idx image subtitles into srt text subtitles in just two minutes"), pt吧, Baidu Tieba: http://tieba.baidu.com/p/2358545561?pid=33377211054&cid=0#33377211054 (post dated 2013-05-30 09:13)
"【教程】如何使用esrXP抽取硬字幕以及IdxSubOcr扫描" ("[Tutorial] How to extract hard subs with esrXP and scan them with IdxSubOcr"), yeluoqinxin吧, Baidu Tieba: http://tieba.baidu.com/p/3494665342?pn=1 (post dated 2014-12-29 14:02)

The esrXP software has been mentioned on this forum before; look for the link in post #7 of "Extracting Chinese hardsubs from a video": http://www.chinese-forums.com/index.php?/topic/44954-extracting-chinese-hardsubs-from-a-video/

Quoting wibr: "Thanks for pointing out IdxSubOcr, can you tell us more about it? What do you need as input, what do you get as output, and what is the quality? I have a couple of ripped DVDs which I think are suitable, but I would need to invest quite a bit of time to set up Windows and go through the tutorials (I would probably need some help to understand them)."
It is only worth the effort to learn if you have tons of DVDs with subtitles (.idx/.sub format), TV rips (such as .ts files) or streaming video clips (with burned-in hard subs) whose Chinese, Japanese or Korean subtitles you want to extract into soft subs (.srt/.ass format). The best help you can get is on the Chinese websites.
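For reference, the .srt target format mentioned above is simple: numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the text, separated by blank lines. Below is a minimal sketch that writes OCR'd lines in that format, assuming you already have `(start_seconds, end_seconds, text)` triples from some OCR step; `to_srt` and `format_time` are hypothetical helpers, not part of IdxSubOcr or any other tool named in this thread.

```python
from __future__ import annotations


def format_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Number the cues from 1 and separate them with blank lines, as SRT expects."""
    blocks = []
    for n, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{n}\n{format_time(start)} --> {format_time(end)}\n{text}\n")
    return "\n".join(blocks)
```

Saved as a UTF-8 .srt file next to the video, output like this is what many players pick up as soft subs.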