Chinese-Forums

Pop-up dictionary that works on hard subs


wulfgar


Not sure why you got downvoted. However, looking up characters in hard subs requires computer vision, including OCR, and is therefore much more complicated than the usual pop-up dictionary. A solution would likely preprocess the whole video and extract all the text, which could then be used to look up characters.
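Here is a minimal sketch of the "preprocess the whole video" idea. It assumes some OCR engine has already produced one text string per sampled frame. The OCR step itself is not shown, and `Cue` and `frames_to_cues` are hypothetical names. Consecutive frames with the same text are merged into timed cues, which a lookup tool could then use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def frames_to_cues(frame_texts: List[str], fps: float) -> List[Cue]:
    """Merge consecutive sampled frames with identical OCR text into cues.

    frame_texts[i] is the (assumed) OCR result for sampled frame i;
    an empty string means no subtitle was detected in that frame.
    """
    cues: List[Cue] = []
    current: Optional[str] = None  # text of the run being built
    start = 0
    for i, text in enumerate(frame_texts + [""]):  # sentinel flushes the last run
        text = text.strip()
        if text != current:
            if current:  # close the previous non-empty run
                cues.append(Cue(start / fps, i / fps, current))
            current, start = text, i
    return cues
```

Real OCR output tends to flicker slightly between frames, so in practice the exact equality test would probably need to become a fuzzy comparison.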



 

That's what I was afraid of. My method right now is to pay someone to type the subtitles into a text document, and then I go back and forth between that document and the show. It works, but I often lose my place in the text. I remember seeing a system where the texts (soft subs?) were indexed to the show, and I could mouse over the words to look them up if necessary. But they had to specifically design each show to do that, and material was limited. I'd like to be able to load my own.
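Once the subtitle text exists as real characters, the mouse-over part is the easy bit. A common approach in pop-up dictionaries is to look for the longest dictionary entry that starts at the character under the cursor. The sketch below shows that idea. It uses a tiny hypothetical dictionary (`TOY_DICT`) in place of a real one such as CC-CEDICT, and `lookup_at` is an illustrative name.

```python
from typing import Dict, Optional, Tuple

# Toy dictionary for illustration only; a real tool would load CC-CEDICT or similar.
TOY_DICT: Dict[str, str] = {
    "中": "middle; China",
    "中国": "China",
    "中国人": "Chinese person",
}


def lookup_at(text: str, pos: int, dictionary: Dict[str, str],
              max_len: int = 8) -> Optional[Tuple[str, str]]:
    """Return (word, definition) for the longest entry starting at text[pos]."""
    # Try the longest candidate first, then shrink one character at a time.
    for length in range(min(max_len, len(text) - pos), 0, -1):
        word = text[pos:pos + length]
        if word in dictionary:
            return word, dictionary[word]
    return None  # nothing in the dictionary starts here


print(lookup_at("我是中国人", 2, TOY_DICT))  # ('中国人', 'Chinese person')
```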


Do you mean http://www.fluentu.com/? Maybe also relevant: http://subs2srs.sourceforge.net/.

 

Extracting soft subs from hard subs is an interesting programming problem. I hope I can spend some time on it soon. The first step would be to cut the video into parts, one for each subtitle, which could be used as a photo story for self-paced study. That would be similar to subs2srs, but would work for more than soft/DVD subs. Then it would be necessary to separate the text from the background, and finally use OCR to get text from the pixels.
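The "separate the text from the background" step could start as simply as the sketch below. It assumes light-coloured subtitles near the bottom of the frame, and a grayscale frame given as rows of 0-255 values. Both are assumptions, and `subtitle_mask` and `mask_difference` are hypothetical helpers. When the masks of consecutive frames differ a lot, the subtitle has probably changed, which is where the video could be cut into per-subtitle parts.

```python
from typing import List

Gray = List[List[int]]  # grayscale frame: rows of pixel values 0..255
Mask = List[List[int]]  # 1 = probably subtitle text, 0 = background


def subtitle_mask(frame: Gray, band: float = 0.25, threshold: int = 200) -> Mask:
    """Keep the bottom `band` fraction of the frame and mark bright pixels as text."""
    height = len(frame)
    top = height - max(1, round(height * band))
    return [[1 if px >= threshold else 0 for px in row] for row in frame[top:]]


def mask_difference(a: Mask, b: Mask) -> float:
    """Fraction of pixels that differ between two masks of the same size."""
    total = sum(len(row) for row in a)
    diff = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return diff / total if total else 0.0
```

A fixed brightness threshold is the crudest possible choice. Subtitles with coloured text or busy bright backgrounds would need something smarter before the mask is handed to an OCR engine.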



 

@wulfgar

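Basically, subtitles come in three forms: "hard subs" burned into the video image itself, "soft subs" stored as a separate subtitle track inside a container such as MKV, and "separate subs" kept in an external file next to the video.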

 

There are Chinese films whose subtitles have been re-encoded as text (interleaved soft subs) in the video file, but such material is very limited.

霸王别姬 (Farewell My Concubine) is one of those few titles.
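When a film does carry a text subtitle track inside an MKV, that track can usually be written out as an .srt file without any OCR. Here is a sketch that builds and runs an ffmpeg command. It assumes ffmpeg is installed and that the chosen subtitle stream is text-based. Image-based tracks such as VobSub (idx/sub) or PGS cannot be converted this way; that is where tools like IdxSubOcr come in. `extract_sub_cmd` and `extract_subs` are hypothetical names.

```python
import shutil
import subprocess
from typing import List


def extract_sub_cmd(video: str, out_srt: str, track: int = 0) -> List[str]:
    """Build an ffmpeg command that writes subtitle stream `track` to an .srt file."""
    # "0:s:N" selects the N-th subtitle stream of the first input file.
    return ["ffmpeg", "-i", video, "-map", f"0:s:{track}", out_srt]


def extract_subs(video: str, out_srt: str, track: int = 0) -> None:
    """Run the command; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(extract_sub_cmd(video, out_srt, track), check=True)
```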

 

I think most Mac users have difficulty extracting hard subs from TV rips or online streaming videos because most Chinese developers of subtitle extraction programs write for Microsoft Windows, and their tutorials and installation instructions are all written in Chinese.

 

Technically speaking, it is very easy to extract them using IdxSubOcr or Sub2Srt v3.31, provided the user is running MS Windows XP, is tech-savvy, and can understand the Chinese tutorials.


@eslang Thanks for pointing out IdxSubOcr. Can you tell us more about it? What do you need as input, what do you get as output, and what is the quality?

 

I have a couple of ripped DVDs which I think are suitable, but I would need to invest quite a bit of time to set up Windows and go through the tutorials (I would probably have to get some help to understand them).


 

 

The accuracy is not perfect, but there's http://blog.a9t9.com...translator.html

Thanks, I missed your post before. That looks quite acceptable; unfortunately, I have a Mac.

 

I'm not a tech-savvy guy, but I wonder: if I have the subs extracted into a text file, is there a way to make the text file scroll with the movie? My main problem with my current system is losing my place.
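If the extracted subtitles are saved as an .srt file (text plus start and end timestamps), players such as VLC or mpv can load it alongside the video as soft subs. That keeps the text in sync automatically. For a custom reading view that scrolls with the movie, the core step is finding which subtitle is active at the current playback time. Here is a sketch that assumes a well-formed SRT file. `parse_srt` and `cue_at` are hypothetical names, and how a tool would read the player's current time is not shown.

```python
import bisect
import re
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)
TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp like 00:01:02,500 to seconds (assumes valid input)."""
    h, m, s, ms = map(int, TIME.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text: str) -> List[Cue]:
    """Parse SRT text into (start, end, subtitle) tuples sorted by start time."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # the timing line; subtitle text follows it
                start, end = line.split("-->")
                cues.append((to_seconds(start), to_seconds(end), "\n".join(lines[i + 1:])))
                break
    cues.sort()
    return cues


def cue_at(cues: List[Cue], t: float) -> Optional[int]:
    """Index of the cue shown at playback time t, or None between cues."""
    starts = [c[0] for c in cues]  # rebuilt on every call for brevity
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t < cues[i][1]:
        return i
    return None
```

A reader tool would call `cue_at` every fraction of a second and scroll to or highlight that line, so the place in the text is never lost.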


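@wibr - The key is still to get your hands on the software and try it; a lot of people struggle simply because they don't dare take that first step.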

 

There's a tool named IdxSubOCR, which uses the MS Office MODI engine to OCR English, Chinese (Traditional & Simplified), Japanese, Korean and many other languages, provided the MODI module for each language has been installed (only your default MS Office language is installed, so you have to install other languages manually if you want to use them for OCR). Though the GUI is in Chinese, it is very simple to use.
http://forum.doom9.org/showthread.php?t=154536
Post dated 22nd May 2010, 18:20

Description: This software outputs its Chinese and Japanese recognition results as GBK characters, so it can only be used in an environment that supports GBK encoding. Windows 2000/XP generally has no problems, Windows Me depends on luck, and Windows 98 most likely won't work.
http://forum.videohelp.com/threads/362255-IdxSubOcr-OCR-on-chinese-traditional-character-program?s=ff493798847ad8033a17fc51b9433808&p=2341947#post2341947
Post dated 24th Aug 2014 11:36

For how to configure MODI to support Simplified Chinese, Traditional Chinese and English, see "OCR for Traditional Chinese, Japanese and Korean under Simplified Chinese Office 2003".

Although the links of the Chinese software developer "stronghorse" (老马) are no longer available, it is possible to find the software by searching Chinese websites.
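Because IdxSubOcr produces its Chinese and Japanese results as GBK, subtitle files it writes may show up as garbled text on a Mac or any other system that expects UTF-8. Re-encoding them is straightforward. Here is a Python sketch, assuming the output file really is GBK-encoded; `gbk_to_utf8` and `convert_file` are hypothetical names.

```python
from pathlib import Path


def gbk_to_utf8(data: bytes) -> bytes:
    """Decode GBK bytes and re-encode them as UTF-8."""
    # Raises UnicodeDecodeError if the input is not valid GBK.
    return data.decode("gbk").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Rewrite a GBK-encoded subtitle file (e.g. an .srt) as UTF-8."""
    Path(dst).write_bytes(gbk_to_utf8(Path(src).read_bytes()))
```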


If There's Nothing Better, This Is the Best
(A blogger comparing different Chinese/Japanese OCR software)
http://dvbsub.blogspot.jp/2013/06/blog-post.html
Post dated 4th June 2013

Question about converting image-based subtitles to text-format subtitles
http://www.hkepc.com/forum/viewthread.php?tid=1765293
Post dated 2012-3-13 16:50

[Updated: 2013-06-01] Traditional Chinese version of IdxSubOcr (converts image-based subtitles to text subtitles)
http://www.hkepc.com/forum/viewthread.php?tid=1778422
Post dated 2012-4-7 17:56

IdxSubOcr: OCR sub+idx image subtitles into srt text subtitles in just two minutes (pt forum, Baidu Tieba)
http://tieba.baidu.com/p/2358545561?pid=33377211054&cid=0#33377211054
Post dated 2013-05-30 09:13

[Tutorial] How to extract hard subtitles with esrXP and scan them with IdxSubOcr (yeluoqinxin forum, Baidu Tieba)
http://tieba.baidu.com/p/3494665342?pn=1
Post dated 2014-12-29 14:02

The esrXP software has been mentioned before; look for the link in post #7 of:
Extracting Chinese hardsubs from a video
http://www.chinese-forums.com/index.php?/topic/44954-extracting-chinese-hardsubs-from-a-video/

 

Thanks for pointing out IdxSubOcr. Can you tell us more about it? What do you need as input, what do you get as output, and what is the quality?

 

I have a couple of ripped DVDs which I think are suitable, but I would need to invest quite a bit of time to set up Windows and go through the tutorials (I would probably have to get some help to understand them).

 

It is only worth the effort to learn if you have tons of DVDs with image subtitles (idx/sub format), TV rips (such as .ts files), or streaming video clips (with burned-in hard subs) whose Chinese, Japanese or Korean subtitles you want to extract as soft subs (.srt/.ass format).
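The .srt target format is simple: a running number, a "start --> end" line with HH:MM:SS,mmm timestamps, the text, and a blank line. That is why any homemade OCR pipeline can emit it easily. Here is a sketch of a writer, assuming cues are given as (start seconds, end seconds, text) tuples; `srt_timestamp` and `to_srt` are hypothetical names.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3723.45 -> 01:02:03,450."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: List[Tuple[float, float, str]]) -> str:
    """Render numbered SRT blocks separated by blank lines."""
    blocks = []
    for n, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```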

 

The best help you can get is over at the Chinese websites.

