
How do you get subtitles off of dvd's?



Posted

I'm sure this information is somewhere on this site but for whatever reason I did a search and couldn't find anything. Maybe I used the wrong search terms or maybe it has something to do with my ability to concentrate with a 2 year old around the house.

At any rate, at one time I heard here that there was a way to extract the subtitles from dvd's but I thought it sounded difficult and thought I could manage without it. But at this point, I know I need the extra help for developing my listening skills and however much work it might be, I think it would be worth it. I think someone here may even have told me how to do it at one point.

So how do you get the subtitles off of dvd's? And can they be put in a format where I could copy and paste new characters into an online dictionary?

Thanks

Posted
And can they be put in a format where I could copy and paste new characters into an online dictionary?

AFAIK, no.

The subtitles on DVDs are basically pictures. It makes it much easier to support all languages in all players.

Posted

I think that's right. I don't think there are programs smart enough to "read" DVD subtitle images in Chinese and produce plain text.

You might want to have a look on shooter.cn to see if you can find corresponding subtitle files.

Posted

Well even if I can't copy and paste it would still be useful - better than pausing the dvd player every few seconds.

Posted

As mentioned, the subtitles on DVDs are stored as pictures.

HOWEVER, there are subtitle formats used for other purposes, e.g. rips of DVDs to avi files, that are text. ".srt" is the most common. You can do a web search and see if any are available.
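For reference, an .srt file is plain text: numbered blocks, each with a timing line and one or more lines of subtitle text. Here is a minimal sketch of reading one in Python, assuming well-formed input with no byte-order mark. The sample subtitles and the function name `parse_srt` are made up for illustration.

```python
import re

# Made-up sample in the usual .srt layout: index, timing line, text, blank line.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
你好

2
00:00:04,000 --> 00:00:06,000
再见
"""

def parse_srt(text):
    """Return a list of (index, start, end, text) tuples from SRT text."""
    entries = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip blocks that lack an index, timing line, or text
        start, end = [part.strip() for part in lines[1].split("-->")]
        entries.append((int(lines[0]), start, end, "\n".join(lines[2:])))
    return entries

for index, start, end, line in parse_srt(SAMPLE):
    print(index, start, end, line)
```

Because the subtitle text is ordinary text, it can be copied and pasted into a dictionary, which is not possible with the picture-based subtitles on the DVD itself.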

If not, there are plenty of programs that will at least put them into graphics files for you. You might want to take a look at SupRip, which I think can.

From this post, you might want to look at esrXP and subOCR.

Posted

For English subtitles there are programs that will extract the subtitles from a dvd and attempt to turn them into text through OCR. No idea if there is anything similar for Chinese. I doubt it since the resolution of DVD subtitles is pretty low, making it hard for a program to recognise the characters.

It would be much easier to just download the subtitles as others mentioned.

Posted

I think SubRip will work, but the OCR process is quite time-consuming and requires a lot of manual corrections. Ironically, you will learn the characters through the process of creating the text file, and won't really need the file once it's done. As others have mentioned, shooter.cn or another subtitle site is preferable, if you can find the subtitle file there.

Posted

With SupRip I was able to get a bunch of .bmp's. I guess that will have to do, since its OCR didn't seem to recognize Chinese. Chinese was listed as a language option somewhere, but maybe they were thinking only of pinyin.

I don't think I will be able to find the subtitles on the internet anywhere - I'm working on Elmo's World dvd's.

Posted

You might want to look at esrXP and subOCR -- there are reports that they do a better job with OCR.

And I wouldn't give up the fight without at least a google search!

  • 4 weeks later...
Posted

I looked around for OCR software to process that sequence of BMPs generated by SubRip. I successfully tested Abbyy Finereader 10. It is expensive, but the 15-day trial version does the job (for 15 days anyway). Here is what I did:

1. Generate BMPs with SubRip using the "Custom Colors and Contrast" option, and choose black characters and white background (I set those "border" areas to white as well). I set a minimum width of 360 pixels (and minimum height of 50 pixels) for the output bitmaps.

(1.1) In my case SubRip had a problem generating a correct sequence of BMPs from the original VOB file on the DVD (ca. 5% of the subtitle BMPs were either blank or a grainy mess). I solved this problem by generating an IDX file out of the VOB file, using VSRip. I then used that IDX file with SubRip.

2. Insert the bulk of BMPs into a Word Document (e.g. simply select the whole directory on the insert dialogue in Word). The result should see all bitmaps arrayed vertically, i.e. one bitmap per "line".

3. Save that document as a PDF file.

4. Open that PDF in Abbyy Finereader. Then comes the time-consuming part of resolving those ambiguities that Abbyy Finereader has marked (i.e. manually selecting from a list of suggestions the right match for a character that the program wasn't able to recognize with certainty). It took me around 2 hours of work for one hour of movie to do that. I haven't found a way to "teach" the software to learn from my corrections. For example, there may be a very common character in the text which Abbyy Finereader repeatedly fails to read. I then had to resolve each instance separately, rather than tell the software once and have it apply the correction to all instances. If anyone knows how to "teach" Abbyy Finereader, please let me know.

5. The 15-day trial version of Abbyy Finereader doesn't allow you to save the resulting document, so instead I simply copied and pasted each page separately into a new Word document. It is a good idea to insert a page break into the Word document for each new page copied from the Finereader output.

6. You will notice that Finereader does not observe the line breaks of the original document, instead putting a simple space between each line of subtitles. That is not a problem: once you have copied and pasted the complete clean OCR output into Word, you can simply replace " " with "^p", and you are good.

7. If you inserted a page break in Word after each new copy+pasted page from Finereader, then it is easy to error-check the line breaking in Word: each page should have exactly the same number of rows (provided that each of the original BMPs had the same height).

8. The result is a text that contains all subtitles as plain text -- one row per original subtitle BMP. You can go on to convert it into an SRT file (e.g. take the time stamps from the corresponding English subtitle files, which SubRip can OCR without additional software).
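Step 8 can be sketched in Python. This is only an illustration, not part of the procedure I actually used: it assumes the Chinese text has exactly one row per subtitle and that the English subtitles line up one-to-one with the Chinese ones, which is worth checking by eye. The sample data and the names `timings_from_srt` and `build_srt` are made up.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def timings_from_srt(srt_text):
    """Collect the timing lines from existing SRT text, in order."""
    return [line.strip() for line in srt_text.splitlines()
            if TIMING.fullmatch(line.strip())]

def build_srt(timings, text_lines):
    """Pair each timing line with one row of text and number the blocks."""
    if len(timings) != len(text_lines):
        raise ValueError(
            f"{len(timings)} timings but {len(text_lines)} text rows")
    blocks = []
    for number, (timing, text) in enumerate(zip(timings, text_lines), start=1):
        blocks.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(blocks)  # blank line between blocks

# Made-up English subtitles standing in for the ones SubRip can OCR directly.
ENGLISH_SRT = """1
00:00:01,000 --> 00:00:03,500
Hello.

2
00:00:04,000 --> 00:00:06,000
Goodbye.
"""

# Made-up rows standing in for the cleaned OCR output, one row per subtitle.
CHINESE_ROWS = ["你好。", "再见。"]

print(build_srt(timings_from_srt(ENGLISH_SRT), CHINESE_ROWS))
```

The count check only catches a differing number of subtitles. If the two subtitle tracks are split at different points in the dialogue, the timings will still be off and need manual adjustment.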

In this way, I obtained Chinese subtitles of the first episode of 痞子英雄 (Black & White). It is easy to create Pinyin subtitles from that file, too. Overall I am very surprised and satisfied by the OCR performance of Abbyy Finereader. This procedure is nevertheless time-consuming. It could be sped up significantly if I found a way to "teach" Finereader. If anyone can tell me how to do that, I would highly appreciate the help.

