楚留香 Posted February 18, 2006 at 06:07 AM

SubRip Operation:

1. Place unencrypted DVD into drive
2. Start SubRip
3. Load a Character Matrix file if you have one
4. Click on VOB toolbar button (dialogue will open)
5. Select "SubPictures to Text via OCR" in radio button group on right side
6. Click on "Open IFO" button in dialogue
7. Navigate to DVD and select VTS_XX_0.IFO
8. Select Language in pulldown box (If there are 2 Chinese, one is trad and one is simp)
9. Press "Start" button

Eventually you will see a picture of the text for a given subtitle with one or more characters wrapped in a red box. Enter the character(s) and press OK. Keep doing this as needed. As the system "learns" you will be asked less often. Every entry is added to a "Character Matrix" file. If only part of a character is wrapped in the red box, press Right arrow to expand the box.

When finished for the day, save your matrix file in the main frame!!: Characters Matrix | Save characters Matrix File

You can also save your SRT file in the Subtitles frame: File | Save As

The program will recommend saving as Unicode.

The Matrix file is very sensitive to the particular font used on the DVD. You can try playing with the options menu to see if you can get a better hit rate.

I'll attach a matrix file of some traditional characters I got off some mainland DVDs. You can see the mapping by unzipping it, loading it into SubRip, and selecting Character Matrix | Edit/View Character Matrix

Attachment: chinese.zip
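For anyone post-processing the saved SRT file outside SubRip, here is a rough sketch of the SRT text layout (counter, timing line, text, blank line). The function names are made up for illustration, and UTF-8 is an assumption: the thread only says that SubRip recommends "Unicode", not which encoding it writes.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(entries):
    """Render (start_ms, end_ms, text) tuples as SRT text.

    Each subtitle is a counter line, a timing line, the text, and a blank line.
    """
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(entries, start=1):
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{index}\n{timing}\n{text}\n\n")
    return "".join(blocks)


def save_srt(path, entries):
    """Write the subtitles to disk in a Unicode encoding."""
    # UTF-8 is an assumption; any Unicode encoding keeps the Chinese text intact.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(to_srt(entries))
```

Saving in a Unicode encoding matters here because a legacy single-byte code page cannot hold the Chinese characters entered during OCR.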
atitarev Posted February 18, 2006 at 07:41 AM

Thank you, Chǔ liú xiāng. I followed your instructions carefully and used your matrix file. Which version of SubRip are you using? The matrix file loaded with no errors into SubRip 1.5 beta. There was no language option for Chinese, only a series of 00-, 01-, etc. When you start the OCR there are language options, but Chinese is not among them. It prompts me to enter every basic character manually.

"8. Select Language in pulldown box (If there are 2 Chinese, one is trad and one is simp)"

I can't get this part to work. Do you have a working link to the download file? Am I missing another component or setup? I also tried the previous version, SubRip 1.20. It doesn't load the matrix file.
楚留香 Posted February 18, 2006 at 03:36 PM

A couple of points:

I'm assuming the DVD is unencrypted (or else you've made it so by ripping it to your hard drive) and that you've tried other IFO files (VTS_01_0.IFO, VTS_02_0.IFO, etc.).

If the disk is encrypted (the norm for Western disks, not so much for Chinese disks), you may only see a couple of subtitles. You must decrypt the disk! You can find out how to do that by doing a Google search.

It's possible that the language designations are missing. Do the language names show up when you play the disk in a DVD player?

I've found that a matrix file created for one DVD may not work for another DVD. If you're lucky, both disks use the same font face. Otherwise you'll need to painfully reenter the matrix for the second disk. Fortunately, most DVD publishing companies will stick to the same font across their line of disks.

As for the matrix file, do any characters show up in the matrix editor when you select Character Matrix | Edit/View Characters Matrix? The first few dozen entries should be blanks. Also note, I only use SubRip for traditional character subtitles, so the file has very few of the simplified forms.

I'm using 1.50 Beta 3 from January 2006.
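If it helps when trying the other IFO files, here is a small sketch that picks the VTS_nn_0.IFO names out of a directory listing so none get missed. The function name is invented for this example, and it only filters names; it does not read the IFO contents or tell you which title set holds the Chinese subtitles.

```python
import re

# Title-set IFO files on a DVD-Video disc are named VTS_nn_0.IFO.
_TITLE_SET_IFO = re.compile(r"^VTS_(\d{2})_0\.IFO$", re.IGNORECASE)


def title_set_ifos(file_names):
    """Return the VTS_nn_0.IFO names from a listing, sorted by title-set number."""
    matches = []
    for name in file_names:
        found = _TITLE_SET_IFO.match(name)
        if found:
            # Keep the numeric title-set index so the result sorts naturally.
            matches.append((int(found.group(1)), name))
    return [name for _, name in sorted(matches)]
```

You would typically feed it something like `os.listdir(...)` for the disk's VIDEO_TS folder (the exact drive path depends on your machine) and then open each returned file in turn from the "Open IFO" dialogue.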
atitarev Posted February 19, 2006 at 02:32 AM

Thanks for your reply, 楚留香, but it didn't work. Yes, the disk is unencrypted and I tried the other IFOs. The one I am trying to do has the right traditional Chinese subtitles, which I was able to save as BMP files. Somehow SubRip doesn't recognise the matrix file as Chinese, although I can see the characters if I try to edit the matrix file (second attachment). The Chinese language doesn't exist in the options. I am attaching the screenshots.
楚留香 Posted February 19, 2006 at 04:24 AM

Looks like the font in the matrix file isn't a close enough match to your DVD. Or else I don't have the "《" character in the matrix. You'll need to create your own matrix file (or just add on to the existing one).

Start up your Chinese IME, enter each red-outlined character into the entry box, and press the OK button. The new glyph will get added to the matrix file and assigned to the Unicode value of the character(s) you typed in. The red outline will move to the next unidentified character. As I mentioned before, use Right arrow to expand the box. You can use Left arrow to shrink it back down if you've expanded it too much.

SubRip may not be that useful if you're only going to pull one subtitle from this manufacturer. But if you do more disks, your matrix file will get better and better.

I'm not sure why "Chinese" didn't appear in the pulldown. It doesn't really matter since you've already identified the subtitle stream.
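To make the "learning" idea concrete, here is a toy sketch of a character matrix as a lookup table from glyph bitmaps to text. It is an illustration only: the class, the function, and the bitmap representation are hypothetical, SubRip's real matrix file format and matching tolerance are not described in this thread, and this version uses exact matching.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical glyph representation: one string per pixel row.
Glyph = Tuple[str, ...]


class CharacterMatrix:
    """Toy glyph-to-text table; exact match only, unlike a real OCR matcher."""

    def __init__(self) -> None:
        self._entries: Dict[Glyph, str] = {}

    def learn(self, glyph: Glyph, text: str) -> None:
        # Store the character(s) the user typed for this glyph.
        self._entries[glyph] = text

    def lookup(self, glyph: Glyph) -> Optional[str]:
        # Returns None when the glyph has never been seen.
        return self._entries.get(glyph)


def ocr_line(matrix: CharacterMatrix,
             glyphs: Iterable[Glyph],
             ask_user: Callable[[Glyph], str]) -> str:
    """Recognise each glyph, prompting only for ones not yet in the matrix."""
    pieces = []
    for glyph in glyphs:
        text = matrix.lookup(glyph)
        if text is None:
            text = ask_user(glyph)      # the "red box" prompt
            matrix.learn(glyph, text)   # remembered for next time
        pieces.append(text)
    return "".join(pieces)
```

In this toy version a glyph drawn in a different font has a different bitmap, so it misses the table and triggers a prompt again, which is roughly why a matrix built on one publisher's disks may not carry over to another's.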
atitarev Posted February 19, 2006 at 11:22 PM

Thanks again, 楚留香, for trying to help me. Yes, it seems that the matrix file you attached doesn't help much. The program can't recognise simple and common characters like 的, 大, etc., which are the same in simplified and traditional, although I can see them clearly in the subtitles.

My knowledge of Chinese characters is not yet good enough, especially the traditional ones. I got stuck after a few sentences where I couldn't find a character's pronunciation (I tried looking it up by radicals, IME Pad, etc., to no avail). I have to rely on the OCR or maybe search for more matrix files. My purpose was to produce the text files, translate and pinyinise them, learn them, and then listen to the episodes I have worked on.

I will try other movies; I might have more luck recognising them.