Burnt-in subtitle extractor

This project provides a set of basic extraction tools for burnt-in subtitles, i.e. subtitles that are part of the picture itself.

Tools

The burnt-in subtitle extractor consists of four different tools:

Examples

The following examples rely in part on default parameters. You will typically have to provide parameters tuned to your input file. Run e.g. getsubtitles --help for details.

> ffmpeg -i test.flv -f yuv4mpegpipe - | ./locsubtitles > test.sub
> ffmpeg -i test.flv -f yuv4mpegpipe - | ./remsubtitles -s test.sub | ffplay -

How it works

This program relies on several patterns in how subtitles are typically added to videos:

The algorithm then does the following:

The OCR component uses the Tesseract engine, which has to be installed on the system. The target language is currently hard-coded to Dutch. This can be changed in the wrapper script.
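As a rough illustration of what changing the language involves, here is a minimal sketch of how a Tesseract call can be parameterized. It is not taken from the project's wrapper script: the function name `ocr_command` and the variable `OCR_LANG` are hypothetical. Only the `-l` option and the language codes (`nld` for Dutch, `eng` for English) come from Tesseract itself. The function only prints the command, so it can be tried without Tesseract installed.

```shell
# Hypothetical helper, not part of this project: print the Tesseract
# command line for one image that contains subtitle text.
# OCR_LANG is an assumed variable name; it defaults to Dutch (nld).
ocr_command() {
    image="$1"                # input image with the subtitle text
    outbase="$2"              # Tesseract appends .txt to this base name
    lang="${OCR_LANG:-nld}"   # Tesseract language code, e.g. nld or eng
    printf 'tesseract %s %s -l %s\n' "$image" "$outbase" "$lang"
}

ocr_command subtitle.png subtitle
```

Setting `OCR_LANG=eng` before the call would switch the sketch to English. The matching Tesseract language data has to be installed for whichever code is used.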

Prerequisites

The program has only been tested with the following setup:

Other setups may work but are untested.