Phonix
Generate captions for videos using the power of OpenAI's Whisper API
What?
Phonix is a Python program that uses OpenAI's API to generate captions for videos.
It uses the Whisper model, an automatic speech recognition system that can turn audio into text and optionally translate it. Compared to other solutions, it has the advantage that you can "enhance" the transcription by supplying a prompt that indicates the "domain" of the video. This means you may get better results for videos that contain technical terms, acronyms and jargon.
Captivating captions
Phonix now supports "captivating" captions: captions that highlight the currently spoken word in the video and let you choose the maximum number of words present in each caption. This means you can produce "influencer-style" captions with few words per caption and the current word highlighted. 💫
This is enabled through stable-ts so you will need to install it (see below).
Overall the following options are available when it comes to styling the captions:
- Highlight the current word
- Choose the maximum number of words per caption
- Choose the caption font size
- Choose the caption font color
- Choose the caption font family
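The first two options can be illustrated with a small sketch. Phonix delegates this work to stable-ts; the code below does not use the stable-ts API and only shows the idea in plain Python. The function names, the `(word, start, end)` tuple layout, and the `<u>` highlight tags are assumptions made for illustration.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def chunk_words(words: List[Word], max_words: int) -> List[List[Word]]:
    """Split timed words into captions holding at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def highlight_frames(chunk: List[Word], open_tag: str = "<u>",
                     close_tag: str = "</u>") -> List[Tuple[float, float, str]]:
    """Return one (start, end, text) frame per word, marking that word."""
    frames = []
    for i, (_, start, end) in enumerate(chunk):
        # Rebuild the full caption text, wrapping only the word being spoken.
        text = " ".join(
            f"{open_tag}{w}{close_tag}" if j == i else w
            for j, (w, _, _) in enumerate(chunk)
        )
        frames.append((start, end, text))
    return frames


# Hypothetical word timings, made up for illustration.
words = [("hello", 0.0, 0.4), ("big", 0.4, 0.7), ("world", 0.7, 1.2)]
for chunk in chunk_words(words, max_words=2):
    for start, end, text in highlight_frames(chunk):
        print(f"{start:.1f}-{end:.1f}: {text}")
```

Each caption is shown once per word it contains, with a different word marked each time, which is what produces the "current word" effect during playback.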
Why?
Captions are not just for the hearing impaired. They make your content more engaging by boosting your audience's focus, attention and comprehension while allowing them to watch your video without sound.
I was not particularly satisfied with the accuracy of YouTube's and LinkedIn's automatic captions, so I gave Whisper a try and was impressed by the results. Phonix makes it easy to use Whisper and generate captions for your videos.
How?
Phonix first extracts the audio from the video, then downsamples it in case it's over 25 MB and finally sends it to OpenAI's Whisper API. The API returns the captions in the specified format and Phonix saves them to a file. You can then use the captions in your video editor of choice.
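The steps above can be sketched roughly as follows. This is not Phonix's actual code: the function names, the temporary MP3 file, the 16 kHz mono downsampling settings, and the reading of "25 MB" as 25 * 1024 * 1024 bytes are all assumptions. The API call assumes the `openai` Python package (version 1 or later) and an `OPENAI_API_KEY` environment variable.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumed interpretation of the 25 MB upload limit mentioned above.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def ffmpeg_extract_cmd(video, audio, sample_rate=None):
    """Build an ffmpeg command that drops the video stream and keeps audio."""
    cmd = ["ffmpeg", "-y", "-i", str(video), "-vn"]
    if sample_rate is not None:
        # Downsample: mono channel at the given sample rate.
        cmd += ["-ac", "1", "-ar", str(sample_rate)]
    return cmd + [str(audio)]


def needs_downsampling(size_bytes, limit=MAX_UPLOAD_BYTES):
    """True when the extracted audio is too large to upload as-is."""
    return size_bytes > limit


def transcribe(video, out_path, prompt="", response_format="srt"):
    """Extract audio, shrink it if needed, and save the returned captions."""
    from openai import OpenAI  # imported lazily; needs the `openai` package

    with tempfile.TemporaryDirectory() as tmp:
        audio = Path(tmp) / "audio.mp3"
        subprocess.run(ffmpeg_extract_cmd(video, audio), check=True)
        if needs_downsampling(audio.stat().st_size):
            # Re-extract at a lower sample rate (an assumed setting).
            subprocess.run(
                ffmpeg_extract_cmd(video, audio, sample_rate=16000), check=True
            )
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        with open(audio, "rb") as f:
            result = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format=response_format,
                prompt=prompt,  # domain hints: jargon, acronyms, names
            )
    Path(out_path).write_text(str(result), encoding="utf-8")
```

A call such as `transcribe("talk.mp4", "talk.srt", prompt="Kubernetes, kubectl, etcd")` would then write the captions next to the video; the file names and prompt here are placeholders.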
Phonix was originally a command line application but I thought it'd be cool to create a simple GUI for it. Use whichever you feel more comfortable with.
Installation
- Get an OpenAI API key
- This is a paid service; transcribing a 25-minute South Park episode cost me around $0.30
- Clone or download this repository
- Install a recent version of Python with Tkinter
- Install ffmpeg for your platform
- Install Python dependencies: pip install -r requirements-basic.txt
- If you want to transcribe locally without paying for an OpenAI API key, run pip install -r requirements-advanced.txt and choose to run Whisper locally.
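Once the steps above are done, a quick pre-flight check can confirm that the external pieces are in place. This is a minimal sketch and not part of Phonix; the helper names are hypothetical. Note that finding the `tkinter` package does not guarantee that Tk itself works on your system.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules from names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    problems = missing_tools(["ffmpeg"]) + missing_modules(["tkinter"])
    print("Missing:", ", ".join(problems) if problems else "nothing")
```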
Command line usage
phonix.py is the command line interface and also contains the main logic of the program. It has a few options that you can see by running python phonix.py --help.
GUI usage
Assuming you have installed the dependencies, you can run the GUI with python phonix_gui.py.
A demo of the tool can be found in this video.