stream-translator
Command line utility to transcribe or translate audio from livestreams in real time. Uses streamlink to get livestream URLs from various services and OpenAI's Whisper for transcription/translation. This script is inspired by audioWhisper, which transcribes/translates desktop audio.
Prerequisites
- Install ffmpeg and add it to your PATH.
- Install CUDA on your system. If you installed a version of CUDA other than 11.3, change cu113 in requirements.txt accordingly. You can check the installed CUDA version with nvcc --version.
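For example, if nvcc reports CUDA 11.7, every occurrence of cu113 in requirements.txt becomes cu117. The commands below are only a sketch of that check and edit (the sed line assumes GNU sed on Linux; the actual file contents are not reproduced here):

```
nvcc --version                             # check the installed CUDA toolkit version
sed -i 's/cu113/cu117/g' requirements.txt  # example edit if nvcc reports CUDA 11.7
```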
Setup
- Set up a virtual environment.
- git clone https://github.com/fortypercnt/stream-translator.git
- pip install -r requirements.txt
- Make sure that PyTorch is installed with CUDA support. Whisper will probably not run in real time on a CPU.
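As a concrete sketch, the whole setup might look like this; the environment name venv is arbitrary, and the last command simply confirms that the installed PyTorch build can see a CUDA device:

```
python -m venv venv
source venv/bin/activate        # on Windows: venv\Scripts\activate
git clone https://github.com/fortypercnt/stream-translator.git
cd stream-translator
pip install -r requirements.txt
python -c "import torch; print(torch.cuda.is_available())"   # should print True
```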
Command-line usage
python translator.py URL --flags
By default, the URL can be of the form twitch.tv/forsen, and streamlink is used to obtain the .m3u8 link, which is passed to ffmpeg. See streamlink plugins for info on all supported sites.
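For example (the channel name and the .m3u8 address below are placeholders; --direct_url is described in the flag table that follows):

```
# default: streamlink resolves the channel URL to an .m3u8 link for ffmpeg
python translator.py twitch.tv/forsen

# if you already have a direct stream URL, pass it straight to ffmpeg
python translator.py https://example.com/stream.m3u8 --direct_url
```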
| --flags | Default Value | Description |
|---|---|---|
| --model | small | Select model size. See here for available models. |
| --task | translate | Whether to transcribe the audio (keep original language) or translate to English. |
| --language | auto | Language spoken in the stream. See here for available languages. |
| --interval | 5 | Interval between calls to the language model in seconds. |
| --history_buffer_size | 0 | Seconds of previous audio/text to use for conditioning the model. Set to 0 to just use audio from the last interval. Note that this can easily lead to repetition/loops if the chosen language/model settings do not produce good results to begin with. |
| --beam_size | 5 | Number of beams in beam search. Set to 0 to use the greedy algorithm instead (faster but less accurate). |
| --best_of | 5 | Number of candidates when sampling with non-zero temperature. |
| --preferred_quality | audio_only | Preferred stream quality option. "best" and "worst" should always be available. Type "streamlink URL" in the console to see quality options for your URL. |
| --disable_vad | | Set this flag to disable additional voice activity detection by Silero VAD. |
| --direct_url | | Set this flag to pass the URL directly to ffmpeg. Otherwise, streamlink is used to obtain the stream URL. |
| --use_faster_whisper | | Set this flag to use the faster_whisper implementation instead of the original OpenAI implementation. |
| --faster_whisper_model_path | whisper-large-v2-ct2/ | Path to a directory containing a Whisper model in the CTranslate2 format. |
| --faster_whisper_device | cuda | Set the device to run faster-whisper on. |
| --faster_whisper_compute_type | float16 | Set the quantization type for faster-whisper. See here for more info. |
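As an illustration of combining several of these flags, the command below transcribes a stream in its original language with a larger model, a longer interval, and VAD disabled (the channel name is a placeholder, and the language value should match Whisper's language list linked above):

```
python translator.py twitch.tv/forsen --model medium --task transcribe --language Japanese --interval 10 --disable_vad
```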
Using faster-whisper
faster-whisper provides significant performance upgrades over the original OpenAI implementation (roughly 4x faster and about half the memory use). To use it, follow the instructions here to install faster-whisper and convert your models to the CTranslate2 format. Then you can run the CLI with --use_faster_whisper and set --faster_whisper_model_path to the location of your converted model.
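As a sketch of that workflow (model choice, quantization, and paths are examples; the converter command comes from the CTranslate2/faster-whisper tooling):

```
# one-time conversion of an OpenAI Whisper model to CTranslate2 format
ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 --quantization float16

# run the translator against the converted model
python translator.py twitch.tv/forsen --use_faster_whisper --faster_whisper_model_path whisper-large-v2-ct2/
```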