License: CC BY-NC 4.0

Silero Models

Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.

Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

Also we have published TTS models that satisfy the following criteria:

Also we have published a model for text repunctuation and recapitalization that:

Installation and Basics

You can basically use our models in 3 flavours:

Models are downloaded on demand by both pip and PyTorch Hub. If you need caching, do it manually or invoke the required model once (it will be downloaded to a cache folder). Please see these docs for more information.
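
If you prefer to manage the cache yourself, the manual route can be as simple as downloading a file only when it is missing. Below is a minimal sketch using only the standard library; the helper name `fetch_once` and the cache folder are hypothetical and not part of the silero package.

```python
import os
import urllib.request


def fetch_once(url, cache_dir, filename=None):
    """Download `url` into `cache_dir` unless the file is already there."""
    os.makedirs(cache_dir, exist_ok=True)
    filename = filename or os.path.basename(url)
    local_path = os.path.join(cache_dir, filename)
    if not os.path.isfile(local_path):
        # download under a temporary name first, so an interrupted
        # transfer never leaves a truncated file in the cache
        tmp_path = local_path + '.part'
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, local_path)
    return local_path
```

For example, `fetch_once('https://models.silero.ai/models/tts/ru/v4_ru.pt', 'silero_cache')` would hit the network on the first call only and return the same local path afterwards.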

PyTorch Hub and pip package are based on the same code. All of the torch.hub.load examples can be used with the pip package via this basic change:

# before
torch.hub.load(repo_or_dir='snakers4/silero-models',
               model='silero_stt',  # or silero_tts or silero_te
               **kwargs)

# after
from silero import silero_stt, silero_tts, silero_te
silero_stt(**kwargs)

Speech-To-Text

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.
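
The examples further down read entries such as `models.stt_models.en.latest.onnx`, which suggests a nested layout of language, version and flavour. The sketch below mimics that layout with a plain dictionary to show how a lookup could be guarded; the dictionary content and the URLs are placeholders, and `resolve` is a hypothetical helper, not something shipped with the repository.

```python
# Hypothetical excerpt mirroring the layout implied by
# `models.stt_models.en.latest.onnx`; the URLs are placeholders.
models = {
    'stt_models': {
        'en': {'latest': {'onnx': 'https://example.com/en.onnx',
                          'tf': 'https://example.com/en_tf.tar.gz'}},
        'de': {'latest': {'onnx': 'https://example.com/de.onnx'}},
    }
}


def resolve(models, language, version='latest', flavour='onnx'):
    """Return the URL for a language / version / flavour combination."""
    try:
        return models['stt_models'][language][version][flavour]
    except KeyError as missing:
        # re-raise with a message that names the whole requested combination
        raise KeyError(f'no {flavour!r} model for {language!r} '
                       f'({version}): missing key {missing}') from None


print(resolve(models, 'en', flavour='tf'))
```

With the real file you would load `models.yml` first (the ONNX example below uses OmegaConf for that) and apply the same kind of lookup.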

Currently we provide the following checkpoints:

| Model | PyTorch | ONNX | Quantization | Quality | Colab |
|---|---|---|---|---|---|
| English (en_v6) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | link | Open In Colab |
| English (en_v5) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | link | Open In Colab |
| German (de_v4) | :heavy_check_mark: | :heavy_check_mark: | :hourglass: | link | Open In Colab |
| English (en_v3) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | link | Open In Colab |
| German (de_v3) | :heavy_check_mark: | :hourglass: | :hourglass: | link | Open In Colab |
| German (de_v1) | :heavy_check_mark: | :heavy_check_mark: | :hourglass: | link | Open In Colab |
| Spanish (es_v1) | :heavy_check_mark: | :heavy_check_mark: | :hourglass: | link | Open In Colab |
| Ukrainian (ua_v3) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A | Open In Colab |

Model flavours:

Flavours: jit (xsmall, small, large, xlarge), jit_q (xsmall, small), onnx (xsmall, small, large, xlarge)
English en_v6 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
English en_v5 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
English en_v4_0 :heavy_check_mark: :heavy_check_mark:
English en_v3 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
German de_v4 :heavy_check_mark: :heavy_check_mark:
German de_v3 :heavy_check_mark:
German de_v1 :heavy_check_mark: :heavy_check_mark:
Spanish es_v1 :heavy_check_mark: :heavy_check_mark:
Ukrainian ua_v3 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

Dependencies

Please see the provided Colab for details for each example below. All examples are maintained to work with the latest major packaged versions of the installed libraries.

PyTorch

Open In Colab

Open on Torch Hub

import torch
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
# named model_input to avoid shadowing the built-in input()
model_input = prepare_model_input(read_batch(batches[0]),
                                  device=device)

output = model(model_input)
for example in output:
    print(decoder(example.cpu()))
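
The example above transcribes only `batches[0]`. With more files you would loop over every batch. The sketch below shows the batching idea in plain Python; it is a hypothetical stand-in written for illustration and is not the implementation of the provided `split_into_batches` util.

```python
def split_into_batches_sketch(items, batch_size=10):
    """Chunk a list into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError('batch_size must be positive')
    return [items[i:i + batch_size]
            for i in range(0, len(items), batch_size)]


# 23 hypothetical file names, split into batches of 10
files = [f'clip_{i}.wav' for i in range(23)]
batches = split_into_batches_sketch(files, batch_size=10)
print([len(b) for b in batches])  # → [10, 10, 3]
```

Each batch would then go through `read_batch`, `prepare_model_input`, the model and the decoder in turn, exactly as shown for the first batch.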

ONNX

Open In Colab

Our model will run anywhere that can import the ONNX model or that supports the ONNX runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)  # follow the selected language instead of hard-coding 'en'
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
# named model_input to avoid shadowing the built-in input()
model_input = prepare_model_input(read_batch(batches[0]))

# actual ONNX inference and decoding
onnx_input = model_input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

TensorFlow

Open In Colab

SavedModel example

import torch
import subprocess
import tensorflow as tf
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models[language].latest.tf, 'tf_model.tar.gz')  # follow the selected language instead of hard-coding 'en'
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file in any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
# named model_input to avoid shadowing the built-in input()
model_input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(model_input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

V4

V4 models support SSML. Also see Colab examples for main SSML tag usage.
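
As a rough illustration of what SSML input looks like, the sketch below wraps plain sentences in `<speak>` and separates them with `<break>` pauses, then checks that the markup is well-formed XML. `<speak>`, `<s>` and `<break>` are standard SSML elements; which tags and attributes a given model accepts, and how the string is passed to the model, are assumptions to verify against the Colab examples. `build_ssml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(sentences, pause='500ms'):
    """Wrap plain sentences in <speak>, separated by <break> pauses."""
    # escape &, < and > so the text cannot break the markup
    parts = [f'<s>{escape(s)}</s>' for s in sentences]
    body = f'<break time="{pause}"/>'.join(parts)
    return f'<speak>{body}</speak>'


ssml = build_ssml(['Hello & welcome.', 'This is a test.'])
ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```

Validating the string locally before synthesis makes malformed markup fail early with a parser error rather than inside the model call.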

| ID | Speakers | Auto-stress | Language | SR | Colab |
|---|---|---|---|---|---|
| v4_ru | aidar, baya, kseniya, xenia, eugene, random | yes | ru (Russian) | 8000, 24000, 48000 | Open In Colab |
| v4_cyrillic | b_ava, marat_tt, kalmyk_erdni... | no | cyrillic (Avar, Tatar, Kalmyk, ...) | 8000, 24000, 48000 | Open In Colab |
| v4_ua | mykyta, random | no | ua (Ukrainian) | 8000, 24000, 48000 | Open In Colab |
| v4_uz | dilnavoz | no | uz (Uzbek) | 8000, 24000, 48000 | Open In Colab |
| v4_indic | hindi_male, hindi_female, ..., random | no | indic (Hindi, Telugu, ...) | 8000, 24000, 48000 | Open In Colab |

V3

V3 models support SSML. Also see Colab examples for main SSML tag usage.

| ID | Speakers | Auto-stress | Language | SR | Colab |
|---|---|---|---|---|---|
| v3_en | en_0, en_1, ..., en_117, random | no | en (English) | 8000, 24000, 48000 | Open In Colab |
| v3_en_indic | tamil_female, ..., assamese_male, random | no | en (English) | 8000, 24000, 48000 | Open In Colab |
| v3_de | eva_k, ..., karlsson, random | no | de (German) | 8000, 24000, 48000 | Open In Colab |
| v3_es | es_0, es_1, es_2, random | no | es (Spanish) | 8000, 24000, 48000 | Open In Colab |
| v3_fr | fr_0, ..., fr_5, random | no | fr (French) | 8000, 24000, 48000 | Open In Colab |
| v3_indic | hindi_male, hindi_female, ..., random | no | indic (Hindi, Telugu, ...) | 8000, 24000, 48000 | Open In Colab |

Dependencies

Basic dependencies for Colab examples:

PyTorch

Open In Colab

Open on Torch Hub

# V4
import torch

language = 'ru'
model_id = 'v4_ru'
sample_rate = 48000
speaker = 'xenia'
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
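
To keep the synthesized audio, you can write it to a WAV file. The sketch below uses only the standard library and assumes the audio is a one-dimensional sequence of float samples in the range [-1, 1] (with a tensor you could pass `audio.tolist()`); that shape and range are assumptions, and `write_wav` is a hypothetical helper. A short sine tone stands in for the model output so the block runs on its own.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip to the valid range
        frames += struct.pack('<h', int(round(x * 32767)))
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))


sample_rate = 48000
# stand-in for the synthesized audio: 0.1 s of a 440 Hz tone
audio = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
         for n in range(sample_rate // 10)]
out_path = os.path.join(tempfile.gettempdir(), 'tts_sketch.wav')
write_wav(out_path, audio, sample_rate)
```

The standalone example below uses the model's own `save_wav` method instead, which is the shorter route when you do not need to post-process the samples.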

Standalone Use

# V4
import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v4_ru.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_text = 'В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.'
sample_rate = 48000
speaker = 'baya'

audio_paths = model.save_wav(text=example_text,
                             speaker=speaker,
                             sample_rate=sample_rate)

SSML

Check out our TTS Wiki page.

Cyrillic languages

Supported tokenset: !,-.:?iµöабвгдежзийклмнопрстуфхцчшщъыьэюяёђѓєіјњћќўѳғҕҗҙқҡңҥҫүұҳҷһӏӑӓӕӗәӝӟӥӧөӱӳӵӹ
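
Before synthesis it can help to check a text against this tokenset. The sketch below reports characters that fall outside it; treating spaces as allowed and lowercasing the input first are assumptions made for illustration, and `unsupported_chars` is a hypothetical helper.

```python
TOKENSET = set('!,-.:?iµöабвгдежзийклмнопрстуфхцчшщъыьэюяёђѓєіјњћќўѳғҕҗҙқҡңҥҫүұҳҷһӏӑӓӕӗәӝӟӥӧөӱӳӵӹ')
ALLOWED = TOKENSET | {' '}  # assumption: spaces are accepted as well


def unsupported_chars(text):
    """Return characters of `text` outside the tokenset, in order of first appearance."""
    seen = []
    for ch in text.lower():  # assumption: input is lowercased before synthesis
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


print(unsupported_chars('Привет, мир!'))  # → []
print(unsupported_chars('мир 42'))        # → ['4', '2']
```

Digits show up as unsupported here, so numbers would need to be spelled out before being passed to the model.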

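| Speaker_ID | Language | Gender |
|---|---|---|
| b_ava | Avar | F |
| b_bashkir | Bashkir | M |
| b_bulb | Bulgarian | M |
| b_bulc | Bulgarian | M |
| b_che | Chechen | M |
| b_cv | Chuvash | M |
| cv_ekaterina | Chuvash | F |
| b_myv | Erzya | M |
| b_kalmyk | Kalmyk | M |
| b_krc | Karachay-Balkar | M |
| kz_M1 | Kazakh | M |
| kz_M2 | Kazakh | M |
| kz_F3 | Kazakh | F |
| kz_F1 | Kazakh | F |
| kz_F2 | Kazakh | F |
| b_kjh | Khakas | F |
| b_kpv | Komi-Ziryan | M |
| b_lez | Lezghian | M |
| b_mhr | Mari | F |
| b_mrj | Mari High | M |
| b_nog | Nogai | F |
| b_oss | Ossetic | M |
| b_ru | Russian | M |
| b_tat | Tatar | M |
| marat_tt | Tatar | M |
| b_tyv | Tuvinian | M |
| b_udm | Udmurt | M |
| b_uzb | Uzbek | M |
| b_sah | Yakut | M |
| kalmyk_erdni | Kalmyk | M |
| kalmyk_delghir | Kalmyk | F |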

Indic languages

Example

(!!!) All input sentences should be romanized to ISO format using aksharamukha. An example for Hindi:

# V4
import torch
from aksharamukha import transliterate

# Loading model
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language='indic',
                                     speaker='v4_indic')

orig_text = "प्रसिद्द कबीर अध्येता, पुरुषोत्तम अग्रवाल का यह शोध आलेख, उस रामानंद की खोज करता है"
roman_text = transliterate.process('Devanagari', 'ISO', orig_text)
print(roman_text)

audio = model.apply_tts(roman_text,
                        speaker='hindi_male')

Supported languages

| Language | Speakers | Romanization function |
|---|---|---|
| hindi | hindi_female, hindi_male | transliterate.process('Devanagari', 'ISO', orig_text) |
| malayalam | malayalam_female, malayalam_male | transliterate.process('Malayalam', 'ISO', orig_text) |
| manipuri | manipuri_female | transliterate.process('Bengali', 'ISO', orig_text) |
| bengali | bengali_female, bengali_male | transliterate.process('Bengali', 'ISO', orig_text) |
| rajasthani | rajasthani_female, rajasthani_female | transliterate.process('Devanagari', 'ISO', orig_text) |
| tamil | tamil_female, tamil_male | transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe']) |
| telugu | telugu_female, telugu_male | transliterate.process('Telugu', 'ISO', orig_text) |
| gujarati | gujarati_female, gujarati_male | transliterate.process('Gujarati', 'ISO', orig_text) |
| kannada | kannada_female, kannada_male | transliterate.process('Kannada', 'ISO', orig_text) |
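
When several languages are handled in one script, the table can be turned into a lookup so the right source script and options are chosen automatically. This is a sketch: `ROMANIZATION` and `romanize` are hypothetical names, and the transliteration function is passed in as an argument so the block runs without aksharamukha installed (a stub is used for the demonstration).

```python
# (source script, extra keyword arguments) per language, copied from the table
ROMANIZATION = {
    'hindi': ('Devanagari', {}),
    'malayalam': ('Malayalam', {}),
    'manipuri': ('Bengali', {}),
    'bengali': ('Bengali', {}),
    'rajasthani': ('Devanagari', {}),
    'tamil': ('Tamil', {'pre_options': ['TamilTranscribe']}),
    'telugu': ('Telugu', {}),
    'gujarati': ('Gujarati', {}),
    'kannada': ('Kannada', {}),
}


def romanize(text, language, process):
    """Romanize `text` to ISO; `process` is meant to be transliterate.process."""
    script, extra = ROMANIZATION[language]
    return process(script, 'ISO', text, **extra)


def fake_process(src, dst, text, **kwargs):
    """Stub that only echoes its arguments, standing in for aksharamukha."""
    return (src, dst, kwargs)


print(romanize('...', 'tamil', fake_process))
# → ('Tamil', 'ISO', {'pre_options': ['TamilTranscribe']})
```

In real use you would call `romanize(orig_text, 'hindi', transliterate.process)` after `from aksharamukha import transliterate`.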

Text-Enhancement

| Languages | Quantization | Quality | Colab |
|---|---|---|---|
| 'en', 'de', 'ru', 'es' | :heavy_check_mark: | link | Open In Colab |

Dependencies

Basic dependencies for Colab examples:

Standalone Use

import torch

model, example_texts, languages, punct, apply_te = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                  model='silero_te')

input_text = input('Enter input text\n')
print(apply_te(input_text, lan='en'))  # print the result so it is visible when run as a script

Denoise

Denoise models attempt to reduce background noise along with various artefacts such as reverb, clipping, high/lowpass filters etc., while trying to preserve and/or enhance speech. They also attempt to enhance audio quality and increase sampling rate of the input up to 48kHz.

Models

All of the provided models are listed in the models.yml file.

| Model | JIT | Real Input SR | Input SR | Output SR | Colab |
|---|---|---|---|---|---|
| small_slow | :heavy_check_mark: | 8000, 16000, 24000, 44100, 48000 | 24000 | 48000 | Open In Colab |
| large_fast | :heavy_check_mark: | 8000, 16000, 24000, 44100, 48000 | 24000 | 48000 | Open In Colab |
| small_fast | :heavy_check_mark: | 8000, 16000, 24000, 44100, 48000 | 24000 | 48000 | Open In Colab |
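
A quick pre-flight check can confirm that a WAV file has one of the listed "Real Input SR" rates before it is sent to a denoise model. The sketch below reads the rate from the file header with the standard library; whether the provided utils handle other rates on their own is not stated here, and the helper names are hypothetical.

```python
import os
import tempfile
import wave

# values from the "Real Input SR" column of the table
SUPPORTED_INPUT_SR = (8000, 16000, 24000, 44100, 48000)


def wav_sample_rate(path):
    """Read the sample rate from a WAV header without decoding the audio."""
    with wave.open(path, 'rb') as f:
        return f.getframerate()


def check_input(path):
    """Return the file's sample rate, or raise if it is not a listed rate."""
    sr = wav_sample_rate(path)
    if sr not in SUPPORTED_INPUT_SR:
        raise ValueError(f'{path}: {sr} Hz is not one of {SUPPORTED_INPUT_SR}')
    return sr


# demo on a generated file: 0.1 s of 16-bit mono silence at 16 kHz
demo_path = os.path.join(tempfile.gettempdir(), 'denoise_sketch.wav')
with wave.open(demo_path, 'wb') as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(16000)
    f.writeframes(b'\x00\x00' * 1600)

print(check_input(demo_path))  # → 16000
```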

Dependencies

Basic dependencies for Colab examples:

PyTorch

Open In Colab


import torch

name = 'small_slow'
device = torch.device('cpu')
model, samples, utils = torch.hub.load(
  repo_or_dir='snakers4/silero-models',
  model='silero_denoise',
  name=name,
  device=device)
(read_audio, save_audio, denoise) = utils

i = 0
torch.hub.download_url_to_file(
  samples[i],
  dst=f'sample{i}.wav',
  progress=True
)
audio_path = f'sample{i}.wav'
audio = read_audio(audio_path).to(device)
output = model(audio)
save_audio(f'result{i}.wav', output.squeeze(1).cpu())

i = 1
torch.hub.download_url_to_file(
  samples[i],
  dst=f'sample{i}.wav',
  progress=True
)
output, sr = denoise(model, f'sample{i}.wav', f'result{i}.wav', device='cpu')

Standalone Use

import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/denoise_models/sns_latest.jit',
                                   local_file)  

model = torch.jit.load(local_file)
torch._C._jit_set_profiling_mode(False) 
torch.set_grad_enabled(False)
model.to(device)

a = torch.rand((1, 48000))
a = a.to(device)
out = model(a)

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to these wiki sections:

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, and read the latest news.

Commercial Inquiries

Please refer to our wiki and the Licensing and Tiers page for relevant information, and email us.

Citations

@misc{silero_models,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Further reading

English

Chinese

Russian

Donations

Please use the "sponsor" button.