Rhasspy Speech

A port of the speech-to-text system from Rhasspy. It uses Kaldi under the hood to recognize sentences generated from a set of pre-defined templates.

For example, the template:

sentences:
  - turn (on|off) [the] light

will allow rhasspy-speech to recognize the sentences:

turn on light
turn off light
turn on the light
turn off the light
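
The expansion shown above can be illustrated with a minimal Python sketch. This is not rhasspy-speech's own parser: the function name `expand_template` is hypothetical, and the sketch assumes only flat (non-nested) `(a|b)` alternatives and `[optional]` groups.

```python
import itertools
import re

# Matches "(a|b)" alternatives, "[optional]" groups, or plain words.
# Assumption: groups are not nested.
_TOKEN = re.compile(r"\(([^()]*)\)|\[([^\[\]]*)\]|(\S+)")


def expand_template(template: str) -> list[str]:
    """Return every sentence a flat template can produce (hypothetical helper)."""
    choices = []
    for alt, opt, word in _TOKEN.findall(template):
        if alt:
            # (on|off) -> exactly one of the alternatives
            choices.append([a.strip() for a in alt.split("|")])
        elif opt:
            # [the] -> either nothing or the optional text
            choices.append(["", opt.strip()])
        else:
            choices.append([word])

    # Combine one choice per slot, dropping empty (skipped optional) parts.
    return [
        " ".join(part for part in combo if part)
        for combo in itertools.product(*choices)
    ]


for sentence in expand_template("turn (on|off) [the] light"):
    print(sentence)
```

Running it on the example template prints the same four sentences listed above, though not necessarily in the same order.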

Supported Languages

Pre-built models are derived from the corresponding voice2json models.

Czech, Czech Republic
German, Germany
English, United States
Spanish, Spain
French, France
Italian, Italy
Dutch, Netherlands
Russian, Russia

Tools and Dependencies

Pre-built tools must be downloaded for rhasspy-speech to work. This includes:

See the build_* scripts in script/ for how these tools are built. See the Dockerfile and script/build_docker.sh for how they are packaged.

You must also have the following system packages installed at runtime:

libopenblas0
libencode-perl

Handling Out of Vocabulary

rhasspy-speech generates two different Kaldi models from the sentence templates: one with a rigid grammar that only accepts the possible sentences, and another with a language model that allows new sentences to be made from the existing words.

Using both the grammar and language model, it's possible to robustly reject sentences outside of the templates. After transcripts are returned from both models, they can be compared to decide whether to accept or reject the grammar transcript.
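
One way the comparison could work is sketched below. The source does not specify how the two transcripts are compared, so everything here is an assumption: the function name `accept_grammar_transcript`, the word-level similarity measure (Python's `difflib.SequenceMatcher`), and the `0.8` threshold are illustrative only.

```python
from difflib import SequenceMatcher
from typing import Optional


def accept_grammar_transcript(
    grammar_text: str, lm_text: str, threshold: float = 0.8
) -> Optional[str]:
    """Return the grammar transcript if the language-model transcript
    roughly agrees with it, otherwise None (hypothetical helper)."""
    grammar_words = grammar_text.lower().split()
    lm_words = lm_text.lower().split()
    if not grammar_words:
        return None

    # Word-level similarity in [0, 1]; 1.0 means identical word sequences.
    # Assumption: this measure and the threshold are illustrative choices.
    ratio = SequenceMatcher(None, grammar_words, lm_words).ratio()
    return grammar_text if ratio >= threshold else None


# Both models agree: the grammar transcript is accepted.
print(accept_grammar_transcript("turn on the light", "turn on the light"))

# The language model heard something quite different: rejected (None).
print(accept_grammar_transcript("turn off the light", "light light on on"))
```

The idea is that the rigid grammar always returns one of the template sentences, even for unrelated speech, while the language model is free to produce a different word sequence. A large disagreement between the two suggests the utterance was outside the templates.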