Rhasspy Speech

A port of the speech-to-text system from Rhasspy. It uses Kaldi under the hood to recognize sentences generated from a set of predefined templates.

For example, the template:

sentences:
  - turn (on|off) [the] light

will allow rhasspy-speech to recognize the sentences:

- turn on the light
- turn off the light
- turn on light
- turn off light
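
The way a template like the one above expands into concrete sentences can be sketched in Python. This is only an illustration, not rhasspy-speech's actual parser: `expand_template` is a hypothetical helper, and it assumes a flat template that uses only the two constructs shown here, `(a|b)` for alternatives and `[word]` for optional words, with no nesting.

```python
import itertools
import re


def expand_template(template: str) -> list[str]:
    """Expand a flat template with (a|b) alternatives and [optional] parts.

    Hypothetical helper for illustration; nesting is not supported.
    """
    # Tokenize into "(...)" groups, "[...]" groups, or plain words.
    tokens = re.findall(r"\([^()]*\)|\[[^\[\]]*\]|[^\s()\[\]]+", template)

    choices = []
    for token in tokens:
        if token.startswith("("):
            # Alternatives: exactly one of the options is used.
            choices.append([alt.strip() for alt in token[1:-1].split("|")])
        elif token.startswith("["):
            # Optional: either the text or nothing.
            choices.append([token[1:-1].strip(), ""])
        else:
            choices.append([token])

    # Every combination of choices yields one sentence.
    return [
        " ".join(part for part in combo if part)
        for combo in itertools.product(*choices)
    ]


for sentence in expand_template("turn (on|off) [the] light"):
    print(sentence)
```

Each alternative or optional group multiplies the number of sentences, so a template with one two-way alternative and one optional word produces four sentences.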

Supported Languages

Pre-built models are derived from the corresponding voice2json models.

Tools and Dependencies

Pre-built tools must be downloaded for rhasspy-speech to work. These include:

See the build_* scripts in script/ for how these tools are built. See the Dockerfile and script/build_docker.sh for how they are packaged.

You must also have the following system packages installed at runtime:

Handling Out of Vocabulary

rhasspy-speech generates two different Kaldi models from the sentence templates: one with a rigid grammar that only accepts the possible sentences, and another with a language model that allows new sentences to be made from the existing words.

Using both the grammar and language model, it's possible to robustly reject sentences outside of the templates. After transcripts are returned from both models, they can be compared to decide whether to accept or reject the grammar transcript.
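
One possible way to compare the two transcripts is sketched below. The comparison rule is an assumption made for illustration, not necessarily what rhasspy-speech does: the function `accept_grammar_transcript`, the word-level similarity measure from Python's standard `difflib`, and the `threshold` value of 0.8 are all hypothetical. The idea is that when the spoken sentence matches a template, the less constrained language model should produce nearly the same words as the grammar; when it does not, the two transcripts tend to diverge.

```python
from difflib import SequenceMatcher


def accept_grammar_transcript(
    grammar_text: str, lm_text: str, threshold: float = 0.8
) -> bool:
    """Accept the grammar transcript only if the language model roughly agrees.

    Hypothetical decision rule: word-level similarity must reach `threshold`.
    """
    grammar_words = grammar_text.lower().split()
    lm_words = lm_text.lower().split()

    # An empty grammar transcript means nothing was recognized.
    if not grammar_words:
        return False

    # ratio() is 2 * matched_words / total_words, in the range [0, 1].
    similarity = SequenceMatcher(None, grammar_words, lm_words).ratio()
    return similarity >= threshold


# The grammar always returns some template sentence; the language model
# transcript indicates whether that sentence was actually spoken.
print(accept_grammar_transcript("turn on the light", "turn on the light"))
print(accept_grammar_transcript("turn off the light", "turn on the"))
```

A stricter rule (exact equality) rejects more out-of-vocabulary speech but also more valid commands with small recognition errors; the threshold trades one against the other and would need tuning on real audio.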