Home

Awesome

Faber-js

An application that use mozilla deepspeech to perform Speech-To-Text searches on a given corpus.
It uses mozilla DeepSpeech and the Node.JS / Electron.JS package with the Italian Model for speech-to-text extraction, and Natural for text classification.

About

This project was carried out during the Mozilla Italia Developer Contest, the purpose of the app is to try to guess which Fabrizio De André song you are singing (or, better, reading).
The app streams the microphone audio from the browser to a NodeJS server (using socket.io) where DeepSpeech will read the buffer and a classifier will classify the DeepSpeech result.
You can find out more about the corpus here while here you can see how to train and load the classifier. At the moment, for text extraction, only 2 (really simple) algorithms are supported : Levenshtein Distance (slower but more accurate) and Bayes Classification (faster but less accurate).

Live demo (may be offline) : https://deepspeech.czzncl.dev/

App settings

*Only with Levenshtein

Installation

with docker

The easiest way to install the app is using docker :

git clone https://github.com/nicosh/faber-js.git
cd faber-js
docker build --tag "faberjs" .
docker run -p 3000:3000 faberjs

manual installation

Please note that you need python and sox, sox need to be added to PATH (on windows) see https://github.com/JoFrhwld/FAVE/wiki/Sox-on-Windows and https://github.com/nodejs/node-gyp#installation

clone the repo :
git clone https://github.com/nicosh/faber-js.git
Also you need to manually download the models from https://github.com/MozillaItalia/DeepSpeech-Italian-Model/releases/download/2020.08.07/transfer_model_tensorflow_it.tar.xz and move them inside app/DeepSpeech/models

Then:
cd faber-js/app
npm install

Build for production:
npm run build
npm run start

or run in dev mode
npm run dev

Tests and know issues

At the moment seems that voice recognition using .wav files have better performance compared to voice recognition using live streaming.
Some simple test using static files can be found here. To compare the results is enough to run the app and try to pronounce the same sentences as the files above or just play the files and record the speakers with the microphone. Some things to investigate :