Speech Note

Linux desktop and Sailfish OS app for note taking, reading and translating with offline Speech to Text, Text to Speech and Machine Translation

<a href='https://flathub.org/apps/net.mkiol.SpeechNote'><img width='240' alt='Download on Flathub' src='https://dl.flathub.org/assets/badges/flathub-badge-en.png'/></a>

Description

Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing take place entirely offline, locally on your computer, without using a network connection. Your privacy is always respected. No data is sent to the Internet.

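Speech Note uses several processing engines to do its job. Currently these are Coqui/DeepSpeech, Vosk, Whisper (whisper.cpp), Faster Whisper and April-ASR for Speech to Text; eSpeak, MBROLA, Piper, RHVoice, Coqui, Mimic3 and WhisperSpeech for Text to Speech; and Bergamot for Machine Translation.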

Languages and Models

The following languages are supported:

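Engines: DeepSpeech (STT), Whisper (STT), Vosk (STT), April-ASR (STT), Piper (TTS), RHVoice (TTS), espeak (TTS), MBROLA (TTS), Coqui (TTS), Mimic3 (TTS), WhisperSpeech (TTS), Bergamot (MT)

| Lang ID | Name | Experimental |
| --- | --- | --- |
| af | Afrikaans | |
| am | Amharic | ● (e) |
| ar | Arabic | |
| bg | Bulgarian | |
| bn | Bengali | |
| bs | Bosnian | |
| ca | Catalan | |
| cs | Czech | |
| cy | Welsh | |
| da | Danish | |
| de | German | |
| el | Greek | ● (e) |
| en | English | |
| eo | Esperanto | |
| es | Spanish | |
| et | Estonian | ● (e) |
| eu | Basque | ● (e) |
| fa | Persian | |
| fi | Finnish | |
| fr | French | |
| ga | Irish | |
| gu | Gujarati | |
| ha | Hausa | |
| he | Hebrew | |
| hi | Hindi | |
| hr | Croatian | |
| hu | Hungarian | ● (e) |
| id | Indonesian | ● (e) |
| is | Icelandic | |
| it | Italian | |
| ja | Japanese | |
| jv | Javanese | |
| ka | Georgian | |
| kk | Kazakh | |
| ko | Korean | |
| ky | Kyrgyz | |
| la | Latin | |
| lb | Luxembourgish | |
| lt | Lithuanian | |
| lv | Latvian | |
| mk | Macedonian | |
| mn | Mongolian | ● (e) |
| mr | Marathi | |
| ms | Malay | |
| mt | Maltese | |
| ne | Nepali | |
| nl | Dutch | ● (e) |
| no | Norwegian | |
| pl | Polish | |
| pt | Portuguese | ● (e) |
| ro | Romanian | ● (e) |
| ru | Russian | |
| sk | Slovak | |
| sl | Slovenian | ● (e) |
| sq | Albanian | |
| sr | Serbian | |
| sv | Swedish | |
| sw | Swahili | |
| te | Telugu | |
| th | Thai | ● (e) |
| tl | Tagalog | |
| tn | Tswana | |
| tr | Turkish | ● (e) |
| tt | Tatar | |
| uk | Ukrainian | |
| uz | Uzbek | |
| vi | Vietnamese | |
| yo | Yoruba | ● (e) |
| zh | Chinese | |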

<sup>(e) experimental, most likely doesn't work well</sup> <br/>

Faster Whisper, Coqui TTS and Mimic3 models are only available on x86-64.

Language models can be downloaded directly from the app.

Details of models which are currently configured for download are described in models.json (GitHub) or models.json (GitLab).

How to install

Flatpak packages

Starting from v4.4.0, the app distributed via Flatpak (published on Flathub) consists of a Base package and two optional add-ons: an AMD add-on and an NVIDIA add-on.

The Base package includes all the dependencies needed to run every feature of the application. The add-ons provide GPU acceleration, which speeds up some operations in the application.

The Base package and the add-ons contain many "heavy" libraries such as CUDA, ROCm, Torch and Python libraries. As a result, both the download size and the space required after installation are significant. If you don't need all the functionality, you can use the much smaller "Tiny" package (available on the Releases page), which provides only the basic features. If needed, the "Tiny" package can also be used together with a GPU acceleration add-on.

Comparison between Base, Tiny and Add-ons Flatpak packages:

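| Sizes | Base | Tiny | AMD add-on | NVIDIA add-on |
| --- | --- | --- | --- | --- |
| Download size | 0.9 GiB | 70 MiB | +2.1 GiB | +3.8 GiB |
| Unpacked size | 2.9 GiB | 170 MiB | +11.5 GiB | +6.9 GiB |

| Features | Base | Tiny | AMD add-on | NVIDIA add-on |
| --- | --- | --- | --- | --- |
| Coqui/DeepSpeech STT | + | + | | |
| Vosk STT | + | + | | |
| Whisper (whisper.cpp) STT | + | + | | |
| Whisper (whisper.cpp) STT AMD GPU | - | - | + | |
| Whisper (whisper.cpp) STT NVIDIA GPU | - | - | | + |
| Faster Whisper STT | + | - | | |
| Faster Whisper STT NVIDIA GPU | - | - | | + |
| April-ASR STT | + | + | | |
| eSpeak TTS | + | + | | |
| MBROLA TTS | + | + | | |
| Piper TTS | + | + | | |
| RHVoice TTS | + | + | | |
| Coqui TTS | + | - | | |
| Coqui TTS AMD GPU | - | - | + | |
| Coqui TTS NVIDIA GPU | - | - | | + |
| Mimic3 TTS | + | - | | |
| WhisperSpeech TTS | + | - | | |
| WhisperSpeech TTS AMD GPU | - | - | + | |
| WhisperSpeech TTS NVIDIA GPU | - | - | | + |
| Punctuation restoration | + | - | | |
| Translator | + | + | | |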

Beta version

In addition to the stable version in the Flathub repository, you can try the "Beta" version of the upcoming release. This version is usable but may contain more bugs.

The Beta version is available in the "flathub-beta" repository. Follow these instructions to enable flathub-beta on your computer.

Building from sources

Arch Linux

It is also possible to build and install the latest development (git) or latest stable (release) version from the repository using the provided PKGBUILD file (please note that the same remarks about building on Linux apply):

git clone <git repository url>

cd dsnote/arch/git      # build latest git version
# or
cd dsnote/arch/release  # build latest release version

makepkg -si

Flatpak

git clone <git repository url>

cd dsnote/flatpak

flatpak-builder --user --install-deps-from=flathub --repo="/path/to/local/flatpak/repo" "/path/to/output/dir" net.mkiol.SpeechNote.yaml

Sailfish OS

git clone <git repository url>

cd dsnote
mkdir build
cd build

sfdk config --session specfile=../sfos/harbour-dsnote.spec
sfdk config --session target=SailfishOS-4.4.0.58-aarch64
sfdk cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_SFOS=ON -DWITH_PY=OFF
sfdk package

Linux (direct build)

Speech Note has many build-time and run-time dependencies, including shared and static libraries, 3rd-party executables, and Python and Perl scripts. Because of this complexity, the recommended way to build is to use the Flatpak toolchain (a Flatpak manifest file and flatpak-builder). A direct build (i.e. without Flatpak) is also possible but more complicated.

git clone <git repository url>

cd dsnote
mkdir build
cd build

cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_DESKTOP=ON
make

To build without support for Python components, add -DWITH_PY=OFF to the cmake step.

To see other build options, search for option(BUILD_XXX) in the CMakeLists.txt file.

How to enable a custom model

All models available for download are specified in the configuration file (config/models.json). To enable a custom model that is compatible with currently supported engines, simply edit this file and restart the application.

When you first run the application, the models configuration file is created in:

You can freely edit currently enabled models or add new ones.

A model definition looks like this:

{
    "name": "<model name>",
    "model_id": "<model unique id>",
    "engine": "<engine type>",
    "lang_id": "<lang id>",
    "checksum": "<md5 checksum>",
    "checksum_quick": "<partial md5 checksum>",
    "comp": "<compression type>",
    "urls": [
        <model URLs>
    ],
    "size": "<download size of all files>"
}

Allowed engine types: stt_ds, stt_vosk, stt_april, stt_whisper, stt_fasterwhisper, tts_piper, tts_rhvoice, tts_espeak, tts_coqui, tts_mimic3, mnt_bergamot

Allowed compression types: none, gz, xz, tarxz, targz, zip, zipall, dir, dirgz

Allowed URL types: http, https, file

Checksums are calculated for all files after unpacking. If you are adding a new model, you can use the --gen-checksums command line option to find the right checksums. To do this, put empty strings in both checksum and checksum_quick, save the file and run Speech Note with the mentioned option.

For example:

{
    "name": "New Piper Voice",
    "model_id": "en_piper_new",
    "engine": "tts_piper",
    "lang_id": "en",
    "checksum": "",
    "checksum_quick": "",
    "size": "",
    "comp": "dir",
    "urls": [
        "file:///home/me/models/new-model-medium.onnx",
        "file:///home/me/models/new-model-medium.onnx.json"
    ]
}

flatpak run net.mkiol.SpeechNote --verbose --gen-checksums
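
Before restarting the application, it can help to sanity-check a custom entry against the allowed values listed above. The following is a minimal sketch in Python, not part of Speech Note: the function name check_model_entry is hypothetical, and treating every key from the template as required is an assumption (the application itself may be more lenient). It checks only the engine type, compression type and URL type, and does not verify checksums.

```python
import json
from urllib.parse import urlparse

# Allowed values copied from the lists in this README.
ALLOWED_ENGINES = {
    "stt_ds", "stt_vosk", "stt_april", "stt_whisper", "stt_fasterwhisper",
    "tts_piper", "tts_rhvoice", "tts_espeak", "tts_coqui", "tts_mimic3",
    "mnt_bergamot",
}
ALLOWED_COMP = {"none", "gz", "xz", "tarxz", "targz", "zip", "zipall", "dir", "dirgz"}
ALLOWED_URL_SCHEMES = {"http", "https", "file"}

# Keys shown in the model definition template.
# Assumption: all of them are treated as required here.
EXPECTED_KEYS = ("name", "model_id", "engine", "lang_id", "checksum",
                 "checksum_quick", "comp", "urls", "size")


def check_model_entry(entry):
    """Return a list of problems found in one model entry (empty if none)."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in entry]
    if "engine" in entry and entry["engine"] not in ALLOWED_ENGINES:
        problems.append(f"unknown engine type: {entry['engine']!r}")
    if "comp" in entry and entry["comp"] not in ALLOWED_COMP:
        problems.append(f"unknown compression type: {entry['comp']!r}")
    if "urls" in entry:
        urls = entry["urls"]
        if not isinstance(urls, list) or not urls:
            problems.append("urls must be a non-empty list")
        else:
            for url in urls:
                # The URL type is the scheme part before "://".
                if urlparse(str(url)).scheme not in ALLOWED_URL_SCHEMES:
                    problems.append(f"unsupported URL type: {url!r}")
    return problems


# The example entry from this README, with empty checksums.
example = json.loads("""
{
    "name": "New Piper Voice",
    "model_id": "en_piper_new",
    "engine": "tts_piper",
    "lang_id": "en",
    "checksum": "",
    "checksum_quick": "",
    "size": "",
    "comp": "dir",
    "urls": [
        "file:///home/me/models/new-model-medium.onnx",
        "file:///home/me/models/new-model-medium.onnx.json"
    ]
}
""")

for problem in check_model_entry(example):
    print(problem)
```

Parsing the entry with json.loads also catches syntax slips such as a missing comma, which are easy to make when editing the file by hand.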

Contributing to Speech Note

Any contribution is very welcome!

The project is hosted on both GitHub and GitLab. Feel free to open a PR/MR, report an issue or request a new feature on the platform you prefer.

Translation

Translation files in Qt format are in the translations directory.

The preferred way to contribute translations is via the Transifex service, but direct PRs/MRs are welcome too.

How to support

If you find Speech Note useful and would like to support this project, please consider doing one or two of the following:

Libraries

Speech Note relies on the following open source projects:

Reviews and demos

License

Speech Note is an open source project. Source code is released under the Mozilla Public License Version 2.0.

3rd party libraries:

The files in the nonbreaking_prefixes directory were copied from the mosesdecoder project and are distributed under the GNU Lesser General Public License v2.1.