Supported functions

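| Speech recognition | Speech synthesis |
|--------------------|------------------|
| ✔️ | ✔️ |

| Speaker identification | Speaker diarization | Speaker verification |
|------------------------|---------------------|----------------------|
| ✔️ | ✔️ | ✔️ |

| Spoken Language identification | Audio tagging | Voice activity detection |
|--------------------------------|---------------|--------------------------|
| ✔️ | ✔️ | ✔️ |

| Keyword spotting | Add punctuation |
|------------------|-----------------|
| ✔️ | ✔️ |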

Supported platforms

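| Architecture | Android | iOS | Windows | macOS | Linux | HarmonyOS |
|--------------|---------|-----|---------|-------|-------|-----------|
| x64          | ✔️      |     | ✔️      | ✔️    | ✔️    | ✔️        |
| x86          | ✔️      |     | ✔️      |       |       |           |
| arm64        | ✔️      | ✔️  | ✔️      | ✔️    | ✔️    | ✔️        |
| arm32        | ✔️      |     |         |       | ✔️    | ✔️        |
| riscv64      |         |     |         |       | ✔️    |           |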

Supported programming languages

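| 1. C++ | 2. C | 3. Python | 4. JavaScript |
|--------|------|-----------|---------------|
| ✔️ | ✔️ | ✔️ | ✔️ |

| 5. Java | 6. C# | 7. Kotlin | 8. Swift |
|---------|-------|-----------|----------|
| ✔️ | ✔️ | ✔️ | ✔️ |

| 9. Go | 10. Dart | 11. Rust | 12. Pascal |
|-------|----------|----------|------------|
| ✔️ | ✔️ | ✔️ | ✔️ |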

For Rust support, please see sherpa-rs

It also supports WebAssembly.

Introduction

This repository supports running the functions listed under Supported functions locally, on the platforms and operating systems listed under Supported platforms, with the APIs listed under Supported programming languages.

Links for Huggingface Spaces

<details> <summary>You can visit the following Huggingface spaces to try sherpa-onnx without installing anything. All you need is a browser.</summary>

| Description | URL |
|-------------|-----|
| Speaker diarization | Click me |
| Speech recognition | Click me |
| Speech recognition with Whisper | Click me |
| Speech synthesis | Click me |
| Generate subtitles | Click me |
| Audio tagging | Click me |
| Spoken language identification with Whisper | Click me |

We also have spaces built using WebAssembly. They are listed below:

| Description | Huggingface space | ModelScope space |
|-------------|-------------------|------------------|
| Voice activity detection with silero-vad | Click me | Address |
| Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
| Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
| Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
| Real-time speech recognition (English) | Click me | Address |
| VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
| VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
| VAD + speech recognition (English) with Moonshine tiny | Click me | Address |
| VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
| VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
| VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
| VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
| VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
| VAD + speech recognition (English + Chinese, and multiple Chinese dialects) with Paraformer-large | Click me | Address |
| VAD + speech recognition (English + Chinese, and multiple Chinese dialects) with Paraformer-small | Click me | Address |
| Speech synthesis (English) | Click me | Address |
| Speech synthesis (German) | Click me | Address |
| Speaker diarization | Click me | Address |
</details>

Links for pre-built Android APKs

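<details> <summary>You can find pre-built Android APKs for this repository in the following table</summary>

| Description | URL | Users in China |
|-------------|-----|----------------|
| Speaker diarization | Address | Click here |
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |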
</details>

Links for pre-built Flutter apps

<details>

Real-time speech recognition

| Description | URL | Users in China |
|-------------|-----|----------------|
| Streaming speech recognition | Address | Click here |

Text-to-speech

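| Description | URL | Users in China |
|-------------|-----|----------------|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |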

Note: You need to build from source for iOS.

</details>

Links for pre-built Lazarus apps

<details>

Generating subtitles

| Description | URL | Users in China |
|-------------|-----|----------------|
| Generate subtitles | Address | Click here |
</details>

Links for pre-trained models

<details>

| Description | URL |
|-------------|-----|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |
| Speaker segmentation | Address |
</details>

Some pre-trained ASR models (Streaming)

<details>

The following table lists only SOME of the available streaming models.

| Name | Supported Languages | Description |
|------|---------------------|-------------|
| sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 | Chinese, English | See also |
| sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 | Chinese, English | See also |
| sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23 | Chinese | Suitable for Cortex A7 CPU. See also |
| sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 | English | Suitable for Cortex A7 CPU. See also |
| sherpa-onnx-streaming-zipformer-korean-2024-06-16 | Korean | See also |
| sherpa-onnx-streaming-zipformer-fr-2023-04-14 | French | See also |
</details>

Some pre-trained ASR models (Non-Streaming)

<details>

The following table lists only SOME of the available non-streaming models.

| Name | Supported Languages | Description |
|------|---------------------|-------------|
| Whisper tiny.en | English | See also |
| Moonshine tiny | English | See also |
| sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17 | Chinese, Cantonese, English, Korean, Japanese | Supports multiple Chinese dialects. See also |
| sherpa-onnx-paraformer-zh-2024-03-09 | Chinese, English | Also supports multiple Chinese dialects. See also |
| sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01 | Japanese | See also |
| sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24 | Russian | See also |
| sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24 | Russian | See also |
| sherpa-onnx-zipformer-ru-2024-09-18 | Russian | See also |
| sherpa-onnx-zipformer-korean-2024-06-24 | Korean | See also |
| sherpa-onnx-zipformer-thai-2024-06-20 | Thai | See also |
| sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04 | Chinese | Supports multiple dialects. See also |
</details>
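
Most of the model names listed in the two sections above appear to follow a common pattern: a `sherpa-onnx-` prefix, an optional `streaming-` marker, a descriptive part, and a `YYYY-MM-DD` date suffix. The sketch below is only an illustration of how a script might sort such names into streaming and non-streaming groups. The pattern is inferred from the names listed here and is not a documented convention; `ModelName` and `parse_model_name` are hypothetical helpers and are not part of sherpa-onnx. Names that do not fit the pattern, such as `Whisper tiny.en`, are returned as `None`.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed pattern (inferred from the names in this README, not documented):
#   sherpa-onnx-[streaming-]<description>-YYYY-MM-DD
_NAME_RE = re.compile(
    r"^sherpa-onnx-(?P<body>.+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$"
)


@dataclass(frozen=True)
class ModelName:
    """Hypothetical container for the parts of a model name."""
    streaming: bool    # True if the name carries the "streaming-" marker
    description: str   # text between the prefix/marker and the date
    dated: date        # date suffix of the name


def parse_model_name(name: str) -> Optional[ModelName]:
    """Split a model name into its parts, or return None if it does not fit."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    body = match.group("body")
    streaming = body.startswith("streaming-")
    if streaming:
        body = body[len("streaming-"):]
    try:
        dated = date(int(match.group("y")), int(match.group("m")), int(match.group("d")))
    except ValueError:
        # The suffix looked like a date but is not a valid calendar day.
        return None
    return ModelName(streaming=streaming, description=body, dated=dated)


if __name__ == "__main__":
    for n in [
        "sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23",
        "sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04",
        "Whisper tiny.en",
    ]:
        print(n, "->", parse_model_name(n))
```

Treating a missing `streaming-` marker as "non-streaming" is an assumption that matches the two tables above; it may not hold for every published model.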

Useful links

How to reach us

Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi WeChat and QQ discussion groups.

Projects using sherpa-onnx

Open-LLM-VTuber

Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms.

See also https://github.com/t41372/Open-LLM-VTuber/pull/50

voiceapi

<details> <summary>Streaming ASR and TTS based on FastAPI</summary>

It shows how to use the ASR and TTS Python APIs with FastAPI.

</details>

腾讯会议摸鱼工具 TMSpeech

It uses streaming ASR in C# with a graphical user interface.

Video demo in Chinese: 【开源】Windows实时字幕软件(网课/开会必备)

lol互动助手

It uses the JavaScript API of sherpa-onnx along with Electron.

Video demo in Chinese: 爆了!炫神教你开打字挂!真正影响胜率的英雄联盟工具!英雄联盟的最后一块拼图!和游戏中的每个人无障碍沟通!