voiceapi - A simple and clean voice transcription/synthesis API with sherpa-onnx

Thanks to k2-fsa/sherpa-onnx, we can easily build a voice API with Python.

<img src="./screenshot.jpg" width="60%">

Supported models

| Model                                  | Language                      | Type        | Description                        |
|----------------------------------------|-------------------------------|-------------|------------------------------------|
| zipformer-bilingual-zh-en-2023-02-20   | Chinese + English             | Online ASR  | Streaming Zipformer, Bilingual     |
| sense-voice-zh-en-ja-ko-yue-2024-07-17 | Chinese + English             | Offline ASR | SenseVoice, Bilingual              |
| paraformer-trilingual-zh-cantonese-en  | Chinese + Cantonese + English | Offline ASR | Paraformer, Trilingual             |
| paraformer-en-2024-03-09               | English                       | Offline ASR | Paraformer, English                |
| vits-zh-hf-theresa                     | Chinese                       | TTS         | VITS, Chinese, 804 speakers        |
| melo-tts-zh_en                         | Chinese + English             | TTS         | Melo, Chinese + English, 1 speaker |

Run the app locally

Python 3.10+ is required

    python3 -m venv venv
    . venv/bin/activate

    pip install -r requirements.txt
    python app.py

Visit http://localhost:8000/ to see the demo page.

Build the CUDA image (for users in China)

    docker build -t voiceapi:cuda_dev -f Dockerfile.cuda.cn .

Streaming API (via WebSocket)

/asr

Send 16-bit PCM audio data to the server, and it will stream back transcription results.

Each result is a JSON message with the fields `text`, `finished` (true when the segment is finished), and `idx`:

    const ws = new WebSocket('ws://localhost:8000/asr?samplerate=16000');
    ws.onopen = () => {
        console.log('connected');
        ws.send('{"sid": 0}');
    };
    ws.onmessage = (e) => {
        const data = JSON.parse(e.data);
        const { text, finished, idx } = data;
        // do something with text
        // finished is true when the segment is finished
    };
    // send audio data: int16Array holds 16-bit PCM samples at the chosen samplerate
    // (e.g. captured from the microphone)
    ws.send(int16Array.buffer);
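
A minimal Python client sketch for the same endpoint: it streams a 16 kHz mono 16-bit PCM WAV file and prints the JSON results. The third-party `websockets` package and the file name `sample.wav` are illustrative assumptions, not part of this project.

    import asyncio
    import json
    import wave

    import websockets  # pip install websockets


    async def transcribe(path: str) -> None:
        uri = 'ws://localhost:8000/asr?samplerate=16000'
        async with websockets.connect(uri) as ws:

            async def send_audio() -> None:
                with wave.open(path, 'rb') as wav:
                    while chunk := wav.readframes(3200):   # ~200 ms at 16 kHz
                        await ws.send(chunk)               # raw 16-bit PCM, little-endian
                        await asyncio.sleep(0.2)           # pace roughly in real time

            async def receive_results() -> None:
                async for message in ws:
                    result = json.loads(message)
                    marker = '[finished]' if result.get('finished') else ''
                    print(result.get('idx'), result.get('text'), marker)

            receiver = asyncio.create_task(receive_results())
            await send_audio()
            try:
                # Collect trailing results for a couple of seconds, then stop.
                await asyncio.wait_for(receiver, timeout=2.0)
            except asyncio.TimeoutError:
                pass


    asyncio.run(transcribe('sample.wav'))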

/tts

Send text to the server, and it will stream back the synthesized audio.

The audio is returned as binary chunks, interleaved with JSON status messages carrying `elapsed`, `progress`, `duration`, and `size`:

    const ws = new WebSocket('ws://localhost:8000/tts?samplerate=16000');
    ws.onopen = () => {
        console.log('connected');
        ws.send('Your text here');
    };
    ws.onmessage = (e) => {
        if (e.data instanceof Blob) {
            // Chunked audio data
            e.data.arrayBuffer().then((arrayBuffer) => {
                const int16Array = new Int16Array(arrayBuffer);
                let float32Array = new Float32Array(int16Array.length);
                for (let i = 0; i < int16Array.length; i++) {
                    float32Array[i] = int16Array[i] / 32768.;
                }
                // playNode: an AudioWorkletNode (created elsewhere) that plays the PCM samples
                playNode.port.postMessage({ message: 'audioData', audioData: float32Array });
            });
        } else {
            // JSON status message reporting synthesis progress
            const { elapsed, progress, duration, size } = JSON.parse(e.data);
            console.log(`elapsed: ${elapsed}, progress: ${progress}`);
        }
    };
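
A matching Python sketch that sends one line of text and writes the received PCM chunks into a WAV file. It assumes the third-party `websockets` package, mono 16-bit output, and that the final status message reports a progress of 1.0; adjust the stop condition if the server signals completion differently.

    import asyncio
    import json
    import wave

    import websockets  # pip install websockets


    async def synthesize(text: str, out_path: str = 'tts_output.wav') -> None:
        uri = 'ws://localhost:8000/tts?samplerate=16000'
        async with websockets.connect(uri) as ws:
            await ws.send(text)
            with wave.open(out_path, 'wb') as wav:
                wav.setnchannels(1)        # assume mono output
                wav.setsampwidth(2)        # 16-bit PCM
                wav.setframerate(16000)    # must match the samplerate query parameter
                async for message in ws:
                    if isinstance(message, bytes):
                        wav.writeframes(message)        # chunked audio data
                    else:
                        # JSON status message: elapsed, progress, duration, size
                        status = json.loads(message)
                        print('progress:', status.get('progress'))
                        if status.get('progress', 0) >= 1.0:
                            break


    asyncio.run(synthesize('Hello, world!'))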

Non-streaming API

/tts

Send text via HTTP POST, and the server returns the synthesized audio as a WAV file:

    curl -X POST "http://localhost:8000/tts" \
         -H "Content-Type: application/json" \
         -d '{
               "text": "Hello, world!",
               "sid": 0,
               "samplerate": 16000
             }' -o helloworld.wav
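
The same request from Python using only the standard library; the output filename is arbitrary.

    import json
    import urllib.request

    payload = json.dumps({
        'text': 'Hello, world!',
        'sid': 0,
        'samplerate': 16000,
    }).encode('utf-8')

    req = urllib.request.Request(
        'http://localhost:8000/tts',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

    # The response body is the synthesized WAV audio
    with urllib.request.urlopen(req) as resp, open('helloworld.wav', 'wb') as out:
        out.write(resp.read())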

Download models

All models are stored in the models directory. Only download the models you need; a short extraction sketch follows the download commands below. The default models are:

vits-zh-hf-theresa

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-zh-hf-theresa.tar.bz2

vits-melo-tts-zh_en

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-melo-tts-zh_en.tar.bz2

sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2

silero_vad.onnx

    curl -SL -O https://github.com/snakers4/silero-vad/raw/master/src/silero_vad/data/silero_vad.onnx

sherpa-onnx-paraformer-trilingual-zh-cantonese-en

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-trilingual-zh-cantonese-en.tar.bz2

whisper

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2

sensevoice

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2

sherpa-onnx-streaming-paraformer-bilingual-zh-en

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2

sherpa-onnx-paraformer-en

    curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-en-2024-03-09.tar.bz2
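
The downloaded .tar.bz2 archives are meant to be unpacked into the models directory. A minimal Python sketch for that step (the archive name is just an example; silero_vad.onnx is a plain .onnx file and only needs to be placed in models):

    import tarfile
    from pathlib import Path


    def extract_model(archive: str, dest: str = 'models') -> None:
        Path(dest).mkdir(exist_ok=True)
        with tarfile.open(archive, 'r:bz2') as tar:
            tar.extractall(dest)   # each archive unpacks into its own model folder


    extract_model('sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2')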