DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

This repository is the official PyTorch implementation of our AAAI-2022 paper, in which we propose DiffSinger (for Singing-Voice-Synthesis) and DiffSpeech (for Text-to-Speech).

:tada: :tada: :tada: Updates:

:rocket: News:

Environments

  1. To use an Anaconda environment:

    conda create -n your_env_name python=3.8
    source activate your_env_name
    # For a 2080Ti GPU with CUDA 10.2:
    pip install -r requirements_2080.txt
    # Or, for a 3090 GPU with CUDA 11.4:
    pip install -r requirements_3090.txt
    
  2. Or, to use a Python virtual environment:

    # Install Python 3.8 first.
    python -m venv venv
    source venv/bin/activate
    # install requirements.
    pip install -U pip
    pip install Cython numpy==1.19.1
    pip install torch==1.9.0
    pip install -r requirements.txt
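
After either setup, a quick sanity check can confirm that the active interpreter matches the Python 3.8 requirement and that the key packages are visible. The following is a minimal sketch, not part of this repository: the helper name `check_environment` and its default package list are assumptions made for illustration.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), packages=("numpy", "torch")):
    """Report the interpreter version and which packages can be located.

    `check_environment` is a hypothetical helper, not a function shipped
    with this repository.
    """
    report = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= min_python,
    }
    for name in packages:
        # find_spec locates a top-level package without importing it,
        # so a missing package is reported instead of raising ImportError.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated environment; a `False` next to a package name suggests the corresponding `pip install` step did not complete.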
    

Documents

Overview

| Mel Pipeline | Dataset | Pitch Input | F0 Prediction | Acceleration Method | Vocoder |
| --- | --- | --- | --- | --- | --- |
| DiffSpeech (Text->F0, Text+F0->Mel, Mel->Wav) | Ljspeech | None | Explicit | Shallow Diffusion | HiFiGAN |
| DiffSinger (Lyric+F0->Mel, Mel->Wav) | PopCS | Ground-Truth F0 | None | Shallow Diffusion | NSF-HiFiGAN |
| DiffSinger (Lyric+MIDI->F0, Lyric+F0->Mel, Mel->Wav) | OpenCpop | MIDI | Explicit | Shallow Diffusion | NSF-HiFiGAN |
| FFT-Singer (Lyric+MIDI->F0, Lyric+F0->Mel, Mel->Wav) | OpenCpop | MIDI | Explicit | Invalid | NSF-HiFiGAN |
| DiffSinger (Lyric+MIDI->Mel, Mel->Wav) | OpenCpop | MIDI | Implicit | None | Pitch-Extractor + NSF-HiFiGAN |
| DiffSinger+PNDM (Lyric+MIDI->Mel, Mel->Wav) | OpenCpop | MIDI | Implicit | PLMS | Pitch-Extractor + NSF-HiFiGAN |
| DiffSpeech+PNDM (Text->Mel, Mel->Wav) | Ljspeech | None | Implicit | PLMS | HiFiGAN |
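
To compare the pipelines above programmatically, the table can be transcribed into plain data. This is only an illustrative sketch: the names `Pipeline`, `PIPELINES`, and `select` are hypothetical and do not exist in this repository, and table cells that read "None" are stored as Python `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Pipeline:
    """One row of the overview table (hypothetical helper type)."""
    name: str
    stages: Tuple[str, ...]
    dataset: str
    pitch_input: Optional[str]    # None where the table says "None"
    f0_prediction: Optional[str]
    acceleration: Optional[str]
    vocoder: str


# Values copied from the overview table, in the same row order.
PIPELINES = (
    Pipeline("DiffSpeech", ("Text->F0", "Text+F0->Mel", "Mel->Wav"),
             "Ljspeech", None, "Explicit", "Shallow Diffusion", "HiFiGAN"),
    Pipeline("DiffSinger", ("Lyric+F0->Mel", "Mel->Wav"),
             "PopCS", "Ground-Truth F0", None, "Shallow Diffusion",
             "NSF-HiFiGAN"),
    Pipeline("DiffSinger", ("Lyric+MIDI->F0", "Lyric+F0->Mel", "Mel->Wav"),
             "OpenCpop", "MIDI", "Explicit", "Shallow Diffusion",
             "NSF-HiFiGAN"),
    Pipeline("FFT-Singer", ("Lyric+MIDI->F0", "Lyric+F0->Mel", "Mel->Wav"),
             "OpenCpop", "MIDI", "Explicit", "Invalid", "NSF-HiFiGAN"),
    Pipeline("DiffSinger", ("Lyric+MIDI->Mel", "Mel->Wav"),
             "OpenCpop", "MIDI", "Implicit", None,
             "Pitch-Extractor + NSF-HiFiGAN"),
    Pipeline("DiffSinger+PNDM", ("Lyric+MIDI->Mel", "Mel->Wav"),
             "OpenCpop", "MIDI", "Implicit", "PLMS",
             "Pitch-Extractor + NSF-HiFiGAN"),
    Pipeline("DiffSpeech+PNDM", ("Text->Mel", "Mel->Wav"),
             "Ljspeech", None, "Implicit", "PLMS", "HiFiGAN"),
)


def select(**criteria):
    """Return the pipelines whose fields equal every given criterion."""
    return [
        p for p in PIPELINES
        if all(getattr(p, key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # List the OpenCpop pipelines that predict F0 implicitly.
    for p in select(dataset="OpenCpop", f0_prediction="Implicit"):
        print(p.name, "|", " , ".join(p.stages), "|", p.vocoder)
```

Keyword arguments to `select` are matched by exact equality against the dataclass fields, so `select(acceleration="PLMS")` picks out the two PNDM variants.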

Tensorboard

    tensorboard --logdir_spec exp_name

<table style="width:100%"> <tr> <td><img src="resources/tfb.png" alt="Tensorboard" height="250"></td> </tr> </table>
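
TensorBoard's `--logdir_spec` flag also accepts a comma-separated list of `name:path` pairs, which is convenient for viewing several experiments side by side. The sketch below builds such a command line; the helper names and the experiment directories in the usage example are hypothetical and should be replaced with your own.

```python
import shlex


def build_logdir_spec(runs):
    """Join a {display_name: log_dir} mapping into a --logdir_spec value."""
    for name, path in runs.items():
        # Commas separate runs and a colon separates a name from its path,
        # so these characters cannot appear in a display name.
        if "," in name or ":" in name or "," in path:
            raise ValueError(f"unsupported character in run {name!r}")
    return ",".join(f"{name}:{path}" for name, path in runs.items())


def tensorboard_command(runs):
    """Return the argument list for launching TensorBoard on the given runs."""
    return ["tensorboard", "--logdir_spec", build_logdir_spec(runs)]


if __name__ == "__main__":
    # Hypothetical experiment directories, used only for illustration.
    runs = {"exp_a": "checkpoints/exp_a", "exp_b": "checkpoints/exp_b"}
    print(shlex.join(tensorboard_command(runs)))
```

The script only prints the command; pass the list to `subprocess.run` if you prefer to launch TensorBoard from Python.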

Citation

@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}}

Acknowledgements

Special thanks to: