UnivNet

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

This is an unofficial PyTorch implementation of Jang et al. (Kakao), UnivNet.

Audio samples are uploaded!


Notes

Both UnivNet-c16 and c32 results and the pre-trained weights have been uploaded.

For both models, our implementation matches the objective scores (PESQ and RMSE) of the original paper.

Key Features

<img src="docs/model_architecture.png" width="100%">

Prerequisites

This implementation requires the following dependencies.

  1. Python 3.6
  2. PyTorch 1.6.0
  3. NumPy 1.17.4 and SciPy 1.5.4
  4. Install other dependencies in requirements.txt.
    pip install -r requirements.txt
    

Datasets

Preparing Data

Note: The mel-spectrogram computed from each audio file is saved to disk as a `*.mel` file on first access, and loaded from disk afterwards.
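The caching behavior described above can be sketched as follows. This is a minimal illustration, not the repository's actual loader; the function name `load_or_compute_mel` and the `compute_fn` callback are assumptions, and the real code computes mels with PyTorch rather than taking a callback.

```python
import os
import numpy as np

def load_or_compute_mel(wav_path, compute_fn, cache_suffix=".mel"):
    """Load a cached mel-spectrogram if one exists, else compute and cache it.

    `compute_fn` is a hypothetical callback mapping a wav path to a 2-D
    numpy array (n_mels x frames). The cache file is written next to the
    audio file with a `.mel` suffix, mirroring the note above.
    """
    cache_path = wav_path + cache_suffix
    if os.path.exists(cache_path):
        # Subsequent epochs hit this branch and skip the STFT entirely.
        with open(cache_path, "rb") as f:
            return np.load(f)
    mel = compute_fn(wav_path)
    with open(cache_path, "wb") as f:
        np.save(f, mel)
    return mel
```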

Preparing Metadata

Following the format from NVIDIA/tacotron2, the metadata should be formatted as:

path_to_wav|transcript|speaker_id
path_to_wav|transcript|speaker_id
...
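A metadata file in this format can be parsed with a few lines of Python. The function name `parse_metadata` is an assumption for illustration; the repository's own dataset code may structure this differently.

```python
def parse_metadata(lines):
    """Parse NVIDIA/tacotron2-style metadata lines.

    Each non-empty line has the form `path_to_wav|transcript|speaker_id`;
    this assumes the transcript itself contains no '|' characters.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, transcript, speaker_id = line.split("|")
        entries.append((path, transcript, speaker_id))
    return entries
```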

Train/validation metadata for the LibriTTS train-clean-360 split are already prepared in datasets/metadata. 5% of the train-clean-360 utterances were randomly sampled for validation.

Since this model is a vocoder, the transcripts are NOT used during training.
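The 5% random validation split mentioned above can be sketched like this. This is a generic illustration, not the script that produced the prepared metadata; the function name, seed value, and shuffle-then-slice strategy are all assumptions.

```python
import random

def split_train_val(entries, val_ratio=0.05, seed=1234):
    """Randomly hold out `val_ratio` of the utterances for validation.

    Returns (train_entries, val_entries). A fixed seed keeps the
    split reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]
```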

Train

Preparing Configuration Files

Training

python trainer.py -c CONFIG_YAML_FILE -n NAME_OF_THE_RUN

Tensorboard

tensorboard --logdir logs/

If you are running TensorBoard on a remote machine, add the --bind_all option so the page is reachable from other hosts.

Inference

python inference.py -p CHECKPOINT_PATH -i INPUT_MEL_PATH -o OUTPUT_WAV_PATH

Pre-trained Model

You can download the pre-trained models from the Google Drive link below. The models were trained on the LibriTTS train-clean-360 split.

Results

See audio samples at https://mindslab-ai.github.io/univnet/

We evaluated our models on the validation set.

| Model | PESQ(↑) | RMSE(↓) | Model Size |
|---|---|---|---|
| HiFi-GAN v1 | 3.54 | 0.423 | 14.01M |
| Official UnivNet-c16 | 3.59 | 0.337 | 4.00M |
| Our UnivNet-c16 | 3.60 | 0.317 | 4.00M |
| Official UnivNet-c32 | 3.70 | 0.316 | 14.86M |
| Our UnivNet-c32 | 3.68 | 0.304 | 14.87M |
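For context, the RMSE metric is typically computed between log-magnitude spectrograms of the reference and generated waveforms. The sketch below is one plausible formulation; the exact normalization and protocol used in the UnivNet paper may differ, and the function name `spectral_rmse` is an assumption.

```python
import numpy as np

def spectral_rmse(ref_mag, est_mag, eps=1e-7):
    """Root-mean-square error between two log-magnitude spectrograms.

    Both inputs are non-negative 2-D arrays (freq_bins x frames) of the
    same shape. Magnitudes are floored at `eps` before taking the log
    to avoid -inf.
    """
    ref_db = 20.0 * np.log10(np.maximum(ref_mag, eps))
    est_db = 20.0 * np.log10(np.maximum(est_mag, eps))
    return float(np.sqrt(np.mean((ref_db - est_db) ** 2)))
```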

The loss curves of UnivNet are shown below.

The orange and blue graphs indicate c16 and c32, respectively.

<img src="docs/loss.png" width="100%">

Implementation Authors

Implementation authors are:

Contributors are:

Special thanks to

License

This code is licensed under the BSD 3-Clause License.

We referred to the following code and repositories.

References

Papers

Datasets