UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Introduction

UniAudio is a universal audio generation model: a single model handles many audio generation tasks, including text-to-speech (TTS), voice conversion (VC), singing voice synthesis, speech enhancement, speech extraction, text-to-sound, and text-to-music. The sections below describe UniAudio in more detail.

Neural Audio Codec Models
Top-level Design
Training UniAudio for any task with your own dataset

Neural Audio Codec Models

Please refer to the codec folder for the training code of the neural audio codec. We will release the checkpoint of our trained codec after the double-blind review.

Top-level Design

The framework of UniAudio is simple and practical. It consists of four steps: (1) define your task, (2) prepare the data, (3) tokenize the data and save it as a .pth file, and (4) train and run inference. More detailed documentation for UniAudio will be released after the double-blind review.
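Steps (1) to (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the UniAudio implementation: `TaskSpec`, `TOKENIZERS`, `tokenize_and_save`, the `.pth` layout (utterance id mapped to one token tensor per field), and both tokenizers are hypothetical. In particular, `audio_tokenizer` is a toy uniform scalar quantizer standing in for the neural audio codec, which has not been released yet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

import torch


# Step 1 (hypothetical format): a task names its condition modalities and its target modality.
@dataclass
class TaskSpec:
    name: str
    conditions: List[str]
    target: str


TTS = TaskSpec(name="tts", conditions=["text"], target="audio")


# Toy tokenizers: placeholders only, NOT the UniAudio text tokenizer or codec.
def text_tokenizer(text: str) -> List[int]:
    # One token per Unicode code point.
    return [ord(ch) for ch in text]


def audio_tokenizer(wave: List[float], n_bins: int = 1024) -> List[int]:
    # Uniform scalar quantization of samples in [-1, 1] into n_bins integer ids.
    ids = []
    for x in wave:
        x = min(max(x, -1.0), 1.0)
        ids.append(min(int((x + 1.0) / 2.0 * n_bins), n_bins - 1))
    return ids


TOKENIZERS: Dict[str, Callable] = {"text": text_tokenizer, "audio": audio_tokenizer}

# Step 2: prepared data keyed by utterance id (made-up illustrative values).
data = {
    "utt1": {"text": "hello", "audio": [0.0, 0.5, -0.5, 1.0]},
    "utt2": {"text": "hi", "audio": [-1.0, 0.25]},
}


# Step 3: tokenize every field the task needs and save all examples to one .pth file.
def tokenize_and_save(task: TaskSpec, data: Dict[str, dict], path: str) -> Dict[str, dict]:
    fields = task.conditions + [task.target]
    out = {}
    for utt_id, example in data.items():
        out[utt_id] = {
            field: torch.tensor(TOKENIZERS[field](example[field]), dtype=torch.long)
            for field in fields
        }
    torch.save(out, path)
    return out


if __name__ == "__main__":
    tokens = tokenize_and_save(TTS, data, "tts_example.pth")
    print({utt: fields["audio"].tolist() for utt, fields in tokens.items()})
```

Keeping the task definition separate from the tokenizers means a new task (for example, speech enhancement with `conditions=["audio"]`) would only need a new `TaskSpec`, while the tokenize-and-save step stays the same. Step (4), training and inference, depends on the model code and is not sketched here.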

Training UniAudio

Detailed training documentation will be released.