UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Introduction
UniAudio is a universal audio generation model that handles many audio generation tasks with a single model, including text-to-speech (TTS), voice conversion (VC), singing voice synthesis, speech enhancement, speech extraction, text-to-sound, and text-to-music. The following sections introduce the details of UniAudio:
- Neural Audio Codec Models
- Top-level Design
- Training your own UniAudio for any task with your own dataset
Neural Audio Codec Models
Please refer to the codec folder for the training code of the neural audio codec. We will release the checkpoint of our trained codec after the double-blind review.
Top-level Design
The UniAudio framework is simple and consists of 4 steps: (1) define your task; (2) prepare the data; (3) tokenize the data and save it as a .pth file; (4) train and run inference. Clearer documentation for UniAudio will be released after the double-blind review.
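Until the full documentation is released, steps (1) to (3) can be pictured with the minimal sketch below. It is not the UniAudio code: `TaskSpec`, `prepare_data`, `char_tokenizer`, `tokenize`, and `save_tokens` are hypothetical names chosen for illustration, the character-level tokenizer stands in for the real text and audio tokenizers, and the assumption that the .pth file holds a dictionary from utterance ID to a token tensor is ours.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Step 1: define the task. Hypothetical schema: a name plus the ordered
# data fields the task consumes and produces.
@dataclass
class TaskSpec:
    name: str
    keys: List[str]


# Step 2: prepare data as {field: {utterance_id: raw_value}} (toy values).
def prepare_data() -> Dict[str, Dict[str, str]]:
    return {"text": {"utt1": "hi", "utt2": "ok"}}


# Step 3a: a stand-in tokenizer mapping each character to its code point.
def char_tokenizer(s: str) -> List[int]:
    return [ord(c) for c in s]


# Step 3b: tokenize every utterance of one field.
def tokenize(entries: Dict[str, str],
             tokenizer: Callable[[str], List[int]]) -> Dict[str, List[int]]:
    return {utt: tokenizer(value) for utt, value in entries.items()}


# Step 3c: save the tokens as a .pth file. torch is imported lazily so the
# rest of the sketch runs without it; the file layout is an assumption.
def save_tokens(tokens: Dict[str, List[int]], path: str) -> None:
    import torch
    tensors = {utt: torch.tensor(ids, dtype=torch.long)
               for utt, ids in tokens.items()}
    torch.save(tensors, path)


task = TaskSpec(name="toy_task", keys=["text"])
data = prepare_data()
tokens = {key: tokenize(data[key], char_tokenizer) for key in task.keys}
```

Calling `save_tokens(tokens["text"], "text.pth")` would then write one file per field, which a training script (step 4) could load with `torch.load`.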
Training UniAudio
The detailed training documentation will be released.