AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
We provide our implementation and pretrained models as open source in this repository.
Get Started
Please refer to run.md.
Capabilities
Here we list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to asset.
Currently, not every model has a repository.
Speech
Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
Style Transfer | GenerSpeech | Yes |
Speech Recognition | whisper, Conformer | Yes |
Speech Enhancement | ConvTasNet | Yes (WIP) |
Speech Separation | TF-GridNet | Yes (WIP) |
Speech Translation | Multi-decoder | WIP |
Mono-to-Binaural | NeuralWarp | Yes |
Sing
Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |
Audio
Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Audio | Make-An-Audio | Yes |
Audio Inpainting | Make-An-Audio | Yes |
Image-to-Audio | Make-An-Audio | Yes |
Sound Detection | Audio-transformer | Yes |
Target Sound Detection | TSDNet | Yes |
Sound Extraction | LASSNet | Yes |
Talking Head
Task | Supported Foundation Models | Status |
---|---|---|
Talking Head Synthesis | GeneFace | Yes (WIP) |
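As an illustration of how the capability tables above can be read, the following is a minimal, hypothetical Python sketch that encodes them as a lookup registry. It shows that one task can be backed by several foundation models and that one model (e.g. Make-An-Audio) can serve several tasks. The names `Capability`, `CAPABILITIES`, `models_for`, `tasks_using` and `usable_tasks` are illustrative assumptions, not part of the AudioGPT codebase, and the repository's actual task routing may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """One row of the capability tables (hypothetical representation)."""
    category: str
    task: str
    models: tuple[str, ...]
    status: str  # "Yes", "Yes (WIP)" or "WIP", copied from the tables


# Rows transcribed from the Speech, Sing, Audio and Talking Head tables.
CAPABILITIES = [
    Capability("Speech", "Text-to-Speech", ("FastSpeech", "SyntaSpeech", "VITS"), "Yes (WIP)"),
    Capability("Speech", "Style Transfer", ("GenerSpeech",), "Yes"),
    Capability("Speech", "Speech Recognition", ("whisper", "Conformer"), "Yes"),
    Capability("Speech", "Speech Enhancement", ("ConvTasNet",), "Yes (WIP)"),
    Capability("Speech", "Speech Separation", ("TF-GridNet",), "Yes (WIP)"),
    Capability("Speech", "Speech Translation", ("Multi-decoder",), "WIP"),
    Capability("Speech", "Mono-to-Binaural", ("NeuralWarp",), "Yes"),
    Capability("Sing", "Text-to-Sing", ("DiffSinger", "VISinger"), "Yes (WIP)"),
    Capability("Audio", "Text-to-Audio", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Audio Inpainting", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Image-to-Audio", ("Make-An-Audio",), "Yes"),
    Capability("Audio", "Sound Detection", ("Audio-transformer",), "Yes"),
    Capability("Audio", "Target Sound Detection", ("TSDNet",), "Yes"),
    Capability("Audio", "Sound Extraction", ("LASSNet",), "Yes"),
    Capability("Talking Head", "Talking Head Synthesis", ("GeneFace",), "Yes (WIP)"),
]


def models_for(task: str) -> tuple[str, ...]:
    """Return the foundation models listed for a task (case-insensitive match)."""
    for cap in CAPABILITIES:
        if cap.task.lower() == task.lower():
            return cap.models
    raise KeyError(f"unsupported task: {task}")


def tasks_using(model: str) -> list[str]:
    """Return every task whose row lists the given model."""
    return [cap.task for cap in CAPABILITIES if model in cap.models]


def usable_tasks() -> list[str]:
    """Tasks whose status starts with "Yes" (supported, possibly still WIP)."""
    return [cap.task for cap in CAPABILITIES if cap.status.startswith("Yes")]


if __name__ == "__main__":
    print(models_for("speech recognition"))
    print(tasks_using("Make-An-Audio"))
```

A flat list of rows keeps the sketch a direct transcription of the tables; a real dispatcher would more likely index by task name and attach a callable model wrapper to each entry.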
Acknowledgement
We thank the following open-source projects:
ESPnet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion