AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md

Capabilities

The tables below list AudioGPT's current capabilities. More supported models and tasks are coming soon. For prompt examples, refer to asset.

Currently, not every model has a repository.

Speech

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |

Sing

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |

Talking Head

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Talking Head Synthesis | GeneFace | Yes (WIP) |

Acknowledgement

We thank the following open-source projects:

ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion