AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md.

The tables below list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to asset.

Task	Supported Foundation Models	Status
Text-to-Speech	FastSpeech, SyntaSpeech, VITS	Yes (WIP)
Style Transfer	GenerSpeech	Yes
Speech Recognition	Whisper, Conformer	Yes
Speech Enhancement	ConvTasNet	Yes (WIP)
Speech Separation	TF-GridNet	Yes (WIP)
Speech Translation	Multi-decoder	WIP
Mono-to-Binaural	NeuralWarp	Yes

Task	Supported Foundation Models	Status
Text-to-Sing	DiffSinger, VISinger	Yes (WIP)

Task	Supported Foundation Models	Status
Text-to-Audio	Make-An-Audio	Yes
Audio Inpainting	Make-An-Audio	Yes
Image-to-Audio	Make-An-Audio	Yes
Sound Detection	Audio-transformer	Yes
Target Sound Detection	TSDNet	Yes
Sound Extraction	LASSNet	Yes

Task	Supported Foundation Models	Status
Talking Head Synthesis	GeneFace	Yes (WIP)

We are grateful to the following open-source projects: