Code for CVPR 2024 paper: ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

Project Page | Arxiv - CVPR 2024

This repository contains the code and data instructions for the ConvoFusion project. If you have questions, create a GitHub issue or email [email protected]

🚩 Updates

  • [15.05.2024] initial code release

Initial Steps

Setup and download

1. Conda environment

conda create python=3.9 --name convofusion
conda activate convofusion

Install the packages in requirements.txt and install PyTorch 2.1.2

pip install -r requirements.txt

OR

conda env create --name convofusion --file=environment.yml 
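After either install route, it can help to confirm that the active environment matches the versions named above. The following is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and it only compares the interpreter version against 3.9 and the installed torch package version against 2.1.2.

```python
import sys
from importlib import metadata


def check_environment(expected_python=(3, 9), expected_torch="2.1.2"):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Compare only the major and minor interpreter version.
    if sys.version_info[:2] != tuple(expected_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in expected_python)
        problems.append(f"Python {found} is active, expected {wanted}")

    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed in this environment")
    else:
        # Builds may carry a local suffix such as "+cu121"; ignore it.
        if torch_version.split("+")[0] != expected_torch:
            problems.append(
                f"PyTorch {torch_version} is installed, expected {expected_torch}"
            )

    return problems
```

Calling check_environment() inside the activated convofusion environment and printing the returned list is enough to spot a mismatched install before training starts.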

2. Dependencies

Follow the steps in DATASET.md if you want to run inference on the pre-trained model.

3. Pre-trained models

Download the model folders from this link, extract the zip file, and place both folders in experiments/convofusion/
The zip file contains one folder for the VAE model and one for the Gesture Diffusion Model.

For evaluation, download the FID network from here and place it in experiments/eval
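The expected placement of the downloads can be checked with a short script. This is a sketch under stated assumptions: check_layout is a hypothetical helper, and because the names of the extracted folders and of the FID network file are not given here, it only checks that experiments/convofusion contains at least two subfolders and that experiments/eval is not empty.

```python
from pathlib import Path


def check_layout(root="experiments"):
    """Return a list of problems with the download layout; empty means OK."""
    root = Path(root)
    problems = []

    # The zip file is described as containing two model folders
    # (VAE and Gesture Diffusion Model); their names are not assumed here.
    model_dir = root / "convofusion"
    if model_dir.is_dir():
        subfolders = [p for p in model_dir.iterdir() if p.is_dir()]
    else:
        subfolders = []
    if len(subfolders) < 2:
        problems.append(
            f"expected two model folders in {model_dir}, found {len(subfolders)}"
        )

    # The FID network is only needed for evaluation.
    eval_dir = root / "eval"
    if not eval_dir.is_dir() or not any(eval_dir.iterdir()):
        problems.append(f"no FID network found in {eval_dir}")

    return problems
```

Run check_layout() from the repository root; each returned string names the location that still needs a download.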

Train your own models

Training guidance

1. Prepare the datasets

Set up the BEAT and DnD Group Gesture datasets by following the steps in DATASET.md

2.1. Train VAE model

Please first check the parameters in configs/config_vae_beatdnd.yaml, e.g. NAME and DEBUG.

Then, run the following command:

python -m train --cfg configs/config_vae_beatdnd.yaml --cfg_assets configs/assets.yaml --batch_size 128 --nodebug

2.2. Train latent diffusion model

Please update the parameters in configs/config_cf_beatdnd.yaml, e.g. NAME, DEBUG and PRETRAINED_VAE (set this to the path of the latest checkpoint from the previous step).
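Locating the latest VAE checkpoint for PRETRAINED_VAE can be scripted. The sketch below rests on assumptions that are not confirmed here: that checkpoints are saved as .ckpt files somewhere under the VAE experiment folder, and that "latest" means most recently modified. The name latest_checkpoint is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(experiment_dir, pattern="*.ckpt") -> Optional[Path]:
    """Return the most recently modified checkpoint file, or None if absent.

    Assumes checkpoints match `pattern` anywhere below `experiment_dir`;
    adjust the pattern if your run uses a different extension.
    """
    candidates = [p for p in Path(experiment_dir).rglob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Pass the folder of your VAE run and copy the printed path into PRETRAINED_VAE by hand; the script does not edit the config.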

Then, run the following command:

python -m train --cfg configs/config_cf_beatdnd.yaml --cfg_assets configs/assets.yaml --batch_size 32 --nodebug

3. Get the model outputs on test set

Please first set TEST.CHECKPOINT to the path of the trained model checkpoint, either in configs/config_cf_beatdnd.yaml or in the config of your experiment folder, /path/to/trained-model/folder/config.yaml.

Then, run the following command:

python -m test --cfg /path/to/trained-model/folder/config.yaml --cfg_assets ./configs/assets.yaml

4. Visualization

Use and adapt the visualize.py script in the scripts folder to visualize the joint predictions. The results folder is created after you run test.py

python visualize.py --src_dir /path/to/results/folder/

5. Quantitative Evaluation

We provide scripts for quantitative evaluation in the quant_eval folder for both monadic and dyadic tasks. These scripts require the generated results folder, which contains the predicted and ground-truth (GT) npy motion files.

Citation

If you find our code or paper helpful, please consider citing:

@InProceedings{mughal2024convofusion,
title = {ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis},
author = {Muhammad Hamza Mughal and Rishabh Dabral and Ikhsanul Habibie and Lucia Donatelli and Marc Habermann and Christian Theobalt},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}

Acknowledgments

This repository is based on the awesome MLD repository. Please check out their repository for further acknowledgements of the code they use. We would also like to acknowledge the authors of BEAT, Attend-and-Excite, HumanML3D, PhysCap & MoFusion, since our code is also based on their work.
This work was supported by the ERC Consolidator Grant 4DReply (770784). We also thank Andrea Boscolo Camiletto & Heming Zhu for help with rendering and visualizations, Christopher Hyek for designing the game for the dataset and Wolfram Wagner (MPII IST) for his help in setting the equipment up.

License

This code is distributed under the terms of the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This project is only for research or education purposes, and not freely available for commercial use or redistribution.

Note that our code depends on other libraries, including PyTorch3D, and uses datasets such as BEAT, each of which has its own license that must also be followed.
