Jungbin Cho* Junwan Kim* Jisoo Kim Minseo Kim Mingu Kang Sungeun Hong Tae-Hyun Oh Youngjae Yu†
*Equal Contribution. †Corresponding Author.
Official PyTorch code release of "DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding"
🚀 01/Jan/25 - Released the inference & evaluation code
git clone https://github.com/whwjdqls/DisCoRD
cd DisCoRD
This codebase was tested on Python 3.8.5 with CUDA 11.8.
conda env create -f environment.yaml
conda activate discord
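After activating the environment, a quick sanity check (assuming PyTorch with CUDA support is installed via environment.yaml):

# Verify that the environment provides a CUDA-enabled PyTorch build.
import torch

print(torch.__version__)          # e.g. a +cu118 build
print(torch.cuda.is_available())  # True if a GPU and driver are visible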
DisCoRD can be easily built on any VQ-VAE-based motion generation model. We release checkpoints built on MoMask: Generative Masked Modeling of 3D Human Motions.
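For orientation, here is a minimal, hypothetical sketch of the decoding idea (the function and argument names are illustrative, not this repo's actual API): the frozen VQ/RVQ tokenizer supplies discrete token embeddings as conditioning, and the rectified-flow decoder integrates a learned velocity field from Gaussian noise to continuous motion features with a few Euler steps.

# Illustrative only: rectified-flow decoding of quantized motion tokens (names are hypothetical).
import torch

def decode_with_rectified_flow(velocity_net, token_embeddings, num_steps=16):
    # token_embeddings: (batch, frames, dim) conditioning from the frozen tokenizer.
    x = torch.randn_like(token_embeddings)          # start from Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        v = velocity_net(x, t, token_embeddings)    # predicted velocity field
        x = x + dt * v                              # Euler step along the straight-line flow
    return x                                        # continuous motion features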
- Download the MoMask checkpoints. Detailed guidelines can be found here.
  bash prepare/download_models.sh
- Download the evaluation models and GloVe word embeddings.
  bash prepare/download_evaluator.sh
  bash prepare/download_glove.sh
- Download the DisCoRD checkpoint and place it in ./checkpoints (a scripted download option follows this list).
  https://drive.google.com/file/d/1glQFuMvWI_dKeQeS7s8V_4zdOHfIv1wS/view?usp=drive_link
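If you prefer to script the Google Drive download, here is a minimal sketch using gdown (pip install gdown; the output path matches the layout below):

# Download the DisCoRD checkpoint from Google Drive (requires: pip install gdown).
import gdown

url = "https://drive.google.com/uc?id=1glQFuMvWI_dKeQeS7s8V_4zdOHfIv1wS"
gdown.download(url, "./checkpoints/DisCoRD_Momask_RFDecoder_best.pth", quiet=False)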
After preparing all checkpoints, the directories should look as follows:
.
└── checkpoints
    ├── Momask
    │   ├── checkpoints
    │   │   └── net_best_fid.tar
    │   └── configs
    ├── kit
    │   ├── Comp_v6_KLD005
    │   ├── rvq_nq6_dc512_nc512_noshare_qdp0.2_k
    │   ├── t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns_k
    │   ├── text_mot_match
    │   └── tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw_k
    ├── t2m
    │   ├── Comp_v6_KLD005
    │   ├── length_estimator
    │   ├── rvq_nq6_dc512_nc512_noshare_qdp0.2
    │   ├── t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns
    │   ├── text_mot_match
    │   └── tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw
    └── DisCoRD_Momask_RFDecoder_best.pth
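Optionally, a short check that the key files are in place before running inference or evaluation (the paths mirror the tree above):

# Verify that the main checkpoints exist at the expected locations.
from pathlib import Path

required = [
    "checkpoints/Momask/checkpoints/net_best_fid.tar",
    "checkpoints/DisCoRD_Momask_RFDecoder_best.pth",
]
for p in required:
    print(("ok      " if Path(p).exists() else "MISSING ") + p)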
Run visualize.py to generate motion from arbitrary text input. Outputs will be saved in ./gifs.
❗ Our model generates fixed-length motion, so an explicit motion length must be provided at generation time. To generate motion from text alone, you can use the motion length predictor provided in MoMask: Generative Masked Modeling of 3D Human Motions.
python visualize.py --model_ckpt_path ./checkpoints/DisCoRD_Momask_RFDecoder_best.pth --input_text "A person is walking" --m_length 196
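To render several prompts in one go, you can wrap the same command in a short script (a sketch that only uses the flags shown above; the prompts and lengths are placeholders):

# Generate motions for several prompts by calling visualize.py with the documented flags.
import subprocess

ckpt = "./checkpoints/DisCoRD_Momask_RFDecoder_best.pth"
prompts = [("A person is walking", 196), ("A person jumps forward", 120)]  # (text, motion length)

for text, m_length in prompts:
    subprocess.run(
        ["python", "visualize.py",
         "--model_ckpt_path", ckpt,
         "--input_text", text,
         "--m_length", str(m_length)],
        check=True,
    )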
❗ The evaluation process takes a long time.
Run evaluation.py to evaluate motion generation.
python evaluation.py --model_ckpt_path ./checkpoints/DisCoRD_Momask_RFDecoder_best.pth
Run eval_MotionPrior.py to evaluate motion reconstruction.
python eval_MotionPrior.py --model_ckpt_path ./checkpoints/DisCoRD_Momask_RFDecoder_best.pth
Download the HumanML3D or KIT-ML dataset by following the guidelines provided here.
We gratefully acknowledge the open-source projects that served as the foundation for our work:
HumanML3D.
MoMask.
T2M-GPT.
TalkSHOW.
ProbTalk.
TM2D.
This code is released under the MIT License.
If you find this repository useful for your work, please consider citing it as follows:
@article{cho2024discord,
title={DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding},
author={Cho, Jungbin and Kim, Junwan and Kim, Jisoo and Kim, Minseo and Kang, Mingu and Hong, Sungeun and Oh, Tae-Hyun and Yu, Youngjae},
journal={arXiv preprint arXiv:2411.19527},
year={2024}
}