DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding [arXiv 2024]

Jungbin Cho\*, Junwan Kim\*, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu

\*Equal contribution. Corresponding author.


Official PyTorch code release of "DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding"
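To give a feel for the idea in the title, here is a toy sketch of rectified-flow decoding: a velocity field is integrated from noise toward a continuous output with a simple Euler solver. Everything here (the linear stand-in for the velocity network, the conditioning scheme, the dimensions) is illustrative and is not the actual DisCoRD model.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 263   # pose feature dimension (HumanML3D convention; assumption)
T = 196   # number of motion frames (assumption)
W = rng.normal(scale=0.01, size=(D, D))  # stand-in for a trained velocity net

def velocity(x, t, cond):
    # In DisCoRD the velocity net is conditioned on discrete VQ tokens;
    # here `cond` is just added as a bias for illustration.
    return x @ W + cond

def euler_decode(cond, steps=16):
    x = rng.normal(size=(T, D))  # start from Gaussian noise at t = 0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        # Euler step along the flow: x_{t+dt} = x_t + v(x_t, t) * dt
        x = x + dt * velocity(x, t, cond)
    return x

motion = euler_decode(cond=rng.normal(size=(T, D)))
print(motion.shape)  # (196, 263)
```

The point of the sketch is only the sampling loop: discrete tokens condition a velocity field, and integrating that field yields a continuous motion sequence rather than a direct codebook lookup.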

📨 News

🚀 01/Jan/25 - Released the inference & evaluation code

⚙️ Settings

git clone https://github.com/whwjdqls/DisCoRD
cd DisCoRD

Environment

This codebase was tested on Python 3.8.5 with CUDA 11.8.

conda env create -f environment.yaml
conda activate discord

Download Checkpoints

DisCoRD can be built on top of any VQ-VAE-based motion generation model. We release checkpoints built on MoMask: Generative Masked Modeling of 3D Human Motions.

  1. Download the MoMask checkpoints. Detailed guidelines can be found here.

    bash prepare/download_models.sh
  2. Download the evaluation models and GloVe embeddings.

    bash prepare/download_evaluator.sh
    bash prepare/download_glove.sh
  3. Download the DisCoRD checkpoint and place it in ./checkpoints.

    https://drive.google.com/file/d/1glQFuMvWI_dKeQeS7s8V_4zdOHfIv1wS/view?usp=drive_link
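If you prefer the command line, the Drive file can be fetched with gdown. This is a convenience sketch (it assumes `pip install gdown` and network access), not an official script from the repository; downloading the file manually from the link above works just as well.

```shell
# Convenience sketch: fetch the DisCoRD checkpoint with gdown (assumption:
# gdown is installed via pip). Falls back to a message if gdown is missing.
FILE_ID="1glQFuMvWI_dKeQeS7s8V_4zdOHfIv1wS"
mkdir -p ./checkpoints
command -v gdown >/dev/null \
  && gdown "$FILE_ID" -O ./checkpoints/DisCoRD_Momask_RFDecoder_best.pth \
  || echo "gdown not found; download the checkpoint manually from the link above"
```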

After preparing all checkpoints, the directories should look as follows:

.
└── checkpoints
    ├── Momask
    │   ├── checkpoints
    │   │   └── net_best_fid.tar
    │   └── configs
    ├── kit
    │   ├── Comp_v6_KLD005
    │   ├── rvq_nq6_dc512_nc512_noshare_qdp0.2_k
    │   ├── t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns_k
    │   ├── text_mot_match
    │   └── tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw_k
    ├── t2m
    │   ├── Comp_v6_KLD005
    │   ├── length_estimator
    │   ├── rvq_nq6_dc512_nc512_noshare_qdp0.2
    │   ├── t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns
    │   ├── text_mot_match
    │   └── tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw
    └── DisCoRD_Momask_RFDecoder_best.pth

💭 Inference

Run visualize.py to generate motion from arbitrary text input. Outputs will be saved in ./gifs.

❗ Our model generates fixed-length motion, so an explicit motion length must be provided at generation time. To generate motion from text alone, you can use the motion length predictor provided in MoMask: Generative Masked Modeling of 3D Human Motions.

python visualize.py --model_ckpt_path ./checkpoints/DisCoRD_Momask_RFDecoder_best.pth --input_text "A person is walking" --m_length 196
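When picking a value for `--m_length` by hand, it helps to think in seconds. The sketch below converts a desired duration to a frame count, assuming the HumanML3D convention of 20 fps and a 196-frame cap (~9.8 s); these values and the lower bound are assumptions based on that convention, not values read from this repository's configs.

```python
# Hypothetical helper for choosing --m_length from a duration in seconds.
FPS = 20          # HumanML3D frame rate (assumption)
MAX_FRAMES = 196  # typical sequence cap for models trained on HumanML3D
MIN_FRAMES = 40   # hypothetical lower bound for very short prompts

def seconds_to_m_length(seconds: float) -> int:
    """Convert a desired duration in seconds to a clamped frame count."""
    frames = round(seconds * FPS)
    return max(MIN_FRAMES, min(MAX_FRAMES, frames))

print(seconds_to_m_length(5.0))   # 100
print(seconds_to_m_length(30.0))  # 196 (clamped to the maximum)
```

For example, the `--m_length 196` in the command above corresponds to the longest motion the model can produce under these assumptions.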

🏃🏻‍♂️ Evaluation

❗ The evaluation process takes a long time.

Run evaluation.py to evaluate motion generation.

python evaluation.py --model_ckpt_path ./checkpoints/DisCoRD_Momask_RFDecoder_best.pth

Run eval_MotionPrior.py to evaluate motion reconstruction.

python eval_MotionPrior.py --model_ckpt_path ./checkpoints/DisCoRD_Momask_RFDecoder_best.pth

🔥 Training

Download Datasets

Download the HumanML3D or KIT-ML dataset by following the guidelines provided here.

Training Code Coming Soon...

👀 Acknowledgements

We gratefully acknowledge the open-source projects that served as the foundation for our work:

- HumanML3D
- MoMask
- T2M-GPT
- TalkSHOW
- ProbTalk
- TM2D

🔑 License

This code is released under the MIT License.

Citations

If you find this repository useful for your work, please consider citing it as follows:

@article{cho2024discord,
  title={DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding},
  author={Cho, Jungbin and Kim, Junwan and Kim, Jisoo and Kim, Minseo and Kang, Mingu and Hong, Sungeun and Oh, Tae-Hyun and Yu, Youngjae},
  journal={arXiv preprint arXiv:2411.19527},
  year={2024}
}