Language-guided Human Motion Synthesis with Atomic Actions
Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, and Junsong Yuan
University at Buffalo
ACM MM 2023
This repo contains our PyTorch implementation for the text-to-motion synthesis task.
Install ffmpeg
sudo apt update
sudo apt install ffmpeg
Set up the conda environment
pip install -r requirements.txt
python -m spacy download en_core_web_sm
Download dependencies for text-to-motion synthesis
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
Please follow MDM to set up the datasets.
For training on the HumanML3D dataset, run the following command.
python -m train.train_cvae --save_dir save/humanml --overwrite --dataset humanml --eval_during_training --kld_w 1e-2 --att_spa_w 1e-2 --codebook_norm_w 1e-2 --mask_ratio 0.5 --mask_sched linear
For the KIT dataset, add the --dataset kit flag and change --save_dir accordingly (see the example command below). We also provide an option to use a stronger CLIP encoder by adding the --use_transformers_clip flag, which can potentially improve performance.
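For reference, a full KIT training command could look like the following. This is only a sketch: it reuses the HumanML3D hyperparameters above and adds --num_code 512, which we infer from the KIT evaluation command below; adjust these values as needed.
python -m train.train_cvae --save_dir save/kit --overwrite --dataset kit --eval_during_training --kld_w 1e-2 --att_spa_w 1e-2 --codebook_norm_w 1e-2 --mask_ratio 0.5 --mask_sched linear --num_code 512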
We provide our pretrained checkpoints here. To reproduce our results, run the following commands. For HumanML3D:
python -m eval.eval_humanml_cvae --model_path {path-to-humanml-pretrained-model} --dataset humanml --eval_mode mm_short
For KIT:
python -m eval.eval_humanml_cvae --model_path {path-to-kit-pretrained-model} --dataset kit --num_code 512 --eval_mode mm_short
Run the following command to generate a motion from a text prompt. The output contains a stick-figure animation of the generated motion and a .npy file with the xyz coordinates of the joints.
python -m sample.generate_cvae --model_path {checkpoint-path} --text_prompt "{text prompt}"
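To quickly inspect the generated joints, a minimal Python sketch like the one below may help. The file path is a placeholder, and the array layout (frames x joints x 3) is an assumption based on the description above; check it against the actual output of the generation script.
import numpy as np

# Load the generated joint coordinates (the path is a placeholder).
joints = np.load("path/to/generated_motion.npy", allow_pickle=True)
# Expected layout (assumption): (num_frames, num_joints, 3) xyz coordinates.
print(joints.shape)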
Use the following command to create the .obj SMPL mesh file.
python -m visualize.render_mesh --input_path {path-to-mp4-stick-animation-file}
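If you want to sanity-check an exported mesh programmatically, the following sketch uses trimesh (an extra dependency, installable via pip install trimesh); the .obj path is a placeholder.
import trimesh

# Load one exported SMPL mesh frame and print basic statistics (path is a placeholder).
mesh = trimesh.load("path/to/frame.obj")
print(mesh.vertices.shape, mesh.faces.shape)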
We also provide a Blender script in render/render.blend to render the generated SMPL mesh.
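For example, the provided scene can be opened directly with Blender; how the embedded script should be invoked inside Blender may depend on your setup.
blender render/render.blend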
If you find our project helpful, please cite our work:
@inproceedings{zhai2023language,
title={Language-guided Human Motion Synthesis with Atomic Actions},
author={Zhai, Yuanhao and Huang, Mingzhen and Luan, Tianyu and Dong, Lu and Nwogu, Ifeoma and Lyu, Siwei and Doermann, David and Yuan, Junsong},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={5262--5271},
year={2023}
}
This project is developed upon MDM: Human Motion Diffusion Models. Thanks to the authors for their great work!