PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
Thomas Lucas*, Fabien Baradel*, Philippe Weinzaepfel, Grégory Rogez
European Conference on Computer Vision (ECCV), 2022
PyTorch training and evaluation code for PoseGPT on BABEL.
Our code runs with Python 3.7 and requires the following packages:
- pytorch-1.7.1+cu110
- pytorch3d-0.3.0
- torchvision
- opencv
- PIL
- numpy
- smplx
- einops
- roma
We do not provide support for installation.
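That said, a minimal environment can be set up along the following lines (the exact version pins, the PyTorch wheel index and the pytorch3d install route below are assumptions; adapt them to your CUDA setup):
conda create -n posegpt python=3.7 -y && conda activate posegpt
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install opencv-python pillow numpy smplx einops roma
pip install "git+https://github.com/facebookresearch/pytorch3d.git@v0.3.0"   # pytorch3d is built from source; see its own install instructions if this fails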
You should have the AMASS files and the BABEL annotations in two separate directories, following this structure:
<babel_dir>
|--- train.json
|--- val.json
|--- test.json

<amass_dir>
|--- smplx
|    |--- ACCAD   # and then it follows the standard AMASS data structure
|    |--- ...
|    |--- SSM
|--- smplh
|    |--- ACCAD
|    |--- ...
|    |--- SSM
Then you can preprocess the data by running the following commands; this will create files under the root directory <mocap_dir>:
babel_dir='[link_to_babel_dir]'
amass_dir='[link_to_amass_dir]'
mocap_dir='[link_to_preprocessed_data_dir]'
list_split=( 'train' 'test' 'val' )
list_type=( 'smplx' 'smplh' )
for split in "${list_split[@]}"
do
    for type in "${list_type[@]}"
    do
        echo ${type}
        echo ${split}
        python dataset/preprocessing/babel.py "prepare_annots_trimmed(type='${type}',split='${split}',mocap_dir='${mocap_dir}',babel_dir='${babel_dir}',amass_dir='${amass_dir}')"
    done
done
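If you only need a single type/split combination, the same entry point can be called directly, e.g. for smplh and the train split:
python dataset/preprocessing/babel.py "prepare_annots_trimmed(type='smplh',split='train',mocap_dir='${mocap_dir}',babel_dir='${babel_dir}',amass_dir='${amass_dir}')"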
Once the preprocessing is done, you should have the following data structure:
<mocap_dir>
|--- <type>   # smplh or smplx
|    |--- babel_trimmed
|    |    |--- <split>_60   # for train, val and test
|    |    |    |--- seqLen64_fps30_overlap0_minSeqLen16
|    |    |    |    |--- pose.pkl
|    |    |    |    |--- action.pt
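As a quick sanity check, the leaf directory should contain both files, e.g. for the smplh type and the train split (adjust to your configuration):
ls ${mocap_dir}/smplh/babel_trimmed/train_60/seqLen64_fps30_overlap0_minSeqLen16/
# expected output: action.pt  pose.pkl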
Finally, create symlinks named './babel', './amass' and './preprocessed_data' at the root of the repository (alternatively, you can modify the default path arguments in babel.py).
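For example, reusing the shell variables defined above (mapping <mocap_dir> to './preprocessed_data' is our reading of the expected layout):
ln -s ${babel_dir} ./babel
ln -s ${amass_dir} ./amass
ln -s ${mocap_dir} ./preprocessed_data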
For computing the FID, we first need to train a classifier on BABEL:
python3 classify.py --name classifier_ -iter 1000 --classif_method TR -lr 4e-5 --use_bm 0
Different variants exist for the auto-encoder; you can train the one you want using one of the following commands:
- Train the auto-encoder in the debug setting (e.g., to debug with a different VQ-VAE architecture):
python3 auto_encode.py --name auto_encoder_debug --n_codebook 2 --n_e 512 --e_dim 256 --loss l2 --model CausalVQVAE --dropout 0 --freq_vert 2 --learning_rate 5e-5 --alpha_vert 100. --ab1 0.95 --tprop_vert 0.1 --prefetch_factor 4 --alpha_codebook 1. --hid_dim 384 --alpha_codebook 0.25 --train_batch_size 64 --debug 1 --dummy_data 1
- Train an offline (i.e., all timesteps generated simultaneously), transformer-based VQ-VAE:
python3 auto_encode.py --name auto_encoder --n_codebook 2 --n_e 512 --e_dim 256 --loss l2 --model CausalVQVAE --dropout 0 --freq_vert 2 --learning_rate 5e-5 --alpha_vert 100. --ab1 0.95 --tprop_vert 0.1 --prefetch_factor 4 --alpha_codebook 1. --hid_dim 384 --alpha_codebook 0.25 --train_batch_size 64
- Train a transformer-based VQ-VAE, with causality in the encoder but not in the decoder (it can condition on past observations):
python3 auto_encode.py --name auto_encoder --n_codebook 2 --n_e 512 --e_dim 256 --loss l2 --model CausalVQVAE --dropout 0 --freq_vert 2 --learning_rate 5e-5 --alpha_vert 100. --ab1 0.95 --tprop_vert 0.1 --prefetch_factor 4 --alpha_codebook 1. --hid_dim 384 --alpha_codebook 0.25 --train_batch_size 64
- Train a transformer-based VQ-VAE, with causality in both the encoder and the decoder (it can predict the future given the past on the fly):
python3 auto_encode.py --name auto_encoder --n_codebook 2 --n_e 512 --e_dim 256 --loss l2 --model CausalVQVAE --dropout 0 --freq_vert 2 --learning_rate 5e-5 --alpha_vert 100. --ab1 0.95 --tprop_vert 0.1 --prefetch_factor 4 --alpha_codebook 1. --hid_dim 384 --alpha_codebook 0.25 --train_batch_size 64
Once the auto-encoder is trained, you can use it to train the generator.
- Train a generator (using a previously trained auto-encoder):
python3 train_gpt.py --name generator --n_codebook 2 --n_e 512 --e_dim 256 --vq_model CausalVQVAE --hid_dim 384 --dropout 0 --vq_ckpt ./logs/auto_encoder_debug/checkpoints/best_val.pt --model poseGPT --n_visu_to_save 2 --class_conditional 1 --gpt_blocksize 512 --gpt_nlayer 8 --gpt_nhead 4 --gpt_embd_pdrop 0.2 --gpt_resid_pdrop 0.2 --gpt_attn_pdrop 0.2 --seq_len 64 --gen_eos 0 --eval_fid 0 --eos_force 1 --seqlen_conditional 1 --embed_every_step 1 --concat_emb 1
- Train a generator in debug mode (using a previously trained auto-encoder):
python3 train_gpt.py --name generator --n_codebook 2 --n_e 512 --e_dim 256 --vq_model CausalVQVAE --hid_dim 384 --dropout 0 --vq_ckpt ./logs/auto_encoder_debug/checkpoints/best_val.pt --model poseGPT --n_visu_to_save 2 --class_conditional 1 --gpt_blocksize 512 --gpt_nlayer 8 --gpt_nhead 4 --gpt_embd_pdrop 0.2 --gpt_resid_pdrop 0.2 --gpt_attn_pdrop 0.2 --seq_len 64 --gen_eos 0 --eval_fid 0 --eos_force 1 --seqlen_conditional 1 --embed_every_step 1 --concat_emb 1 --dummy_data 1 --debug 1
If you do not want to train the models yourself, our pretrained checkpoint will soon be available for download here:
wget <todo>
If you find our work useful, please cite our paper:
@inproceedings{posegpt,
  title={PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting},
  author={Lucas*, Thomas and Baradel*, Fabien and Weinzaepfel, Philippe and Rogez, Gr\'egory},
  booktitle={European Conference on Computer Vision ({ECCV})},
  year={2022}
}
PoseGPT is distributed under the CC BY-NC-SA 4.0 License. See LICENSE for more information.