This software is released as part of the supplementary material of the paper:
Scene-Aware 3D Multi-Human Motion Capture from a Single Camera, EUROGRAPHICS 2023
Diogo C. Luvizon | Marc Habermann | Vladislav Golyanik | Adam Kortylewski | Christian Theobalt
Project: https://vcai.mpi-inf.mpg.de/projects/scene-aware-3d-multi-human
Code: https://github.com/dluvizon/scene-aware-3d-multi-human
This software was tested on the following systems:
Operating System: Debian GNU/Linux 10; Ubuntu 20.04.5 LTS
GPU: TITAN V (12 GiB); Quadro RTX 8000 (48 GiB)
CPU-only is also supported (very slow)
CPU RAM: 11 GiB
Python 3 and (Mini)Conda
A minimal installation is possible by simply creating a new conda environment. This assumes that the input data modalities are pre-computed and available.
1.2.1 Create a conda environment
conda env create -f environment.yml
conda activate multi-human-mocap
Note that some packages in environment.yml are only needed for visualization.
1.2.2 Download Human Model and Body Joint Regressors
We use the SMPL model, which can be downloaded from [ here ]. Download the file SMPL_NEUTRAL.pkl and place it at model_data/parameters/.
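As a minimal sketch, assuming SMPL_NEUTRAL.pkl was downloaded to the current directory, placing it could look like this:
mkdir -p model_data/parameters
mv SMPL_NEUTRAL.pkl model_data/parameters/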
(This step is only required for predictions from new videos.)
Our method relies on four different predictors as input data modalities. Please follow the optional instructions [ here ] to install and set up each predictor. Alternatively, you can also install each predictor on your own. Here is a list of the predictors we use:
- Monocular Depth Estimation: MiDaS/DPT
- 2D Human Pose Estimation and Tracking: AlphaPose
- Initial SMPL Parameters: ROMP
- Instance Segmentation: Mask2Former
All the predictors are independent and can be installed and executed in parallel. Note that these predictors are not part of our software distribution, although we provide simplified instructions on how to install and adapt them if necessary.
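For illustration only, the four predictors could be launched in parallel on one sequence as sketched below; the wrapper script names are hypothetical and must be replaced by your actual predictor commands:
# Hypothetical wrapper scripts -- adapt to how you installed each predictor.
seq_dir="data/mupots-3d-eval/TS1"
./run_dpt.sh "${seq_dir}" &          # monocular depth (MiDaS/DPT)
./run_alphapose.sh "${seq_dir}" &    # 2D pose estimation and tracking (AlphaPose)
./run_romp.sh "${seq_dir}" &         # initial SMPL parameters (ROMP)
./run_mask2former.sh "${seq_dir}" &  # instance segmentation (Mask2Former)
wait  # the predictors do not depend on each other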
2.1 First, download MuPoTs-3D and rearrange the data:
mkdir -p data && cd data
wget -c http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/MultiPersonTestSet.zip
unzip MultiPersonTestSet.zip
for ts in {1..20}; do
mkdir -p mupots-3d-eval/TS${ts}/images
mv MultiPersonTestSet/TS${ts}/* mupots-3d-eval/TS${ts}/images/
done
rm -r MultiPersonTestSet
2.2 For the sequences from MuPoTs-3D, we provide the pre-processed data required to run our software. Please download it from [ here ] and place the file in data/. Then, extract it with:
tar -jxf mhmc_mupots-3d-eval.tar.bz2
After this, the folder data/mupots-3d-eval should have the following structure:
|-- data/mupots-3d-eval/
    |-- TS1/
        |-- AlphaPose/
            |-- alphapose-results.json
        |-- DPT_large_monodepth/
            |-- img_%06d.png
        |-- images/
            |-- img_%06d.jpg
            |-- annot.mat
            |-- intrinsics.txt
            |-- occlusion.mat
        |-- Mask2Former_Instances/
            |-- img_%06d.png
        |-- ROMP_Predictions/
            |-- img_%06d.npz
    |-- TS2/
    [...]
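As an optional sanity check (a sketch that assumes the layout above), the loop below reports any sequence folder that is missing one of the expected subdirectories:
for ts in {1..20}; do
  for d in AlphaPose DPT_large_monodepth images Mask2Former_Instances ROMP_Predictions; do
    [ -d "data/mupots-3d-eval/TS${ts}/${d}" ] || echo "Missing: TS${ts}/${d}"
  done
done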
2.3 With the minimal installation done and the MuPoTs-3D data ready, run:
./script/predict_mupots_full.sh
This runs the optimization on the full dataset (TS1..TS20) and can take a long time, depending on the hardware configuration. For a quick test of the software, run:
./script/predict_mupots_test.sh # TS1 only, only a few iterations
After running the prediction part for the full dataset, the output will be stored in ./output.
2.4 Compute the scores from the predicted outputs.
./script/eval_mupots.sh
cat output/mupots/FinalResults.md # show our results
Processing new videos requires all the predictors to be installed. If not done yet, please follow this step. For a new video, please extract its frames (using ffmpeg or a similar tool) to [path_to_video]/images/img_%06d.jpg.
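For example, with ffmpeg and an input file named video.mp4 (the file name and the JPEG quality flag below are only assumptions; adjust them to your data), the extraction could look like:
path_to_video="path-to-your-video-folder"
mkdir -p "${path_to_video}/images"
ffmpeg -i video.mp4 -qscale:v 2 "${path_to_video}/images/img_%06d.jpg"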
3.1 Preprocess video frames
Run the script:
# path_to_video="path-to-your-video-folder"
./script/preproc_data.sh ${path_to_video}
After this, the folder [path_to_video]/ should contain all the preprocessed outputs (depth maps, 2D poses, SMPL parameters, segmentation).
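As a quick check (a sketch that assumes the preprocessing uses the same folder names as the MuPoTs-3D layout above), you can verify that all modalities were produced:
for d in AlphaPose DPT_large_monodepth images Mask2Former_Instances ROMP_Predictions; do
  [ -d "${path_to_video}/${d}" ] || echo "Missing: ${d}"
done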
3.2 Run our code
./script/predict_internet.sh ${path_to_video} ${path_to_video}/output
This script calls mhmocap.predict_internet.py, which assumes a standard camera with a 60° field of view (FOV). Modify it if you need to set the camera intrinsics properly.
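For reference, a pinhole focal length in pixels can be derived from the horizontal FOV as f = W / (2 * tan(FOV / 2)); the one-liner below is only a sketch that assumes a 1920-pixel-wide frame:
python -c "import math; W = 1920; fov = 60; print(W / (2 * math.tan(math.radians(fov / 2))))"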
We provide a visualization tool based on Open3D. This module interactively shows our 3D predictions for a video sequence:
python -m mhmocap.visualization \
--input_path="${path_to_video}/output" \
--output_path="${path_to_video}/output"
Please cite our paper if this software (including any part of it) is useful for you.
@article{SceneAware_EG2023,
title = {{Scene-Aware 3D Multi-Human Motion Capture from a Single Camera}},
author = {Luvizon, Diogo and Habermann, Marc and Golyanik, Vladislav and Kortylewski, Adam and Theobalt, Christian},
journal = {Computer Graphics Forum},
volume = {42},
number = {2},
pages = {371-383},
doi = {https://doi.org/10.1111/cgf.14768},
year = {2023},
}
Please see the License Terms in the LICENSE file.
This work was funded by the ERC Consolidator Grant 4DRepLy (770784).
Some parts of this code were borrowed from other great repositories, including ROMP, VIBE, and more. We also thank Rishabh Dabral for his help with the animated characters in Blender.