
CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation
BMVC 2024 (Oral Presentation)

[Figure: CVAM-Pose pipeline]

Authors: Jianyu Zhao, Wei Quan, Bogdan J. Matuszewski

Paper link: https://arxiv.org/abs/2410.09010

If you find this work useful in your research, please consider citing:

@inproceedings{zhao2024cvam,
  title={CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation},
  author={Zhao, Jianyu and Quan, Wei and Matuszewski, Bogdan J},
  booktitle={The 35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
  publisher={BMVA},
  year={2024},
  url={https://bmva-archive.org.uk/bmvc/2024/papers/Paper_967/paper.pdf}
}

This method builds on our single-object method CVML-Pose; you may also consider citing:

@article{zhao2023cvml,
  title={CVML-Pose: Convolutional VAE Based Multi-Level Network for Object 3D Pose Estimation},
  author={Zhao, Jianyu and Sanderson, Edward and Matuszewski, Bogdan J},
  journal={IEEE Access},
  volume={11},
  pages={13830--13845},
  year={2023},
  publisher={IEEE}
}

1. Usage

1.1 Download data

Download the CVAM-Pose repository and navigate into it, then download the following data from the BOP datasets.

cd CVAM-Pose
mkdir original\ data                   # Make the "original data" folder
cd original\ data

export SRC=https://huggingface.co/datasets/bop-benchmark/datasets/resolve/main
wget $SRC/lmo/lmo_base.zip             # Linemod-Occluded base archive
wget $SRC/lmo/lmo_models.zip           # Linemod-Occluded 3D models
wget $SRC/lm/lm_train_pbr.zip          # Linemod PBR images
wget $SRC/lmo/lmo_test_bop19.zip       # BOP Linemod-Occluded test images

Unzip the datasets.

unzip lmo_base.zip                     # Contains folder "lmo"
unzip lmo_models.zip -d lmo            # Unpacks to "lmo"
unzip lm_train_pbr.zip -d lmo          # Unpacks to "lmo"
unzip lmo_test_bop19.zip -d lmo        # Unpacks to "lmo"

We use the default detection results of the BOP Challenge 2022, obtained with the Mask R-CNN detector pretrained by CosyPose. Download and unzip them via:

wget https://bop.felk.cvut.cz/media/data/bop_datasets_extra/bop22_default_detections_and_segmentations.zip
unzip bop22_default_detections_and_segmentations.zip
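
The unzipped archive contains per-image detections as JSON lists in the COCO-style format used by BOP (fields such as scene_id, image_id, category_id, bbox as [x, y, width, height], and score). Below is a minimal sketch of how to inspect such a file; the file path is a placeholder, so check the unzipped folder for the actual Linemod-Occluded file name:

import json
from collections import defaultdict

# Hypothetical path: the exact file name inside the unzipped archive may differ.
DET_FILE = "bop22_default_detections_and_segmentations/lmo_detections.json"

with open(DET_FILE) as f:
    detections = json.load(f)  # list of dicts, one per detection

# Group detections by (scene, image) and keep only confident ones.
per_image = defaultdict(list)
for det in detections:
    if det["score"] >= 0.5:  # illustrative threshold
        per_image[(det["scene_id"], det["image_id"])].append(det)

key = next(iter(per_image))
print(f"scene {key[0]}, image {key[1]}:")
for det in per_image[key]:
    x, y, w, h = det["bbox"]  # BOP uses [x, y, width, height] in pixels
    print(f"  object {det['category_id']}: bbox=({x:.0f},{y:.0f},{w:.0f},{h:.0f}), score={det['score']:.2f}")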

1.2 Conda environment

To set up the environment, run:

conda create -n CVAM-Pose python=3.9.7
conda activate CVAM-Pose

pip install matplotlib pandas scikit-learn pyrender pypng opencv-python torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
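
As a quick sanity check that the pinned PyTorch build can see your GPU (not part of the pipeline, just a verification snippet):

import torch
import torchvision

print("torch:", torch.__version__)               # expected: 1.12.1+cu113
print("torchvision:", torchvision.__version__)   # expected: 0.13.1+cu113
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))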

1.3 Data preprocessing

Object images are preprocessed with the crop-and-resize strategy described in the paper's supplementary materials; a detailed illustration can also be found in CVML-Pose. A sketch of the idea follows the commands below.

cd CVAM-Pose
conda activate CVAM-Pose

python scripts_data/pbr.py                       # Extract PBR training images
python scripts_data/recon.py --object 1          # Generate ground-truth reconstruction images
python scripts_data/detection_bop.py             # Extract test images with BOP detection bounding boxes
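
For reference, the crop-and-resize idea pads the detected bounding box to a square around the object and rescales it to the network input size. The scripts above implement the exact strategy; the sketch below only illustrates the principle, and the padding factor and 128x128 output size are assumptions, not the repository's fixed values:

import cv2
import numpy as np

def crop_and_resize(image, bbox, out_size=128, pad=1.2):
    """Square-crop around a detection and resize (illustrative values).
    Assumes a 3-channel image and a bbox given as (x, y, w, h)."""
    x, y, w, h = bbox
    cx, cy = x + w / 2.0, y + h / 2.0            # bounding-box centre
    side = max(w, h) * pad                       # enlarge to a padded square
    x0, y0 = int(round(cx - side / 2)), int(round(cy - side / 2))
    x1, y1 = int(round(cx + side / 2)), int(round(cy + side / 2))

    # Pad with zeros where the square crop extends beyond the image.
    H, W = image.shape[:2]
    canvas = np.zeros((y1 - y0, x1 - x0, 3), dtype=image.dtype)
    sx0, sy0 = max(x0, 0), max(y0, 0)
    sx1, sy1 = min(x1, W), min(y1, H)
    canvas[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = image[sy0:sy1, sx0:sx1]

    return cv2.resize(canvas, (out_size, out_size), interpolation=cv2.INTER_LINEAR)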

1.4 Model training

cd CVAM-Pose
conda activate CVAM-Pose

python scripts_method/cvae.py                     # Train the CVAE model
python scripts_method/latent.py                   # Process latent representations
python scripts_method/mlp_r.py                    # Train MLP for rotation
python scripts_method/mlp_c.py                    # Train MLP for centre
python scripts_method/mlp_tz.py                   # Train MLP for distance
python scripts_method/cal_t.py                    # Calculate translation
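
For context on the last step: once the MLPs have predicted the projected object centre (cx, cy) in pixels and the distance tz, the remaining translation components follow from the pinhole camera model, t = tz * K^-1 [cx, cy, 1]^T. A minimal sketch of this back-projection (cal_t.py implements the repository's actual computation; the intrinsics below are the standard Linemod values from the BOP camera.json):

import numpy as np

def backproject_centre(cx, cy, tz, K):
    """Recover the full translation t = (tx, ty, tz) from the predicted
    2D object centre (cx, cy) in pixels and the predicted distance tz,
    using the pinhole model t = tz * K^-1 [cx, cy, 1]^T."""
    fx, fy = K[0, 0], K[1, 1]
    px, py = K[0, 2], K[1, 2]
    tx = (cx - px) * tz / fx
    ty = (cy - py) * tz / fy
    return np.array([tx, ty, tz])

# Example with the Linemod camera intrinsics from the BOP lmo camera.json.
K = np.array([[572.4114, 0.0, 325.2611],
              [0.0, 573.57043, 242.04899],
              [0.0, 0.0, 1.0]])
print(backproject_centre(320.0, 240.0, 1000.0, K))  # translation in mm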

1.5 Evaluation

For evaluation, we use the BOP metrics (VSD, MSSD, and MSPD) as implemented in the BOP Toolkit.

cd CVAM-Pose
git clone https://github.com/thodan/bop_toolkit.git

To use the BOP Toolkit, change the line sys.path.append('/home/jianyu/CVAM-Pose/bop_toolkit/') in scripts_method/eva.py to point to your own BOP Toolkit path, then evaluate the estimated poses:

python scripts_method/eva.py
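
The toolkit evaluates pose estimates stored in the BOP results CSV format, one row per estimate: scene_id, im_id, obj_id, score, the rotation R as nine row-major values, the translation t in millimetres, and the runtime. A minimal sketch of writing such a file (the file name follows the BOP METHOD_DATASET-test.csv convention; a time of -1 means the runtime is not reported):

import numpy as np

def save_bop_results(path, estimates):
    """Write pose estimates in the BOP results CSV format.
    Each estimate: (scene_id, im_id, obj_id, score, R 3x3, t 3-vector in mm, time in s)."""
    with open(path, "w") as f:
        f.write("scene_id,im_id,obj_id,score,R,t,time\n")
        for scene_id, im_id, obj_id, score, R, t, time in estimates:
            r_str = " ".join(f"{v:.6f}" for v in np.asarray(R).flatten())  # row-major
            t_str = " ".join(f"{v:.6f}" for v in np.asarray(t).flatten())
            f.write(f"{scene_id},{im_id},{obj_id},{score},{r_str},{t_str},{time}\n")

# e.g. save_bop_results("cvampose_lmo-test.csv",
#                       [(2, 3, 1, 1.0, np.eye(3), np.array([0.0, 0.0, 1000.0]), -1)])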

2. License

This repository is released under the Apache 2.0 license as described in the LICENSE.

3. Commercial use

We allow commercial use of this work, as permitted by the LICENSE. However, where possible, please let us know about such use so that it can support our impact case studies.

4. Acknowledgements

This work makes use of existing datasets, which are openly available at https://bop.felk.cvut.cz/datasets/

This work also makes use of multiple existing open-source codebases.

5. Additional information

Student Profile

UCLan Computer Vision and Machine Learning (CVML) Group

Contact: [email protected]
