Gen6D is able to estimate 6DoF poses for unseen objects, as shown in the following video.
- Pretrained models and evaluation codes.
- Pose estimation on custom objects.
- Training codes.
Required packages are listed in requirements.txt. To determine how to install PyTorch along with CUDA, please refer to the PyTorch documentation.
- Download the pretrained models, the GenMOP dataset and the processed LINEMOD dataset here.
- Organize the files as follows:
Gen6D
|-- data
    |-- model
        |-- detector_pretrain
            |-- model_best.pth
        |-- selector_pretrain
            |-- model_best.pth
        |-- refiner_pretrain
            |-- model_best.pth
    |-- GenMOP
        |-- chair
        ...
    |-- LINEMOD
        |-- cat
        ...
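Before running the evaluation, it can help to confirm that the downloaded files sit where the layout above expects them. The following is a minimal sketch and not part of the repository: `missing_files` is a hypothetical helper, and the checked paths assume that `model`, `GenMOP` and `LINEMOD` live under `data/` and that each `*_pretrain` folder holds a `model_best.pth`.

```python
from pathlib import Path

# Paths relative to the Gen6D root, read off the layout above.
# Assumption: every pretrained component stores its weights as model_best.pth.
EXPECTED = [
    "data/model/detector_pretrain/model_best.pth",
    "data/model/selector_pretrain/model_best.pth",
    "data/model/refiner_pretrain/model_best.pth",
    "data/GenMOP",
    "data/LINEMOD",
]


def missing_files(root="."):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]
```

Calling `missing_files(".")` from the repository root returns an empty list when everything is in place, and otherwise the paths that still need to be downloaded or moved.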
# Evaluate on the object TFormer from the GenMOP dataset
python eval.py --cfg configs/gen6d_pretrain.yaml --object_name genmop/tformer
# Evaluate on the object cat from the LINEMOD dataset
python eval.py --cfg configs/gen6d_pretrain.yaml --object_name linemod/cat
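To evaluate several objects in one go, the two commands above can be generated in a loop. This is only a sketch: `eval_command` and `run_evaluations` are hypothetical helpers, the object list contains just the two names shown above, and the script is assumed to run from the repository root.

```python
import shlex
import subprocess

# Only the two objects named above; extend the list as needed.
OBJECTS = ["genmop/tformer", "linemod/cat"]


def eval_command(object_name, cfg="configs/gen6d_pretrain.yaml"):
    """Build the eval.py command shown above for one object (hypothetical helper)."""
    return ["python", "eval.py", "--cfg", cfg, "--object_name", object_name]


def run_evaluations(objects, dry_run=True):
    """Print each command and, unless dry_run is set, execute it as well."""
    commands = [eval_command(name) for name in objects]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing evaluation
    return commands
```

With the default `dry_run=True` the commands are only printed, which makes it easy to inspect them first; `run_evaluations(OBJECTS, dry_run=False)` actually launches the evaluations.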
The ADD-0.1d and Prj-5 metrics will be printed on the screen.
3D bounding boxes of the estimated poses will be saved in data/vis_final/gen6d_pretrain/genmop/tformer.
The ground truth is drawn in green and the prediction in blue.
Intermediate results of detection, viewpoint selection and pose refinement will be saved in data/vis_inter/gen6d_pretrain/genmop/tformer.
This image shows detection results.
This image shows viewpoint selection results. The first row shows the input image to the selector. The second row shows the input image rotated by the estimated in-plane rotation (left column) or the ground-truth in-plane rotation (right column). The subsequent 5 rows show the 5 predicted (left) or ground-truth (right) reference images whose viewpoints are nearest to the input image.
This image shows the pose refinement process. The red bbox represents the input pose, the green one represents the ground-truth and the blue one represents the output pose for the current refinement step.
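For readers unfamiliar with the ADD-0.1d and Prj-5 metrics printed by the evaluation, the sketch below follows the definitions commonly used in the pose estimation literature: a pose counts as correct under ADD-0.1d if the mean 3D distance between model points under the predicted and ground-truth poses is below 10% of the object diameter, and under Prj-5 if the mean 2D reprojection distance is below 5 pixels. It is an illustration under those assumptions, not the repository's evaluation code, which may differ in details. All function names are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import math


def transform(R, t, p):
    """Apply a rigid pose (3x3 rotation R, translation t) to a 3D point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def project(K, p):
    """Project a camera-frame 3D point to pixels with 3x3 intrinsics K."""
    x, y, z = (sum(K[i][j] * p[j] for j in range(3)) for i in range(3))
    return [x / z, y / z]


def add_error(points, R_pr, t_pr, R_gt, t_gt):
    """Mean 3D distance between model points under predicted and ground-truth poses."""
    dists = [math.dist(transform(R_pr, t_pr, p), transform(R_gt, t_gt, p))
             for p in points]
    return sum(dists) / len(dists)


def prj_error(points, K, R_pr, t_pr, R_gt, t_gt):
    """Mean 2D distance in pixels between the projected model points."""
    dists = [math.dist(project(K, transform(R_pr, t_pr, p)),
                       project(K, transform(R_gt, t_gt, p)))
             for p in points]
    return sum(dists) / len(dists)


def is_correct(points, diameter, K, R_pr, t_pr, R_gt, t_gt):
    """Return (ADD-0.1d hit, Prj-5 hit) for a single frame."""
    add_ok = add_error(points, R_pr, t_pr, R_gt, t_gt) < 0.1 * diameter
    prj_ok = prj_error(points, K, R_pr, t_pr, R_gt, t_gt) < 5.0
    return add_ok, prj_ok
```

The reported numbers are then the fraction of test frames for which each check succeeds.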
Please refer to custom_object.md
- Download the processed CO3D data (co3d.tar.gz), the Google Scanned Objects data (google_scanned_objects.tar.gz) and the ShapeNet renderings (shapenet.tar.gz) here. Note: the training set is no longer provided. You may need to read the code and process the data yourself.
- Download the COCO 2017 training set.
- Organize the files as follows:
Gen6D
|-- data
    |-- GenMOP
        |-- chair
        ...
    |-- LINEMOD
        |-- cat
        ...
    |-- shapenet
        |-- shapenet_cache
        |-- shapenet_render
        |-- shapenet_render_v1.pkl
    |-- co3d_256_512
        |-- apple
        ...
    |-- google_scanned_objects
        |-- 06K3jXvzqIM
        ...
    |-- coco
        |-- train2017
- Train the detector
python train_model.py --cfg configs/detector/detector_train.yaml
- Train the selector
python train_model.py --cfg configs/selector/selector_train.yaml
- Prepare the validation data for training the refiner
python prepare.py --action gen_val_set \
--estimator_cfg configs/gen6d_train.yaml \
--que_database linemod/cat \
--que_split linemod_val \
--ref_database linemod/cat \
--ref_split linemod_val
python prepare.py --action gen_val_set \
--estimator_cfg configs/gen6d_train.yaml \
--que_database genmop/tformer-test \
--que_split all \
--ref_database genmop/tformer-ref \
--ref_split all
These commands generate information in data/val, which is used to produce the validation data for the refiner.
- Train the refiner
python train_model.py --cfg configs/refiner/refiner_train.yaml
- Evaluate all components together.
# Evaluate on the object TFormer from the GenMOP dataset
python eval.py --cfg configs/gen6d_train.yaml --object_name genmop/tformer
# Evaluate on the object cat from the LINEMOD dataset
python eval.py --cfg configs/gen6d_train.yaml --object_name linemod/cat
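The training steps above can be collected into a single script so that they run in the listed order. The sketch below only reuses the commands and flags shown above. The helper names (`train`, `gen_val_set`, `evaluate`, `training_pipeline`, `run_pipeline`) are hypothetical, and running the steps strictly one after another is an assumption of this example rather than a stated requirement, except that the validation data must exist before the refiner is trained.

```python
import shlex
import subprocess


def train(cfg):
    """Command for one train_model.py run."""
    return ["python", "train_model.py", "--cfg", cfg]


def gen_val_set(que_database, que_split, ref_database, ref_split):
    """Command for one prepare.py validation-set run, with the flags shown above."""
    return [
        "python", "prepare.py", "--action", "gen_val_set",
        "--estimator_cfg", "configs/gen6d_train.yaml",
        "--que_database", que_database, "--que_split", que_split,
        "--ref_database", ref_database, "--ref_split", ref_split,
    ]


def evaluate(object_name):
    """Command for one eval.py run with the trained configuration."""
    return ["python", "eval.py", "--cfg", "configs/gen6d_train.yaml",
            "--object_name", object_name]


def training_pipeline():
    """All commands from the steps above, in the listed order."""
    return [
        train("configs/detector/detector_train.yaml"),
        train("configs/selector/selector_train.yaml"),
        gen_val_set("linemod/cat", "linemod_val", "linemod/cat", "linemod_val"),
        gen_val_set("genmop/tformer-test", "all", "genmop/tformer-ref", "all"),
        train("configs/refiner/refiner_train.yaml"),
        evaluate("genmop/tformer"),
        evaluate("linemod/cat"),
    ]


def run_pipeline(dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for cmd in training_pipeline():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

`run_pipeline()` prints the seven commands without executing them; `run_pipeline(dry_run=False)` runs them from the repository root and aborts as soon as one step fails.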
The process of making the GenMOP dataset is described as follows:
- Run SfM on the reference sequence using COLMAP.
- Run SfM on the test sequence using COLMAP. Note that the test sequence used for evaluation needs to be captured in a static scene.
- Manually label at least 4 keypoints on two images from the reference sequence and label the same 4 keypoints on two images from the test sequence. For example, we label 4 keypoints on frame40.jpg and frame620.jpg from the reference sequence of the TFormer object, which are stored in aling-data/tformer-anno/ref-frame40(620). We also label 4 keypoints on frame130.jpg and frame540.jpg from the test sequence of the TFormer object, which are stored in aling-data/tformer-anno/test-frame130(540).
- Compute the alignment poses and scale for the two sequences and save the results in align.pkl in tformer-test/. We provide an example in compute_align_poses.py.
- With align.pkl, you may use the GenMOPDatabase by parse_database_name('genmop/tformer-ref') and parse_database_name('genmop/tformer-test').
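To give an intuition for the alignment step, note that the two SfM reconstructions live in different coordinate frames with different, arbitrary scales. Labelling each keypoint in two images of a sequence allows it to be triangulated in that sequence's frame, and the matched 3D points then constrain a similarity transform between the two reconstructions. The sketch below illustrates only the scale part, assuming the keypoints have already been triangulated. It is not the code in compute_align_poses.py, the function names are hypothetical, and the direction in which the repository stores the scale is not specified here.

```python
import math


def centroid(points):
    """Mean of a list of 3D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def spread(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))


def relative_scale(ref_points, test_points):
    """Scale s such that test ~ s * R * ref + t for some rotation R and translation t.

    Rotation and translation do not change distances to the centroid,
    so the ratio of the two spreads isolates the scale factor.
    """
    return spread(test_points) / spread(ref_points)
```

The rotation and translation would typically be recovered afterwards with a least-squares rigid alignment of the matched points, for example the Kabsch or Umeyama method, which is omitted here.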
We use the annotation tools from https://github.com/luigivieira/Facial-Landmarks-Annotation-Tool to label keypoints for the GenMOP dataset.
In this repository, we have used code and datasets from other repositories. We thank all the authors for sharing their great code and datasets.
We provide a paper list about recent generalizable 6-DoF object pose estimators at https://github.com/liuyuan-pal/Awsome-generalizable-6D-object-pose.
@inproceedings{liu2022gen6d,
title={Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images},
author={Liu, Yuan and Wen, Yilin and Peng, Sida and Lin, Cheng and Long, Xiaoxiao and Komura, Taku and Wang, Wenping},
booktitle={ECCV},
year={2022}
}