Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation. Our approach iterates between geometric estimation that exploits spatial-temporal information from multiple cameras, and monocular depth refinement. We integrate multi-camera feature correlation and dense bundle adjustment operators that yield robust geometric depth and pose estimates. To improve reconstruction where geometric depth is unreliable, e.g. for moving objects or low-textured regions, we introduce learnable scene priors via a depth refinement network. We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments. Consequently, we achieve state-of-the-art dense depth prediction on the DDAD and nuScenes benchmarks.
- Clone the repo using the --recurse-submodules flag
git clone --recurse-submodules https://github.com/AronDiSc/r3d3.git
cd r3d3
- Create a new Anaconda environment using the provided environment.yaml file
conda env create --file environment.yaml
conda activate r3d3
- Compile the extensions (takes about 10 minutes)
python setup.py install
The datasets should be placed at data/datasets/<dataset>.
Download the DDAD dataset and place it at data/datasets/DDAD. We use the masks provided by SurroundDepth. Place them at data/datasets/DDAD/<scene>/occl_mask/<cam>/mask.png. The DDAD directory structure should look as follows:
R3D3
    ├ data
        ├ datasets
            ├ DDAD
                ├ <scene>
                    ├ calibration
                        └ ....json
                    ├ point_cloud
                        └ <cam>
                            └ ....npz
                    ├ occl_mask
                        └ <cam>
                            └ ....png
                    ├ rgb
                        └ <cam>
                            └ ....png
                    └ scene_....json
                └ ...
            └ ...
        └ ...
    └ ...
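Before running anything on DDAD, it can help to confirm that each scene folder matches the layout above. The following is a minimal sketch, not part of the R3D3 code base: the helper name `find_incomplete_ddad_scenes` is hypothetical, and it assumes that every sub-directory of data/datasets/DDAD is a scene.

```python
from pathlib import Path

# Sub-folders listed for each scene in the layout above.
EXPECTED_SCENE_DIRS = ("calibration", "point_cloud", "occl_mask", "rgb")


def find_incomplete_ddad_scenes(ddad_root):
    """Return {scene_name: [missing sub-folders]} for scenes that look incomplete.

    Hypothetical helper (not part of R3D3). Assumes every directory directly
    under ddad_root is a scene; plain files at that level are ignored.
    """
    root = Path(ddad_root)
    problems = {}
    for scene in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = [d for d in EXPECTED_SCENE_DIRS if not (scene / d).is_dir()]
        if missing:
            problems[scene.name] = missing
    return problems
```

Calling `find_incomplete_ddad_scenes("data/datasets/DDAD")` from the repository root returns an empty dictionary when every scene has all four sub-folders. It only checks folder names, not the files inside them.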
Download the nuScenes dataset and place it at data/datasets/nuScenes. We use the provided self-occlusion masks. Place them at data/datasets/nuScenes/mask/<cam>.png. The nuScenes directory structure should look as follows:
R3D3
    ├ data
        ├ datasets
            ├ nuScenes
                ├ mask
                    ├ CAM_....png
                ├ samples
                    ├ CAM_...
                        └ ....jpg
                    └ LIDAR_TOP
                        └ ....pcd.bin
                ├ sweeps
                    ├ CAM_...
                        └ ....jpg
                ├ v1.0-trainval
                    └ ...
                └ ...
            └ ...
        └ ...
    └ ...
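Since each nuScenes camera needs its own self-occlusion mask, a quick check for missing masks can catch setup mistakes early. This is a rough sketch under two assumptions: the mask file names match the camera folder names under samples/ (as the `<cam>.png` pattern above suggests), and the helper name `find_cameras_without_mask` is hypothetical rather than part of R3D3.

```python
from pathlib import Path


def find_cameras_without_mask(nuscenes_root):
    """List camera folders under samples/ that have no mask/<cam>.png.

    Hypothetical helper (not part of R3D3). Assumes mask files are named
    after the camera folders, e.g. samples/CAM_X <-> mask/CAM_X.png.
    """
    root = Path(nuscenes_root)
    cameras = sorted(
        p.name
        for p in (root / "samples").iterdir()
        if p.is_dir() and p.name.startswith("CAM_")
    )
    return [cam for cam in cameras if not (root / "mask" / f"{cam}.png").is_file()]
```

An empty list from `find_cameras_without_mask("data/datasets/nuScenes")` means every camera folder has a matching mask file. Non-camera folders such as LIDAR_TOP are skipped.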
Download the weights for the feature- and context-encoders as well as the GRU from here: r3d3_finetuned.ckpt. Place it at:
R3D3
    ├ data
        ├ models
            ├ r3d3
                └ r3d3_finetuned.ckpt
            └ ...
        └ ...
    └ ...
We provide completion network weights for the DDAD and nuScenes datasets.
Dataset | Abs Rel | Sq Rel | RMSE | delta < 1.25 | Download |
---|---|---|---|---|---|
DDAD | 0.162 | 3.019 | 11.408 | 0.811 | completion_ddad.ckpt |
nuScenes | 0.253 | 4.759 | 7.150 | 0.729 | completion_nuscenes.ckpt |
Place them at:
R3D3
    ├ data
        ├ models
            ├ completion
                ├ completion_ddad.ckpt
                └ completion_nuscenes.ckpt
            └ ...
        └ ...
    └ ...
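For readers unfamiliar with the columns in the results table above, the sketch below computes Abs Rel, Sq Rel, RMSE and delta < 1.25 using the definitions commonly used in the depth estimation literature. It is an illustration only: the function name `depth_metrics` is hypothetical, and the repository's own evaluation may additionally apply masks, depth caps, or scaling that are not shown here.

```python
import math


def depth_metrics(pred, gt):
    """Common depth metrics over paired predicted / ground-truth depths.

    Illustrative only (not the R3D3 evaluation code). Assumes both sequences
    have equal length and hold strictly positive values, i.e. invalid pixels
    were filtered out beforehand.
    """
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean squared error relative to the ground-truth depth.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root of the mean squared error.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25.
    delta_1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "delta_1": delta_1}
```

Lower is better for the first three metrics, while delta < 1.25 is an accuracy measure where higher is better.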
We fine-tune the provided droid.pth checkpoint on VKITTI2 using the Droid-SLAM codebase.
# DDAD: generate training data for the completion network
python evaluate.py \
    --config configs/evaluation/dataset_generation/dataset_generation_ddad.yaml \
    --r3d3_weights=data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 384 640 \
    --r3d3_n_warmup=5 \
    --r3d3_optm_window=5 \
    --r3d3_corr_impl=lowmem \
    --r3d3_graph_type=droid_slam \
    --training_data_path=./data/datasets/DDAD
# nuScenes: generate training data for the completion network
python evaluate.py \
    --config configs/evaluation/dataset_generation/dataset_generation_nuscenes.yaml \
    --r3d3_weights=data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 448 768 \
    --r3d3_n_warmup=5 \
    --r3d3_optm_window=5 \
    --r3d3_corr_impl=lowmem \
    --r3d3_graph_type=droid_slam \
    --training_data_path=./data/datasets/nuScenes
# DDAD: completion network training
python train.py configs/training/depth_completion/r3d3_completion_ddad_stage_1.yaml
python train.py configs/evaluation/depth_completion/r3d3_completion_ddad_inf_depth.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
python train.py configs/training/depth_completion/r3d3_completion_ddad_stage_2.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
# nuScenes: completion network training
python train.py configs/training/depth_completion/r3d3_completion_nuscenes_stage_1.yaml
python train.py configs/evaluation/depth_completion/r3d3_completion_nuscenes_inf_depth.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
python train.py configs/training/depth_completion/r3d3_completion_nuscenes_stage_2.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
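The training commands above follow the same three-step pattern for both datasets, with the stage 1 checkpoint passed to the second and third step. A small sketch of how that sequence could be assembled programmatically is shown below. The helper `completion_training_commands` is hypothetical and only rebuilds the commands listed above; where train.py writes the stage 1 checkpoint is not stated here, so the path must be supplied by the caller.

```python
def completion_training_commands(dataset, stage_1_checkpoint):
    """Build the three train.py invocations for "ddad" or "nuscenes".

    Hypothetical helper (not part of R3D3). It only assembles the commands
    from this README; stage_1_checkpoint is a caller-supplied path.
    """
    if dataset not in ("ddad", "nuscenes"):
        raise ValueError(f"unknown dataset: {dataset}")
    train_cfg = "configs/training/depth_completion"
    eval_cfg = "configs/evaluation/depth_completion"
    ckpt_flag = f"--arch.model.checkpoint={stage_1_checkpoint}"
    return [
        # Step 1: stage 1 training, no checkpoint argument.
        ["python", "train.py", f"{train_cfg}/r3d3_completion_{dataset}_stage_1.yaml"],
        # Step 2: inf_depth config, run with the stage 1 checkpoint.
        ["python", "train.py", f"{eval_cfg}/r3d3_completion_{dataset}_inf_depth.yaml", ckpt_flag],
        # Step 3: stage 2 training, starting from the stage 1 checkpoint.
        ["python", "train.py", f"{train_cfg}/r3d3_completion_{dataset}_stage_2.yaml", ckpt_flag],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Because steps 2 and 3 depend on the checkpoint produced by step 1, the commands should run in order and stop on the first failure.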
# DDAD: evaluation
python evaluate.py \
    --config configs/evaluation/r3d3/r3d3_evaluation_ddad.yaml \
    --r3d3_weights data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 384 640 \
    --r3d3_init_motion_only \
    --r3d3_n_edges_max=84
# nuScenes: evaluation
python evaluate.py \
    --config configs/evaluation/r3d3/r3d3_evaluation_nuscenes.yaml \
    --r3d3_weights data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 448 768 \
    --r3d3_init_motion_only \
    --r3d3_dt_inter=0 \
    --r3d3_n_edges_max=72
If you find the code helpful in your research or work, please cite the following paper.
@inproceedings{r3d3,
title={R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras},
author={Schmied, Aron and Fischer, Tobias and Danelljan, Martin and Pollefeys, Marc and Yu, Fisher},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
year={2023}
}
- This repository is based on Droid-SLAM.
- The implementation of the completion network is based on Monodepth2.
- The vidar framework is used for training, evaluation and logging results.