Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts from limited 2D image observations. Existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame. Such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
The primary innovation of this work involves decomposing temporal context learning into two hierarchical steps: (a) cross-frame affinity measurement and (b) affinity-based dynamic refinement. Firstly, to separate critical relevant context from redundant information, we introduce the pattern affinity with scale-aware isolation and multiple independent learners for fine-grained contextual correspondence modeling. Subsequently, to dynamically compensate for incomplete observations, we adaptively refine the feature sampling locations based on initially identified locations with high affinity and their neighboring relevant regions. Our method ranks
- News
- Quick Start
- Installation
- Prepare Data
- Pretrained Model
- Training & Evaluation
- License
- Acknowledgements
- [2024/07]: Demo and code released.
- [2024/07]: Paper is on arXiv.
- [2024/07]: Paper is accepted to ECCV 2024.
If you are using an NVIDIA A100, you can use our pre-packed environment with the following steps:
a. Download the pre-packed environment: occA100.
b. Unpack the environment into the directory occA100.
cd /opt/conda/envs/
mkdir -p occA100
tar -xzf occA100.tar.gz -C occA100
c. Activate the environment. This adds occA100/bin to your path.
source occA100/bin/activate
You can also run the environment's Python executable directly, without activating the environment or fixing the prefixes.
./occA100/bin/python
Alternatively, install the dependencies step by step, following https://mmdetection3d.readthedocs.io/en/latest/getting_started.html#installation
a. Create a conda virtual environment and activate it. Python versions above 3.7 may not be supported, because installing open3d-python on them causes errors.
conda create -n occupancy python=3.7 -y
conda activate occupancy
b. Install PyTorch and torchvision following the official instructions.
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
c. Install gcc>=5 in the conda env (optional). This step was not needed in our setup.
conda install -c omgarcia gcc-6 # gcc-6.2
c. Install mmcv-full.
pip install mmcv-full==1.4.0
d. Install mmdet and mmseg.
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
e. Install mmdet3d from source code.
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
f. Install other dependencies.
pip install timm
pip install open3d-python
pip install PyMCubes
If an error appears that is caused by the installed version of "setuptools", try pinning it:
pip install setuptools==59.5.0
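Because the steps above pin several exact versions, a quick check of what actually got installed can save debugging time later. The following is a minimal sketch, not part of the repository: the pins are copied from the commands above, while the distribution names (for example `torch` and `mmdet3d`) and the helper names `installed_version` and `find_mismatches` are assumptions made for illustration.

```python
"""Compare installed package versions with the pins used in the steps above."""

# Pins copied from the installation commands in this README.
# The distribution names on the left are assumed, not confirmed by the README.
PINNED = {
    "torch": "1.10.1",
    "torchvision": "0.11.2",
    "torchaudio": "0.10.1",
    "mmcv-full": "1.4.0",
    "mmdet": "2.14.0",
    "mmsegmentation": "0.14.1",
    "mmdet3d": "0.17.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        # Available on Python >= 3.8.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python 3.7 (the version used above): fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (expected, found)} for every pin that is not satisfied.

    Local build tags such as "+cu113" are ignored when comparing.
    """
    mismatches = {}
    for name, expected in pinned.items():
        found = installed.get(name)
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    installed = {name: installed_version(name) for name in PINNED}
    for name, (expected, found) in find_mismatches(installed).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running the script inside the activated environment prints one line per package that differs from its pin and prints nothing when everything matches.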
a. You need to download:
- The odometry calibration files (Download odometry data set (calibration files)) and the RGB images (Download odometry data set (color)) from the KITTI Odometry website. Extract them to the folder data/occupancy/semanticKITTI/RGB/.
- The Velodyne point clouds (Download data_odometry_velodyne) and the SemanticKITTI label data (Download data_odometry_labels), used for sparse LiDAR supervision during training. Extract them to the folders data/lidar/velodyne/ and data/lidar/lidarseg/, respectively.
b. Prepare the KITTI voxel labels (see process_kitti.sh for more details).
bash process_kitti.sh
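Before running the label preparation, it can help to confirm that the folders named in step (a) exist and are not empty. The sketch below is a hypothetical helper, not a script shipped with the repository. It checks only the three top-level folders mentioned above and assumes it is run from the repository root; the layout inside each folder is not verified.

```python
from pathlib import Path

# Folders named in step (a); paths are relative to the repository root.
EXPECTED_DIRS = (
    "data/occupancy/semanticKITTI/RGB",
    "data/lidar/velodyne",
    "data/lidar/lidarseg",
)


def missing_data_dirs(repo_root="."):
    """Return the expected folders that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_DIRS:
        folder = root / rel
        # A folder counts as missing if it does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_data_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All expected data folders are present.")
```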
Download the model pretrained on SemanticKITTI and the pretrained EfficientNet-B7 model, and put them in the folder ./pretrain.
- Train with a single GPU:
export PYTHONPATH="."
python tools/train.py \
projects/configs/occupancy/semantickitti/temporal_baseline.py
- Evaluate with a single GPU:
export PYTHONPATH="."
bash run_eval_kitti.sh \
projects/configs/occupancy/semantickitti/temporal_baseline.py \
pretrain/pretrain.pth
- Train with n GPUs:
bash run.sh \
projects/configs/occupancy/semantickitti/temporal_baseline.py n
- Evaluate with n GPUs:
bash tools/dist_test.sh \
projects/configs/occupancy/semantickitti/temporal_baseline.py \
pretrain/pretrain.pth n
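In the commands above, n is a placeholder for the number of GPUs. The four variants can be summarized in one small helper that picks the right script for a given mode and GPU count. This is only an illustrative sketch: `build_command` is a hypothetical function, and the script names, config path, and checkpoint path are copied from the commands above.

```python
import shlex

CONFIG = "projects/configs/occupancy/semantickitti/temporal_baseline.py"
CHECKPOINT = "pretrain/pretrain.pth"


def build_command(mode, num_gpus=1, config=CONFIG, checkpoint=CHECKPOINT):
    """Return the argument list for the train/eval commands listed above.

    The single-GPU commands above are run with PYTHONPATH set to ".";
    setting that variable is left to the caller.
    """
    if mode not in ("train", "eval"):
        raise ValueError("mode must be 'train' or 'eval'")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if mode == "train":
        if num_gpus == 1:
            return ["python", "tools/train.py", config]
        return ["bash", "run.sh", config, str(num_gpus)]
    if num_gpus == 1:
        return ["bash", "run_eval_kitti.sh", config, checkpoint]
    return ["bash", "tools/dist_test.sh", config, checkpoint, str(num_gpus)]


if __name__ == "__main__":
    # Print each variant as a shell-ready command line.
    for mode, gpus in [("train", 1), ("eval", 1), ("train", 4), ("eval", 4)]:
        print(" ".join(shlex.quote(arg) for arg in build_command(mode, gpus)))
```

The returned list can be passed to `subprocess.run` from the repository root, or printed and pasted into a shell.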
This repository is released under the Apache 2.0 license as found in the LICENSE file.
Many thanks to these excellent open source projects:
If you find our paper and code useful for your research, please consider citing:
@article{li2024hierarchical,
title={Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion},
author={Li, Bohan and Deng, Jiajun and Zhang, Wenyao and Liang, Zhujin and Du, Dalong and Jin, Xin and Zeng, Wenjun},
journal={arXiv preprint arXiv:2407.02077},
year={2024}
}