This is an experimental TensorFlow implementation of MV3D, a ConvNet for 3D object detection from Lidar and a monocular camera.
For details about MV3D, please refer to the paper *Multi-View 3D Object Detection Network for Autonomous Driving* by Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia.
- Requirements: TensorFlow 1.0 (see: TensorFlow)
- Python packages you might not have: cython, python-opencv, easydict
- For training the end-to-end version of Faster R-CNN with VGG16, 3 GB of GPU memory is sufficient (using cuDNN)
- Clone the MV3D_TF repository:

```bash
# Make sure to clone with --recursive
git clone --recursive https://github.com/RyannnG/MV3D_TF.git
```
- Build the Cython modules:

```bash
cd $MV3D/lib
make
```
- Download the KITTI object detection dataset, and specify the KITTI data path so that the structure looks like:

```
{kitti_dir}/object/training/image_2
                           /image_3
                           /calib
                           /lidar_bv
                           /velodyne
{kitti_dir}/object/testing/image_2
                          /image_3
                          /calib
                          /lidar_bv
                          /velodyne
```
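As a quick sanity check, here is a minimal sketch that verifies this layout before you continue (the `kitti_dir` value is a placeholder and this helper is not part of the repo):

```python
import os

# Placeholder path; point this at your local KITTI root.
kitti_dir = "/data/KITTI"

# Verify the folder layout described above.
for split in ("training", "testing"):
    for sub in ("image_2", "image_3", "calib", "lidar_bv", "velodyne"):
        path = os.path.join(kitti_dir, "object", split, sub)
        print(path, "OK" if os.path.isdir(path) else "MISSING")
```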
- Make the Lidar bird's-eye-view data (see the sketch below for what this step computes):

```bash
# edit kitti_path in tools/read_lidar.py
# then generate the data
python tools/read_lidar.py
```
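For reference, this step rasterizes each Velodyne scan into bird's-eye-view maps. Below is a minimal sketch of the idea, assuming a typical MV3D-style grid; the extents, resolution, and channel choices are illustrative assumptions, not necessarily the script's exact settings:

```python
import numpy as np

def make_bird_view(bin_path, x_range=(0, 70.4), y_range=(-40, 40), res=0.1):
    """Rasterize a KITTI Velodyne .bin scan (N x 4: x, y, z, intensity)
    into height/density bird's-eye-view maps (sketch, not the repo's code)."""
    pts = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    # keep points inside the chosen ground rectangle
    mask = ((pts[:, 0] >= x_range[0]) & (pts[:, 0] < x_range[1]) &
            (pts[:, 1] >= y_range[0]) & (pts[:, 1] < y_range[1]))
    pts = pts[mask]
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    # map metric coordinates to pixel indices
    xi = ((pts[:, 0] - x_range[0]) / res).astype(np.int32)
    yi = ((pts[:, 1] - y_range[0]) / res).astype(np.int32)
    height = np.zeros((h, w), np.float32)   # empty cells stay 0 (sketch simplification)
    density = np.zeros((h, w), np.float32)
    np.maximum.at(height, (xi, yi), pts[:, 2])   # max height per cell
    np.add.at(density, (xi, yi), 1.0)            # point count per cell
    return np.dstack([height, np.log1p(density)])
```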
- Create symlinks for the KITTI dataset:

```bash
cd $MV3D/data/KITTI
ln -s {kitti_dir}/object object
```
- Download the pre-trained ImageNet model [Google Drive] [Dropbox]:

```bash
mv VGG_imagenet.npy $MV3D/data/pretrain_model/VGG_imagenet.npy
```
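The `.npy` file stores pickled VGG16 weights keyed by layer name, as is common in TensorFlow ports of Caffe models; a minimal sketch for inspecting it (the exact dict layout is an assumption):

```python
import numpy as np

# VGG_imagenet.npy is assumed to be a pickled dict of layer name -> weights;
# encoding='latin1' handles files pickled under Python 2.
weights = np.load("data/pretrain_model/VGG_imagenet.npy",
                  encoding="latin1", allow_pickle=True).item()
for name in sorted(weights):
    print(name)
```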
- Run the script to train the model:

```bash
cd $MV3D
./experiments/scripts/mv3d.sh $DEVICE $DEVICE_ID ${.npy/ckpt.meta} kitti_train
```

DEVICE is either cpu or gpu.
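For example, a plausible invocation on the first GPU with the downloaded weights would be `./experiments/scripts/mv3d.sh gpu 0 data/pretrain_model/VGG_imagenet.npy kitti_train` (the weights argument here is a guess at the expected path, matching the `${.npy/ckpt.meta}` placeholder above).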
Key idea: use the Lidar bird's-eye view to generate 3D anchor boxes, then project those boxes onto the image to do classification.
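A minimal sketch of that projection step, following the standard KITTI calibration chain (the function name and box representation are illustrative, not the repo's API):

```python
import numpy as np

def project_velo_box_to_image(corners_velo, Tr_velo_to_cam, R0_rect, P2):
    """Project 3D box corners (8 x 3, Velodyne frame) into image pixels
    using the standard KITTI calibration chain: P2 @ R0_rect @ Tr @ X."""
    n = corners_velo.shape[0]
    pts = np.hstack([corners_velo, np.ones((n, 1))])   # homogeneous coords
    cam = R0_rect @ Tr_velo_to_cam @ pts.T             # -> rectified camera frame
    cam_h = np.vstack([cam, np.ones((1, n))])
    img = P2 @ cam_h                                   # -> image plane
    img = img[:2] / img[2]                             # perspective divide
    # 2D proposal = axis-aligned bounding box of the projected corners
    x1, y1 = img.min(axis=1)
    x2, y2 = img.max(axis=1)
    return np.array([x1, y1, x2, y2])
```

The returned 2D box is what the image branch can then classify, which is the projection-then-classification flow described above.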
Image and corresponding Lidar map
Note:
In the image:
- boxes: without regression
In the Lidar view:
- white boxes: without regression (correspond to the image boxes)
- purple boxes: with regression
Errors are mostly due to regression error (boxes 5, 6, 9 in one example; boxes 8, 9, 10 in another).
Part 2: Didi-Udacity Challenge 2017 (car and pedestrian detection using Lidar and RGB)