This is the implementation of the End-to-End Neural Network (ETENN)
for the Master's thesis Multi-Object Tracking using either Deep Learning or PMBM filtering
by Erik Bohnsack and Adam Lilja
at Chalmers University of Technology.
The implementation is inspired by Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
by Luo et al. and IntentNet: Learning to Predict Intention from Raw Sensor Data
by Casas et al. Since no code is available
from either of these two papers, this repository can be used in an attempt to replicate or continue work on their findings.
On top of the Fast and Furious network input processing, we borrowed code from PointPillars: Fast Encoders for Object Detection from Point Clouds
by Lang et al.
and implemented the PointPillars feature-encoding input processing as well.
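For orientation, the core idea of the PointPillars encoding is to group the lidar points into vertical pillars on the x-y grid and augment each point with offsets from the pillar's mean and center. Below is a minimal NumPy sketch of that idea, not the borrowed code itself: the function and parameter names are our own, and the grid ranges are illustrative defaults in the style of the common KITTI setup.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
              pillar_size=0.16, max_points_per_pillar=100):
    """Group lidar points (N, 4: x, y, z, intensity) into vertical pillars
    and build 9-dim augmented features in the PointPillars style:
    (x, y, z, intensity, dx_mean, dy_mean, dz_mean, dx_center, dy_center)."""
    # Keep only points inside the x-y grid.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    points = points[mask]

    # Integer pillar indices in the x-y plane, flattened to one key.
    ix = ((points[:, 0] - x_range[0]) / pillar_size).astype(np.int32)
    iy = ((points[:, 1] - y_range[0]) / pillar_size).astype(np.int32)
    n_x = int((x_range[1] - x_range[0]) / pillar_size)
    flat = iy * n_x + ix

    pillars, indices = [], []
    for key in np.unique(flat):
        pts = points[flat == key][:max_points_per_pillar]
        mean = pts[:, :3].mean(axis=0)
        center_x = x_range[0] + (key % n_x + 0.5) * pillar_size
        center_y = y_range[0] + (key // n_x + 0.5) * pillar_size
        feats = np.concatenate(
            [pts,                         # raw x, y, z, intensity
             pts[:, :3] - mean,           # offsets from the pillar's mean
             pts[:, 0:1] - center_x,      # offset from the pillar center (x)
             pts[:, 1:2] - center_y],     # offset from the pillar center (y)
            axis=1)
        pillars.append(feats)
        indices.append((key % n_x, key // n_x))
    return pillars, indices
```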
We have only tested it on Ubuntu 16.04 and Python 3.7, using PyTorch with CUDA.
The network never generalized beyond the training data, for two reasons:
- Coordinate transform mishap
  - The input frames and the training labels for the future frames were not transformed into a common coordinate system using the ego motion of the ego vehicle (see the sketch after this list). Doing so simplifies the task for the network and reduces the amount of data needed. It should not be theoretically impossible to learn without this transform, but we noticed the omission too late.
- Data
  - Fast and Furious uses a private dataset that is two orders of magnitude larger than KITTI. For this task the KITTI tracking dataset, without data augmentation (which we did not have time to implement), was simply not enough.
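A minimal sketch of the missing transform, assuming 4x4 ego-to-world pose matrices derived from the ego motion; the names `pose_src` and `pose_dst` are hypothetical, not identifiers from this repository:

```python
import numpy as np

def transform_points(points, pose_src, pose_dst):
    """Map (N, 3) points from the ego frame of `pose_src` into the ego
    frame of `pose_dst`, where each pose is a 4x4 ego-to-world matrix
    obtained from the ego motion (e.g. KITTI oxts data)."""
    # Compose: dst frame <- world <- src frame.
    T = np.linalg.inv(pose_dst) @ pose_src
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homogeneous @ T.T)[:, :3]

# Example: bring the labels of a future frame t+k into the coordinate
# system of the input frame t before training.
# labels_in_frame_t = transform_points(labels_t_plus_k, poses[t + k], poses[t])
```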
Python 3.7 with the following packages:
- torch
- pyyaml
- mayavi
- numba
- visdom [https://github.com/facebookresearch/visdom]
- torchviz [https://github.com/szagoruyko/pytorchviz]
- graphviz: `sudo apt-get install graphviz`
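Most of the Python packages can be installed from PyPI, for example (a suggested command assuming the standard package names; versions are not pinned):

```
pip install torch pyyaml mayavi numba visdom torchviz
```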
Check `train.py`, or `train_pp.py` for the PointPillars version.
Check `eval.py`, or `eval_pp.py` for the PointPillars version.
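For example, to train and evaluate the PointPillars-encoding version (assuming the scripts are run from the repository root and read their settings from the repository's config files):

```
python train_pp.py
python eval_pp.py
```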