Object detection is a core component of building self-driving cars. This project leverages the TensorFlow Object Detection API for object detection specifically in urban environments.
The TensorFlow Object Detection API provides a large zoo of pretrained models with an easy-to-use approach.
Our goal with this project is to fine-tune a pretrained model to detect cars, pedestrians, and cyclists in traffic.
The project is structured as follows:
```
Workspace
│
└───experiments
│   └───reference
│   │       pipeline_new.config    # initial model config
│   └───experiment5
│           pipeline_new.config    # improved model config
└───images
│   └───reference      # initial model learning outputs
│   └───experiment5    # final experiment learning outputs
│   └───gif_final      # inference video for improved model
│   Exploratory Data Analysis.ipynb    # EDA notebook
│   Explore augmentations.ipynb        # augmentation notebook
│   LICENSE.md
│   README.md
│   WRITEUP.md
│   edit_config.py
│   filenames.txt
│   inference_video.py
│   label_map.pbtxt
│   launch_jupyter.sh
│   pipeline.config
│   requirements.txt
│   utils.py
```
The steps to run the code are given in the README.md file.
The dataset we work with is the Waymo Open Dataset, which provides high-quality video frames captured in urban environments. The frames contain objects in three classes: cars, pedestrians, and cyclists.
In the following image, there is a stray annotation box in the upper part of the image, which can be considered a weakness of the dataset.
A simple analysis shows a significant imbalance between the number of car objects and the other two classes, which may make our model less precise at pedestrian and cyclist detection.
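The class-count analysis can be sketched as below. The dictionary layout of a frame and the class-id mapping are assumptions for illustration; the actual EDA lives in the Exploratory Data Analysis notebook.

```python
from collections import Counter

# Assumed class-id mapping; the real one is defined in label_map.pbtxt.
CLASS_NAMES = {1: "vehicle", 2: "pedestrian", 4: "cyclist"}

def class_distribution(frames):
    """Count ground-truth objects per class across a list of frames.

    Each frame is assumed to be a dict with a 'groundtruth_classes'
    entry holding a list of integer class ids.
    """
    counts = Counter()
    for frame in frames:
        for cls in frame["groundtruth_classes"]:
            counts[CLASS_NAMES.get(cls, "unknown")] += 1
    return counts

frames = [
    {"groundtruth_classes": [1, 1, 1, 2]},
    {"groundtruth_classes": [1, 4]},
]
print(class_distribution(frames))
# → Counter({'vehicle': 4, 'pedestrian': 1, 'cyclist': 1})
```

On the full dataset, a plot of these counts makes the car-vs-other imbalance immediately visible.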
The dataset is split into train, eval, and test sets in the workspace.
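A random file-level split can be sketched as follows; the 80/10/10 fractions and the helper name are assumptions, not necessarily the split this project used.

```python
import random

def split_filenames(filenames, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle and split a list of tfrecord filenames into train/val/test.

    Splitting at the file level (rather than frame level) keeps frames
    from the same drive segment out of both train and eval sets.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = filenames[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_filenames([f"segment_{i}.tfrecord" for i in range(100)])
print(len(train), len(val), len(test))  # → 80 10 10
```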
The reference model used for this task is SSD ResNet50 640x640, a Single Shot Detector on the ResNet architecture. Although ResNet is a well-performing architecture, the data the model was pretrained on likely differs from our dataset, which caused very high training loss, and training did not converge.
The inference video confirmed that the network failed to learn: it was unable to detect any object.
To improve the model I reduced the batch size to 8 and set a learning rate base of 0.001, much lower than the preset rate, with an initial warmup rate of 0.00033.
At the same time, I increased the number of training steps to 3000.
The learning rate curve still does not fully settle, but it is significantly more stable than in the reference model.
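In pipeline.config terms, these changes correspond to something like the fragment below (protobuf text format). The cosine-decay schedule is what the SSD ResNet50 config ships with; the `warmup_steps` value shown is an assumption, as it is not stated above.

```
train_config {
  batch_size: 8
  num_steps: 3000
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.001
          total_steps: 3000
          warmup_learning_rate: 0.00033
          warmup_steps: 200   # assumed value, not stated in the writeup
        }
      }
    }
  }
}
```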
Tuning the learning rate alone would not be enough: from previous experiments I observed that the model suffers severely on dark frames recorded during night trips.
To reduce this effect and help the network learn better, I applied data augmentation mechanisms, especially ones that change color, hue, and brightness.
The idea was to simulate different lighting conditions. By changing hue and saturation, I could also make the model see the same cars in different colors.
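Augmentations of this kind are added as `data_augmentation_options` entries in `train_config`. The options below are standard TF Object Detection API preprocessing steps matching the color/hue/brightness changes described; the specific delta values are assumptions, not the project's exact settings.

```
data_augmentation_options {
  random_rgb_to_gray {
    probability: 0.2
  }
}
data_augmentation_options {
  random_adjust_brightness {
    max_delta: 0.3
  }
}
data_augmentation_options {
  random_adjust_hue {
    max_delta: 0.04
  }
}
data_augmentation_options {
  random_adjust_saturation {
    min_delta: 0.8
    max_delta: 1.2
  }
}
```

The Explore augmentations notebook is where the visual effect of these options can be checked before committing them to the training config.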
In fact, it worked and improved model performance significantly.
It is impressive that the model can now predict some objects with more than 90% confidence in both day and night recordings. However, there is still a lot of room for improvement, as the overall confidence remains low. In particular, the model mistakes fire hydrants for pedestrians in the daytime video :D This is likely caused by the dataset being highly imbalanced: there are significantly more car objects than all other classes combined. I believe better results can be achieved by improving data quality and variety and by adding further data augmentation steps.