This project aims to improve the performance of Visual Teach and Repeat (VT&R) navigation systems under illumination changes, using a Siamese CNN trained with contrastive learning that fuses infrared and RGB images at the decision and feature levels. It builds on previous work on a VT&R system that estimates the horizontal displacement between a prerecorded image and the currently perceived image, which is needed to steer the robot back onto the previously traversed path [1].
Build dataset
: Using wheel odometry, IR and RGB image pairs are aligned and extracted from rosbags at regular distance intervals. These image pairs are rectified and stored as the dataset. Below is one example:
path0: 11 videos and 18,490 images (including IR and RGB images)
path1: 7 videos and 17,700 images (including IR and RGB images)
path2: 4 videos and 9,028 images (including IR and RGB images)
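A minimal sketch of this extraction step is given below, assuming ROS1 rosbags with a wheel-odometry topic and RGB/IR image topics. The topic names, the bag filename and the 0.5 m spacing are illustrative assumptions, not the project's actual values.

```python
# Sketch only: extract an RGB/IR image pair roughly every 0.5 m of wheel odometry.
# Topic names and the bag file are hypothetical placeholders.
import numpy as np
import rosbag
from cv_bridge import CvBridge

bridge = CvBridge()
last_xy = None        # odometry position of the last saved pair
pending = False       # True once the robot has travelled far enough for a new pair

with rosbag.Bag("path0_run0.bag") as bag:
    for topic, msg, _ in bag.read_messages(
            topics=["/odom", "/camera/rgb/image_raw", "/camera/ir/image_raw"]):
        if topic == "/odom":
            xy = np.array([msg.pose.pose.position.x, msg.pose.pose.position.y])
            if last_xy is None or np.linalg.norm(xy - last_xy) > 0.5:
                pending, last_xy = True, xy
        elif pending:
            # Convert the image message; rectification and saving are omitted here.
            image = bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")
            if topic == "/camera/ir/image_raw":
                pending = False   # the RGB + IR pair for this position is complete
```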
- Single image pipeline
Cut out the embedded image and pad the reference image -> slide the embedded image across the reference image and compute similarity with torch.nn.functional.conv2d -> normalize -> compute a likelihood with softmax -> find the most likely position -> evaluate the distance between the ground truth and the output with the Absolute Error.
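A minimal sketch of this single-image pipeline is shown below, assuming the embedded and reference images are already available as tensors; the tensor sizes, the padding width and the ground-truth index are illustrative assumptions.

```python
# Sketch of the single-image matching pipeline described above.
import torch
import torch.nn.functional as F

def horizontal_match(embedded, reference, pad):
    """Slide `embedded` (C, H, W_e) horizontally across `reference` (C, H, W_r).

    Returns the likelihood over horizontal positions and the most likely one.
    """
    # Pad the reference left/right so every displacement is reachable.
    ref = F.pad(reference.unsqueeze(0), (pad, pad))    # (1, C, H, W_r + 2*pad)
    kernel = embedded.unsqueeze(0)                     # (1, C, H, W_e)

    # Cross-correlation: conv2d with the embedded image as the kernel.
    similarity = F.conv2d(ref, kernel).flatten()       # one score per position

    # Normalize, then turn the scores into a likelihood with softmax.
    similarity = (similarity - similarity.mean()) / (similarity.std() + 1e-8)
    likelihood = torch.softmax(similarity, dim=0)

    best = int(torch.argmax(likelihood))               # most likely position
    return likelihood, best

# Example usage with random tensors (3-channel RGB, 64x256 reference, 64x128 crop).
reference = torch.rand(3, 64, 256)
embedded = torch.rand(3, 64, 128)
likelihood, position = horizontal_match(embedded, reference, pad=32)

ground_truth = 60                                      # hypothetical ground-truth index
abs_error = abs(position - ground_truth)               # Absolute Error used for evaluation
```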
- Fusion image pipeline
Take the same embedded RGB and IR images -> run the single-image processing on each separately -> multiply the two likelihood arrays element-wise -> find the most likely position -> evaluate the distance between the ground truth and the output with the Absolute Error.
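A minimal sketch of this decision-level fusion, reusing the hypothetical horizontal_match helper from the single-image sketch above; the tensor shapes and the ground-truth index are again illustrative assumptions.

```python
# Sketch of decision-level fusion: multiply the per-modality likelihoods.
import torch

rgb_reference, rgb_embedded = torch.rand(3, 64, 256), torch.rand(3, 64, 128)
ir_reference, ir_embedded = torch.rand(1, 64, 256), torch.rand(1, 64, 128)

rgb_likelihood, _ = horizontal_match(rgb_embedded, rgb_reference, pad=32)
ir_likelihood, _ = horizontal_match(ir_embedded, ir_reference, pad=32)

# Element-wise product fuses the two modalities at the decision level;
# renormalize so the result is again a probability distribution.
fused = rgb_likelihood * ir_likelihood
fused = fused / fused.sum()

position = int(torch.argmax(fused))         # most likely horizontal position
ground_truth = 60                           # hypothetical ground truth
abs_error = abs(position - ground_truth)    # Absolute Error
```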
- Combine the infrared image and the RGB image into one four-channel image (Red, Green, Blue, Infrared).
- Train a Siamese CNN on the combined four-channel image input.
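A minimal sketch of this feature-level fusion, stacking the RGB and IR channels into one four-channel input for a small Siamese branch; the layer sizes, image resolution and architecture are illustrative assumptions, not the network actually trained in this project.

```python
# Sketch of feature-level fusion: a 4-channel (R, G, B, IR) Siamese branch.
import torch
import torch.nn as nn

class SiameseBranch(nn.Module):
    def __init__(self, in_channels=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.features(x)

# Combine an aligned, rectified 3-channel RGB image and 1-channel IR image
# into a single 4-channel tensor.
rgb = torch.rand(1, 3, 64, 256)
ir = torch.rand(1, 1, 64, 256)
rgbi = torch.cat([rgb, ir], dim=1)          # (1, 4, 64, 256)

# The same branch (shared weights) embeds both the current and reference images.
branch = SiameseBranch()
current_map = branch(rgbi)
reference_map = branch(torch.rand(1, 4, 64, 256))
```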
- Absolute error (AE): |output displacement - ground truth|
- Standard deviation of AE.
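A minimal sketch of the evaluation, computing the per-frame AE and its standard deviation over a path; the displacement values below are placeholders, not measured results.

```python
import torch

predicted = torch.tensor([12., 35., 60., 18.])     # displacements output by the pipeline (placeholder)
ground_truth = torch.tensor([10., 40., 58., 18.])  # hypothetical ground-truth displacements

abs_error = (predicted - ground_truth).abs()       # Absolute Error per image pair
mean_ae = abs_error.mean()                         # average AE over the path
std_ae = abs_error.std()                           # standard deviation of the AE
```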
At the decision level, the improvement is negligible, but the fused results are robust to extreme environmental changes. Experimental results for path0 are shown below.
One detailed result, recorded under sun glare in daylight, is shown below: