Place recognition is a crucial module for autonomous vehicles to obtain usable localization information in GPS-denied environments. In recent years, multimodal place recognition methods have gained increasing attention for their ability to overcome the weaknesses of unimodal sensor systems by leveraging complementary information across modalities. However, challenges remain in harmonizing data across modalities and in fully exploiting the spatio-temporal correlations between them. In this paper, we propose a 3D Gaussian Splatting-based multimodal place recognition neural network, dubbed GSPR. It explicitly fuses multi-view RGB images and LiDAR point clouds into a spatio-temporally unified scene representation via the proposed Multimodal Gaussian Splatting. A network composed of 3D graph convolutions and transformers is designed to extract high-level spatio-temporal features and global descriptors from the Gaussian scenes for place recognition. We evaluate our method on the nuScenes dataset, and the experimental results demonstrate that it effectively leverages the complementary strengths of multi-view cameras and LiDAR, achieving state-of-the-art place recognition performance while maintaining solid generalization ability.