Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residual mining, and robust optimization, 3DGM jointly performs 3D mapping and 2D segmentation without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method on unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.
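To make the inlier/outlier formulation concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the robust-optimization idea the abstract describes: per-pixel feature residuals between the rendered environment map and the current frame are mined for outliers, which are treated as ephemeral-object pixels and excluded from the mapping loss. The function names, the MAD-based threshold, and the stand-in renderer outputs are all illustrative assumptions.

```python
# Sketch of robust differentiable rendering with feature residual mining.
# `rendered_*` tensors stand in for the outputs of a 3DGS rasterizer over
# the Gaussian environment map; `target_*` are the observed frame and its
# distilled 2D features. All names here are hypothetical.

import torch

def mad_outlier_mask(residual: torch.Tensor, k: float = 3.0) -> torch.Tensor:
    """Flag pixels whose residual exceeds k * MAD above the median.

    residual: (H, W) per-pixel feature residual magnitudes.
    Returns a boolean mask, True for outlier (ephemeral object) pixels.
    """
    med = residual.median()
    mad = (residual - med).abs().median() + 1e-8  # robust scale estimate
    return residual > med + k * mad

def robust_mapping_loss(rendered_rgb, target_rgb, rendered_feat, target_feat):
    """Photometric loss restricted to inlier (environment) pixels.

    rendered_rgb, target_rgb:   (3, H, W) images.
    rendered_feat, target_feat: (C, H, W) feature maps.
    Returns the masked L1 loss and the mined object mask.
    """
    # Feature residual magnitude per pixel (residual mining step).
    residual = (rendered_feat - target_feat).norm(dim=0)   # (H, W)
    object_mask = mad_outlier_mask(residual)               # (H, W)

    # Optimize the Gaussian map only on environment pixels; the mined
    # mask itself is the unsupervised 2D segmentation byproduct.
    inlier = (~object_mask).float()
    l1 = (rendered_rgb - target_rgb).abs().mean(dim=0)     # (H, W)
    loss = (l1 * inlier).sum() / inlier.sum().clamp(min=1.0)
    return loss, object_mask
```

In this reading, repeating the residual mining across many traversals of the same region is what supplies the self-supervision: environment pixels stay consistent and dominate the inlier set, while changing objects keep landing in the outlier mask.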