# Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting

We propose a framework for active next-best-view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it can represent scenes in a manner that is both photorealistic and geometrically accurate. However, in real-world, online robotic scenes, where efficiency requirements limit the number of views, random view selection for 3DGS becomes impractical because the chosen views are often overlapping and redundant. We address this issue with an end-to-end online training and active view selection pipeline that enhances the performance of 3DGS in few-view robotics settings. We first improve few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2), supplemented with a Pearson depth loss and a surface normal loss, to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes and extend depth-based FisherRF to them, demonstrating both qualitative and quantitative improvements on challenging robot scenes.
