FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction

Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings. Emulating this capability, we introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. FusionSense addresses three key challenges: (i) How can robots efficiently acquire robust global shape information about the surrounding scene and objects? (ii) How can robots strategically select touch points on the object using geometric and common-sense priors? (iii) How can partial observations such as tactile signals improve the overall representation of the object? Our framework employs 3D Gaussian Splatting as a core representation and incorporates a hierarchical optimization strategy involving global structure construction, object visual hull pruning, and local geometric constraints. This yields fast and robust perception in environments containing traditionally challenging objects that are transparent, reflective, or dark, enabling downstream manipulation and navigation tasks. Experiments on real-world data suggest that our framework outperforms previous state-of-the-art sparse-view methods. All code and data are open-sourced on the project website.
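The abstract does not spell out how the visual hull pruning step works, but the idea of carving a point set against object silhouettes is standard. Below is a minimal, hedged sketch (not the authors' implementation) of pruning Gaussian centers with a visual hull: a 3D point survives only if its projection lands inside the object mask in every calibrated view. The function name `prune_by_visual_hull` and the mask/camera conventions are assumptions for illustration.

```python
import numpy as np

def prune_by_visual_hull(points, masks, intrinsics, extrinsics):
    """Keep only 3D points whose projection falls inside the object
    silhouette in every view (a visual-hull carve of Gaussian centers).

    points:     (N, 3) world-space points (e.g. Gaussian means)
    masks:      list of (H, W) boolean silhouette masks, one per view
    intrinsics: list of (3, 3) camera matrices K
    extrinsics: list of (4, 4) world-to-camera transforms
    """
    keep = np.ones(len(points), dtype=bool)
    homog = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    for mask, K, w2c in zip(masks, intrinsics, extrinsics):
        cam = (w2c @ homog.T).T[:, :3]          # world -> camera coordinates
        in_front = cam[:, 2] > 1e-6             # discard points behind the camera
        uv = (K @ cam.T).T
        uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)  # perspective divide
        h, w = mask.shape
        u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
        v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
        keep &= mask[v, u] & in_front           # silhouette test for this view
    return points[keep], keep

# Toy usage with one synthetic view: only points that project into the
# fake silhouette survive the carve.
pts = np.random.uniform(-1, 1, size=(1000, 3))
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
w2c = np.eye(4)
w2c[2, 3] = 4.0                                 # camera 4 units along +z
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 280:360] = True                   # hypothetical object silhouette
kept, flags = prune_by_visual_hull(pts, [mask], [K], [w2c])
```

With sparse views the carve is coarse, which is presumably why the paper pairs it with local geometric constraints from tactile observations rather than relying on the hull alone.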
