Presenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. The two main requirements are rendering and reconstruction. Notably, state-of-the-art (SOTA) rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point-wise or primitive-wise color and neglect the underlying scene geometry. The learning of neural implicit surfaces was sparked by the success of neural rendering. Current works constrain either the distribution of density fields or the shape of primitives, resulting in degraded rendering quality and flaws on the learned scene surfaces. The efficacy of such methods is limited by the inherent constraints of the chosen neural representation, which struggles to capture fine surface details, especially for larger, more intricate scenes. To address these issues, we introduce GSDF, a novel dual-branch architecture that combines the benefits of a flexible and efficient 3D Gaussian Splatting (3DGS) representation with neural Signed Distance Fields (SDF). The core idea is to leverage and enhance the strengths of each branch while alleviating their limitations through mutual guidance and joint supervision. We show on diverse scenes that our design unlocks the potential for more accurate and detailed surface reconstructions, and at the same time benefits 3DGS rendering with structures that are more aligned with the underlying geometry.