In this work, we introduce a generative approach for pose-free reconstruction of 360° scenes from a limited number of uncalibrated 2D images. Pose-free scene reconstruction from incomplete, unposed observations is usually regularized with depth estimation or 3D foundation model priors. While recent advances have enabled sparse-view reconstruction of unbounded scenes with known camera poses using diffusion priors, these methods rely on explicit camera embeddings to extrapolate unobserved regions. This reliance limits their applicability in pose-free settings, where view-specific information is only implicitly available. To address this, we propose an instruction-following RGBD diffusion model designed to inpaint missing details and remove artifacts in novel view renders and depth maps of a 3D scene. We also introduce a novel confidence measure for Gaussian representations that enables better detection of these artifacts. By progressively integrating these novel views in a Gaussian-SLAM-inspired process, we achieve a multi-view-consistent Gaussian representation. Evaluations on the MipNeRF360 dataset demonstrate that our method surpasses existing pose-free techniques and performs competitively with state-of-the-art posed reconstruction methods in complex 360° scenes.