3D Gaussian Splatting (3DGS) has recently transformed photorealistic reconstruction, achieving high visual fidelity and real-time performance. However, rendering quality significantly deteriorates when test views deviate from the camera angles used during training, posing a major challenge for applications in immersive free-viewpoint rendering and navigation. In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. By creating diverse test cases with synthetic and real-world datasets, we demonstrate that most existing methods, including those incorporating various regularization techniques and data-driven priors, struggle to generalize effectively to OOD views. To address this limitation, we introduce SplatFormer, the first point transformer model specifically designed to operate on Gaussian splats. SplatFormer takes as input an initial 3DGS set optimized under limited training views and refines it in a single forward pass, effectively removing potential artifacts in OOD test views. To our knowledge, this is the first successful application of point transformers directly on 3DGS sets, surpassing the limitations of previous multi-scene training methods, which could handle only a restricted number of input views during inference. Our model significantly improves rendering quality under extreme novel views, achieving state-of-the-art performance in these challenging scenarios and outperforming various 3DGS regularization techniques, multi-scene models tailored for sparse view synthesis, and diffusion-based frameworks.
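To make the notion of an out-of-distribution test view concrete, the sketch below measures how far a test camera's viewing direction deviates from the nearest training camera. It is illustrative only. The abstract does not state how OOD views are selected, so the spherical camera parameterization (azimuth, elevation), the helper names, and the 30-degree threshold are all assumptions rather than the authors' protocol.

```python
import math

# Hypothetical helpers: cameras are assumed to sit on a sphere around the
# object and look at its center, so a view is described by (azimuth, elevation).

def view_direction(azimuth_deg, elevation_deg):
    """Unit vector from the scene center toward the camera."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angle_between_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # clamp against floating-point drift
    return math.degrees(math.acos(dot))

def deviation_from_training(test_view, training_views):
    """Smallest angular distance from the test view to any training view."""
    test_dir = view_direction(*test_view)
    return min(angle_between_deg(test_dir, view_direction(*tv))
               for tv in training_views)

def is_ood(test_view, training_views, threshold_deg=30.0):
    """Assumed criterion: a view is OOD if it is far from every training view."""
    return deviation_from_training(test_view, training_views) > threshold_deg

# Training cameras on a low-elevation ring, test camera looking from the top.
training_views = [(az, 10.0) for az in range(0, 360, 45)]
top_down_view = (0.0, 90.0)
nearby_view = (20.0, 15.0)

print(deviation_from_training(top_down_view, training_views))
print(is_ood(top_down_view, training_views), is_ood(nearby_view, training_views))
```

Under these assumptions, a top-down camera is flagged as OOD relative to a low-elevation training ring, while a camera close to that ring is not. This is the kind of train/test gap the evaluation targets.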