With the advent of portable 360° cameras, panoramic imagery has gained significant attention in applications such as virtual reality (VR), virtual tours, robotics, and autonomous driving. As a result, wide-baseline panorama view synthesis has emerged as a vital task, in which high resolution, fast inference, and memory efficiency are essential. Nevertheless, existing methods are typically constrained to lower resolutions (512 × 1024) because of their demanding memory and computational requirements. In this paper, we present PanSplat, a generalizable, feed-forward approach that efficiently supports resolutions up to 4K (2048 × 4096). Our approach features a tailored spherical 3D Gaussian pyramid with a Fibonacci lattice arrangement, which improves image quality while reducing information redundancy. To accommodate the demands of high resolution, we propose a pipeline that integrates a hierarchical spherical cost volume and Gaussian heads with local operations, enabling two-step deferred backpropagation for memory-efficient training on a single A100 GPU. Experiments demonstrate that PanSplat achieves state-of-the-art results with superior efficiency and image quality across both synthetic and real-world datasets.
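To make the Fibonacci lattice arrangement concrete, the following is a minimal sketch of the standard spherical Fibonacci lattice, which spreads points nearly uniformly over the unit sphere and so avoids the oversampling near the poles that an equirectangular pixel grid incurs. It illustrates only the general construction; the function name `fibonacci_lattice` and the point count are assumptions made for illustration, and PanSplat's actual Gaussian placement and pyramid levels may differ.

```python
import math


def fibonacci_lattice(n):
    """Return n points spread nearly uniformly over the unit sphere.

    Hypothetical helper for illustration; not taken from the PanSplat code.
    """
    golden_ratio = (1 + math.sqrt(5)) / 2
    points = []
    for i in range(n):
        # Evenly spaced heights give equal-area bands on the sphere.
        z = 1 - (2 * i + 1) / n
        radius = math.sqrt(1 - z * z)
        # Advancing the azimuth by an irrational fraction of a full turn
        # keeps neighbouring points from lining up along meridians.
        azimuth = 2 * math.pi * i / golden_ratio
        points.append((radius * math.cos(azimuth),
                       radius * math.sin(azimuth),
                       z))
    return points


if __name__ == "__main__":
    for x, y, z in fibonacci_lattice(8):
        print(f"{x:+.3f} {y:+.3f} {z:+.3f}")
```

Because the heights are evenly spaced, each hemisphere receives the same number of points, whereas an equirectangular image spends as many pixels on the row nearest a pole as on the equator.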