Novel view synthesis of dynamic scenes is becoming increasingly important in applications such as augmented and virtual reality. We propose a novel 4D Gaussian Splatting (4DGS) algorithm for reconstructing dynamic scenes from casually recorded monocular videos. To overcome the overfitting of existing methods on such real-world videos, we introduce an uncertainty-aware regularization that identifies uncertain regions with few observations and selectively imposes additional priors, based on diffusion models and depth smoothness, on those regions. This approach improves both novel view synthesis performance and the quality of training image reconstruction. We also identify an initialization problem of 4DGS in fast-moving dynamic regions, where the Structure from Motion (SfM) algorithm fails to provide reliable 3D landmarks. To initialize Gaussian primitives in such regions, we present a dynamic region densification method that uses estimated depth maps and scene flow. Our experiments show that the proposed method improves 4DGS reconstruction from videos captured by a handheld monocular camera and also shows promising results in few-shot static scene reconstruction.
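To make the idea of seeding Gaussian primitives from depth and scene flow concrete, the following is a minimal sketch and not the method's actual implementation. It assumes a pinhole camera, a per-pixel depth map, a per-pixel 3D scene flow map, and a binary mask marking dynamic pixels. The names `Intrinsics`, `unproject`, `seed_dynamic_points`, and the `stride` parameter are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (assumed model, not taken from the paper)."""
    fx: float
    fy: float
    cx: float
    cy: float


def unproject(u, v, depth, K):
    """Back-project pixel (u, v) with the given depth into camera space."""
    x = (u - K.cx) / K.fx * depth
    y = (v - K.cy) / K.fy * depth
    return (x, y, depth)


def seed_dynamic_points(depth_map, flow_map, dynamic_mask, K, stride=1):
    """Collect (position, motion) seeds for pixels flagged as dynamic.

    depth_map:    rows of per-pixel depths
    flow_map:     rows of per-pixel 3D scene flow vectors
    dynamic_mask: rows of booleans, True where the pixel is dynamic
    stride:       hypothetical subsampling step to limit the seed count
    """
    seeds = []
    for v in range(0, len(depth_map), stride):
        for u in range(0, len(depth_map[0]), stride):
            if not dynamic_mask[v][u]:
                continue  # static pixels are left to the SfM initialization
            d = depth_map[v][u]
            if d <= 0:
                continue  # skip invalid depth estimates
            seeds.append((unproject(u, v, d, K), flow_map[v][u]))
    return seeds
```

Each seed pairs a 3D position with a motion vector, which a 4DGS pipeline could use as the initial center and motion of a new primitive. How the actual method selects pixels and parameterizes motion is not specified in this abstract.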