The online reconstruction of dynamic scenes from multi-view streaming videos faces significant challenges in training, rendering, and storage efficiency. With its superior learning speed and real-time rendering capability, 3D Gaussian Splatting (3DGS) has recently demonstrated considerable potential in this field. However, 3DGS can be storage-inefficient and prone to overfitting through excessive Gaussian growth, particularly when views are limited. This paper proposes an efficient framework, dubbed HiCoM, with three key components. First, we construct a compact and robust initial 3DGS representation using a perturbation smoothing strategy. Next, we introduce a Hierarchical Coherent Motion mechanism that leverages the inherently non-uniform distribution and local consistency of 3D Gaussians to learn motions across frames swiftly and accurately. Finally, we continually refine the 3DGS with additional Gaussians, which are later merged into the initial 3DGS to keep it consistent with the evolving scene. To preserve a compact representation, an equal number of low-opacity Gaussians, which contribute minimally to the representation, are removed before subsequent frames are processed. Extensive experiments on two widely used datasets show that our framework improves learning efficiency over state-of-the-art methods by about 20% and reduces data storage by 85%, achieving competitive free-viewpoint video synthesis quality with higher robustness and stability. Moreover, by learning multiple frames in parallel, our HiCoM decreases the average training wall time to under 2 seconds per frame with negligible performance degradation, substantially boosting real-world applicability and responsiveness.
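To make the compactness-preserving step concrete, the following is a minimal PyTorch sketch of the merge-then-prune logic described above. The function name, tensor layout, and selection rule (keeping the highest-opacity Gaussians) are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (hypothetical tensor layout) of the compactness-
# preserving step: Gaussians added while refining the current frame are
# merged into the base representation, and an equal number of the
# lowest-opacity Gaussians are pruned before the next frame is
# processed, keeping the total count constant.
import torch

def merge_and_prune(base_opacity: torch.Tensor,   # (N,)
                    base_attrs: torch.Tensor,     # (N, D) means, scales, etc.
                    new_opacity: torch.Tensor,    # (M,)
                    new_attrs: torch.Tensor):     # (M, D)
    """Merge newly grown Gaussians, then drop as many low-opacity ones."""
    opacity = torch.cat([base_opacity, new_opacity])   # (N + M,)
    attrs = torch.cat([base_attrs, new_attrs])         # (N + M, D)
    n_remove = new_opacity.shape[0]                    # keep count at N
    # Keep the N Gaussians with the highest opacity; the removed ones
    # contribute least to the rendered images.
    keep = torch.topk(opacity, opacity.shape[0] - n_remove).indices
    return opacity[keep], attrs[keep]
```

Under this reading, the Gaussian count stays fixed across the stream, so per-frame storage remains bounded regardless of how long the video runs.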