Forecasting future scenarios in dynamic environments is essential for intelligent decision-making and navigation, yet it remains a largely unsolved challenge in computer vision and robotics. Traditional approaches such as video prediction and novel-view synthesis either cannot forecast from arbitrary viewpoints or cannot predict temporal dynamics. In this paper, we introduce GaussianPrediction, a novel framework that equips 3D Gaussian representations with dynamic scene modeling and future scenario synthesis. Given video observations of a dynamic scene, GaussianPrediction can forecast future states from any viewpoint. To this end, we first propose a 3D Gaussian canonical space with deformation modeling to capture the appearance and geometry of dynamic scenes, and we integrate a lifecycle property into the Gaussians to handle irreversible deformations. To make prediction feasible and efficient, we develop a concentric motion distillation approach that distills the scene motion into key points. Finally, a Graph Convolutional Network (GCN) predicts the motion of these key points, enabling photorealistic rendering of future scenarios. Our framework achieves outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
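The abstract only names the prediction components. As a rough, hypothetical illustration of the last two steps (key points standing in for scene motion, and a GCN predicting their future motion), the pure-Python sketch below runs a standard two-layer GCN with the usual symmetric normalization D^-1/2 (A + I) D^-1/2 over a key-point graph, then propagates the predicted key-point displacements to Gaussian centers. The graph edges, feature layout, weights, and the inverse-distance blending over the k nearest key points are all assumptions chosen for illustration. They are not the paper's concentric motion distillation or its trained network.

```python
import math


def normalized_adjacency(n, edges):
    """Standard GCN normalization D^-1/2 (A + I) D^-1/2 for an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A + I
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain list-of-lists matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w, relu=True):
    """One graph convolution: aggregate neighbor features, then apply a linear map."""
    out = matmul(matmul(a_hat, h), w)
    return [[max(v, 0.0) for v in row] for row in out] if relu else out


def predict_keypoint_displacement(a_hat, features, w1, w2):
    """Hypothetical two-layer GCN mapping per-key-point motion features to 3D displacements."""
    hidden = gcn_layer(a_hat, features, w1, relu=True)
    return gcn_layer(a_hat, hidden, w2, relu=False)


def propagate_to_gaussians(centers, keypoints, disp, k=2, eps=1e-8):
    """Move each Gaussian center by an inverse-distance-weighted blend of the
    displacements of its k nearest key points (an assumed blending rule)."""
    moved = []
    for c in centers:
        nearest = sorted((math.dist(c, p), idx) for idx, p in enumerate(keypoints))[:k]
        weights = [1.0 / (dist + eps) for dist, _ in nearest]
        total = sum(weights)
        delta = [sum(w * disp[idx][ax] for w, (_, idx) in zip(weights, nearest)) / total
                 for ax in range(3)]
        moved.append([c[ax] + delta[ax] for ax in range(3)])
    return moved
```

In practice the GCN weights would be learned from the distilled key-point trajectories; here they are placeholders. Driving many Gaussians from a few key points is what keeps the forecasting step small: the network only predicts motion for the sparse key-point graph.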