2412.10373.md

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

3D occupancy prediction is important for autonomous driving due to its comprehensive perception of the surroundings. To incorporate sequential inputs, most existing methods fuse representations from previous frames to infer the current 3D occupancy. However, they fail to consider the continuity of driving scenarios and ignore the strong prior provided by the evolution of 3D scenes (e.g., only dynamic objects move). In this paper, we propose a world-model-based framework to exploit the scene evolution for perception. We reformulate 3D occupancy prediction as a 4D occupancy forecasting problem conditioned on the current sensor input. We decompose the scene evolution into three factors: 1) ego motion alignment of static scenes; 2) local movements of dynamic objects; and 3) completion of newly-observed scenes. We then employ a Gaussian world model (GaussianWorld) to explicitly exploit these priors and infer the scene evolution in the 3D Gaussian space considering the current RGB observation. We evaluate the effectiveness of our framework on the widely used nuScenes dataset. Our GaussianWorld improves the performance of the single-frame counterpart by over 2% in mIoU without introducing additional computations.
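The three-factor decomposition can be made concrete with a toy sketch. This is not the paper's implementation: GaussianWorld infers the evolution with a learned model, while the code below only shows the geometric bookkeeping. The `Gaussian` class, its `dynamic` flag, the planar ego-motion model, the square perception range, and the idea of reusing out-of-range Gaussians as a budget for new regions are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    # Only the center is modeled here; real 3D Gaussians also carry
    # scale, rotation, opacity, and semantic features.
    x: float
    y: float
    z: float
    dynamic: bool  # hypothetical flag: True for movable objects


def align_to_current_frame(g, yaw, tx, ty):
    """Re-express a Gaussian center from the previous ego frame in the current one.

    Assumes planar ego motion: between frames the ego moved by (tx, ty)
    and turned by `yaw` radians, both measured in the previous frame.
    """
    dx, dy = g.x - tx, g.y - ty
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the inverse rotation R(-yaw) to the shifted point.
    return Gaussian(c * dx + s * dy, -s * dx + c * dy, g.z, g.dynamic)


def evolve(prev, yaw, tx, ty, half_range=50.0):
    """Split the previous Gaussians according to the three factors."""
    aligned = [align_to_current_frame(g, yaw, tx, ty) for g in prev]
    in_range = [
        g for g in aligned
        if abs(g.x) <= half_range and abs(g.y) <= half_range
    ]
    # Factor 1: static content only needs the ego-motion alignment.
    static = [g for g in in_range if not g.dynamic]
    # Factor 2: dynamic objects would additionally need a local motion update.
    dynamic = [g for g in in_range if g.dynamic]
    # Factor 3: Gaussians that left the range free up a budget that could
    # be spent on completing newly observed regions (an assumption here).
    n_new = len(prev) - len(in_range)
    return static, dynamic, n_new


if __name__ == "__main__":
    scene = [
        Gaussian(45.0, 0.0, 0.0, False),
        Gaussian(-45.0, 0.0, 0.0, False),
        Gaussian(5.0, 2.0, 0.0, True),
    ]
    # Ego drives 10 m straight ahead without turning.
    static, dynamic, n_new = evolve(scene, yaw=0.0, tx=10.0, ty=0.0)
    print(len(static), len(dynamic), n_new)
```

In this toy run the Gaussian behind the ego vehicle falls out of range, so one slot becomes available for newly observed space, while the static and dynamic Gaussians ahead are carried over into the current frame.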

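3D occupancy prediction is crucial for autonomous driving because it provides comprehensive perception of the surrounding environment. To incorporate sequential inputs, most current methods infer the current 3D occupancy by fusing representations from previous frames. However, these methods fail to consider the continuity of driving scenarios and ignore the strong prior provided by the evolution of 3D scenes (e.g., only dynamic objects move). In this paper, we propose a world-model-based framework that exploits scene evolution for perception. We reformulate 3D occupancy prediction as a 4D occupancy forecasting problem conditioned on the current sensor input. We decompose scene evolution into three factors: 1) ego-motion alignment of static scenes; 2) local movements of dynamic objects; and 3) completion of newly observed scenes. We then employ a Gaussian world model (GaussianWorld) that explicitly exploits these priors to infer scene evolution in the 3D Gaussian space, taking the current RGB observation into account. Experiments on the widely used nuScenes dataset show that GaussianWorld improves the mIoU of its single-frame counterpart by more than 2% without introducing additional computation, validating the effectiveness of the framework.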
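For readers unfamiliar with the reported metric, the following is a minimal sketch of mean intersection-over-union (mIoU) over flattened per-voxel class labels. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions; the abstract does not state the benchmark's exact class handling.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Voxels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Voxels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 1, 1, 2]
    target = [0, 1, 2, 2]
    # Per-class IoU: class 0 -> 1/1, class 1 -> 1/2, class 2 -> 1/2.
    print(mean_iou(pred, target, num_classes=3))
```

Here one mislabeled voxel lowers the IoU of both the predicted class and the true class, which is why mIoU penalizes confusion between classes more than plain accuracy does.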