MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction
Real-time multi-agent collaboration for ego-motion estimation and high-fidelity 3D reconstruction is vital for scalable spatial intelligence. However, traditional methods produce sparse, low-detail maps, while recent dense mapping approaches struggle with high latency. To overcome these challenges, we present MAC-Ego3D, a novel framework for real-time collaborative photorealistic 3D reconstruction via Multi-Agent Gaussian Consensus. MAC-Ego3D enables agents to independently construct, align, and iteratively refine local maps using a unified Gaussian splat representation. Through Intra-Agent Gaussian Consensus, it enforces spatial coherence among neighboring Gaussian splats within an agent. For global alignment, parallelized Inter-Agent Gaussian Consensus, which asynchronously aligns and optimizes local maps by regularizing multi-agent Gaussian splats, seamlessly integrates them into a high-fidelity 3D model. Leveraging Gaussian primitives, MAC-Ego3D supports efficient RGB-D rendering, enabling rapid inter-agent Gaussian association and alignment. MAC-Ego3D bridges local precision and global coherence, delivering higher efficiency, largely reducing localization error, and improving mapping fidelity. It establishes a new SOTA on synthetic and real-world benchmarks, achieving a 15x increase in inference speed, order-of-magnitude reductions in ego-motion estimation error for partial cases, and RGB PSNR gains of 4 to 10 dB.
实时多智能体协作进行自运动估计和高保真三维重建是实现可扩展空间智能的关键。然而,传统方法通常生成稀疏、低细节的地图,而最近的密集映射方法则面临高延迟问题。为了解决这些挑战,我们提出了 MAC-Ego3D,一种通过多智能体高斯共识实现实时协作光真实感三维重建的新框架。 MAC-Ego3D 使智能体能够独立构建、对齐并通过统一的高斯点云表示迭代优化本地地图。通过智能体内高斯共识(Intra-Agent Gaussian Consensus),框架在单个智能体内的邻近高斯点云之间强制保持空间一致性。对于全局对齐,框架采用并行的智能体间高斯共识(Inter-Agent Gaussian Consensus),异步对齐并优化本地地图,通过对多智能体高斯点云的正则化,流畅地将其整合为高保真的三维模型。 借助高斯基元,MAC-Ego3D 支持高效的 RGB-D 渲染,实现快速的智能体间高斯关联和对齐。框架在局部精度与全局一致性之间架起了桥梁,不仅显著提升效率,极大地降低了定位误差,还提高了映射的保真度。 MAC-Ego3D 在合成和真实世界基准测试中设立了新的性能标杆,实现了 15 倍的推理速度提升,自运动估计误差在部分场景中降低了一个数量级,并且 RGB 的 PSNR 提升了 4 至 10 dB。