2406.15333.md

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that can predict high-quality assets with 512k Gaussians from 21 input images using only 11 GB of GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not exploit the explicit geometric relationship between 3D structure and 2D images. This limits them to low-resolution representations and makes it difficult to scale up to dense views for better quality. GeoLRM tackles these issues with a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms to integrate image features into 3D representations. We implement this solution as a two-stage pipeline: first, a lightweight proposal network generates a sparse set of 3D anchor points from the posed image inputs; then, a specialized reconstruction transformer refines the geometry and retrieves textural details. Extensive experimental results demonstrate that GeoLRM significantly outperforms existing models, especially for dense view inputs. We also demonstrate the practical applicability of our model on 3D generation tasks, showcasing its versatility and potential for broader adoption in real-world applications.
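To make the "explicit geometric relationship between 3D structure and 2D images" concrete, the following is a minimal sketch of how a single 3D anchor point could gather image features from posed views: project the point with a pinhole camera, then bilinearly sample a feature map at the projected location. This is not the GeoLRM implementation. The function names, the dictionary layout of a view, the scalar feature maps, and the plain averaging across views are all assumptions made for illustration. A deformable cross-attention layer would instead sample at several learned offsets around the projection and combine them with learned attention weights.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def project_point(point: Sequence[float],
                  rotation: Sequence[Sequence[float]],
                  translation: Sequence[float],
                  focal: float,
                  principal: Tuple[float, float]) -> Optional[Tuple[float, float]]:
    """Map a world-space point to pixel coordinates (u, v) with a pinhole camera.

    Returns None when the point lies behind (or on) the camera plane.
    """
    # World -> camera coordinates: x_cam = R @ x_world + t
    x_cam = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
             for r in range(3)]
    if x_cam[2] <= 1e-8:
        return None
    u = focal * x_cam[0] / x_cam[2] + principal[0]
    v = focal * x_cam[1] / x_cam[2] + principal[1]
    return u, v


def bilinear_sample(feature_map: List[List[float]],
                    u: float, v: float) -> Optional[float]:
    """Bilinearly interpolate a scalar feature map (at least 2x2) at (u, v).

    feature_map[row][col] holds the feature at pixel (col, row).
    Returns None when (u, v) falls outside the map.
    """
    height, width = len(feature_map), len(feature_map[0])
    if not (0.0 <= u <= width - 1 and 0.0 <= v <= height - 1):
        return None
    # Clamp the top-left corner so the 2x2 neighbourhood stays in bounds.
    x0, y0 = min(int(u), width - 2), min(int(v), height - 2)
    ax, ay = u - x0, v - y0
    top = (1 - ax) * feature_map[y0][x0] + ax * feature_map[y0][x0 + 1]
    bottom = (1 - ax) * feature_map[y0 + 1][x0] + ax * feature_map[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom


def gather_point_feature(point: Sequence[float],
                         views: List[Dict]) -> Optional[float]:
    """Average the features sampled from every view that sees the point.

    Uniform averaging is an illustrative stand-in for learned attention weights.
    """
    samples = []
    for view in views:
        uv = project_point(point, view["R"], view["t"],
                           view["focal"], view["principal"])
        if uv is None:
            continue
        value = bilinear_sample(view["features"], *uv)
        if value is not None:
            samples.append(value)
    return sum(samples) / len(samples) if samples else None


# Hypothetical toy view: identity rotation, camera 2 units in front of the origin.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_view = {"R": IDENTITY, "t": [0.0, 0.0, 2.0], "focal": 1.0,
            "principal": (0.5, 0.5), "features": [[0.0, 1.0], [2.0, 3.0]]}

print(gather_point_feature([0.0, 0.0, 0.0], [toy_view]))  # prints 1.5
```

Because each anchor point only touches the pixels it projects onto, the cost of this kind of lookup grows with the number of sparse 3D points rather than with a dense volume, which is the intuition behind processing a sparse set of anchors.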
