Given that visual foundation models (VFMs) are trained on extensive datasets but often limited to 2D images, a natural question arises: how well do they understand the 3D world? Because VFMs differ in architecture and training protocol (i.e., objectives and proxy tasks), a unified framework that fairly and comprehensively probes their 3D awareness is urgently needed. Existing works on 3D probing rely on single-view 2.5D estimation (e.g., depth and normals) or two-view sparse 2D correspondence (e.g., matching and tracking). Unfortunately, these tasks ignore texture awareness and require 3D data as ground truth, which limits the scale and diversity of their evaluation sets. To address these issues, we introduce Feat2GS, which reads out 3D Gaussian attributes from VFM features extracted from unposed images. This allows us to probe 3D awareness of both geometry and texture via novel view synthesis, without requiring 3D data. Additionally, the disentanglement of 3DGS parameters into geometry (x, α, Σ) and texture (c) enables separate analysis of geometry and texture awareness. Under Feat2GS, we conduct extensive experiments to probe the 3D awareness of several VFMs and investigate the ingredients that lead to a 3D-aware VFM. Building on these findings, we develop several variants that achieve state-of-the-art results across diverse datasets. This makes Feat2GS useful both for probing VFMs and as a simple-yet-effective baseline for novel view synthesis. Code and data will be made available at this https URL.
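To make the geometry/texture disentanglement concrete, the sketch below shows a minimal, hypothetical readout head in the spirit of Feat2GS (the function and weight names are assumptions, not the authors' API): per-pixel VFM features are mapped by two separate linear blocks to geometry attributes (position x, opacity α, covariance Σ parameterized as scale and rotation) and texture attributes (color c), with activations keeping each parameter in its valid range.

```python
# Hypothetical Feat2GS-style readout sketch (names/shapes are assumptions).
# Geometry (x, alpha, Sigma) and texture (c) use separate weight blocks,
# so each can be probed and analyzed independently.
import numpy as np

def readout_gaussians(feats, W_geo, W_tex):
    """feats: (N, D) per-pixel VFM features -> dict of 3DGS attributes."""
    g = feats @ W_geo                          # (N, 11) geometry logits
    xyz   = g[:, 0:3]                          # mean position x (unbounded)
    alpha = 1.0 / (1.0 + np.exp(-g[:, 3:4]))  # opacity in (0, 1)
    scale = np.exp(g[:, 4:7])                  # positive scales of Sigma
    quat  = g[:, 7:11].copy()
    quat /= np.linalg.norm(quat, axis=1, keepdims=True)  # unit rotation
    color = 1.0 / (1.0 + np.exp(-(feats @ W_tex)))       # texture c in (0, 1)
    return {"xyz": xyz, "alpha": alpha, "scale": scale,
            "rot": quat, "color": color}

# Toy usage: 4 pixels with 16-dim features, randomly initialized heads.
rng = np.random.default_rng(0)
N, D = 4, 16
out = readout_gaussians(rng.standard_normal((N, D)),
                        rng.standard_normal((D, 11)) * 0.1,
                        rng.standard_normal((D, 3)) * 0.1)
```

Because the two weight blocks do not share parameters, freezing one and training the other is what would let a probe attribute novel-view-synthesis quality separately to geometry versus texture awareness.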