Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template into front and back canonical Gaussian maps, where each pixel represents a 3D Gaussian. The learned template adapts to the garments worn, allowing looser clothes such as dresses to be modeled. This template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn pose-dependent Gaussian maps that capture detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization to novel poses. Overall, our method can create lifelike avatars with dynamic, realistic, and generalizable appearances. Experiments show that our method outperforms other state-of-the-art approaches.
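To make the Gaussian-map parameterization concrete, the sketch below shows one plausible way to flatten front and back canonical Gaussian maps into a set of 3D Gaussians. The channel layout (position offsets, rotation quaternion, scales, opacity, RGB), the activation choices, and the template-derived validity mask are illustrative assumptions, not the paper's exact format.

```python
import torch
import torch.nn.functional as F

def gaussian_maps_to_splats(front_map, back_map, valid_mask):
    """Flatten front & back canonical Gaussian maps into a 3D Gaussian set.

    Assumes each pixel of the (H, W, 14) maps stores one Gaussian's parameters:
    3 position offsets, 4 quaternion components, 3 log-scales, 1 opacity logit,
    3 color logits. `valid_mask` (H, W, bool) marks pixels covered by the
    learned template; only those pixels become Gaussians.
    """
    maps = torch.cat([front_map, back_map], dim=0)      # (2H, W, 14)
    mask = torch.cat([valid_mask, valid_mask], dim=0)   # (2H, W)
    params = maps[mask]                                 # (N, 14), valid pixels only

    positions = params[:, 0:3]                          # canonical-space offsets
    rotations = F.normalize(params[:, 3:7], dim=-1)     # unit quaternions
    scales = torch.exp(params[:, 7:10])                 # keep scales positive
    opacities = torch.sigmoid(params[:, 10:11])         # map to (0, 1)
    colors = torch.sigmoid(params[:, 11:14])            # RGB in (0, 1)
    return positions, rotations, scales, opacities, colors
```

In this sketch, the pose-dependent maps would be regressed by the StyleGAN-based CNN, and the resulting Gaussians would then be skinned to the target pose and rendered with 3D Gaussian splatting.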