This work addresses real-time rendering of photorealistic human body avatars learned from multi-view videos. While classical approaches to modeling and rendering virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real time, and their quality degrades when the character is animated with body poses that differ from the training observations. We propose the first animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. The body is represented by a set of Gaussian primitives in a canonical space, which are deformed in a coarse-to-fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting (OURS) model end-to-end from multi-view observations, and evaluate it against state-of-the-art approaches for novel pose synthesis of clothed bodies. Our method achieves a PSNR 1.5 dB higher than the state of the art on the THuman4 dataset while rendering at 20 fps or more.
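The coarse-to-fine deformation mentioned above can be illustrated with a minimal sketch. It is not the model described in this work: it assumes the coarse step is standard linear blend skinning applied to a Gaussian's center, and that the fine step is an additive offset applied in posed space. The names `apply_rigid` and `deform_center` are hypothetical, the offset is supplied by hand rather than predicted by a learned module, and the transformation of each Gaussian's rotation and scale is omitted.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Rigid = Tuple[Mat3, Vec3]  # (rotation matrix, translation) of one bone


def apply_rigid(rotation: Mat3, translation: Vec3, point: Vec3) -> Vec3:
    """Apply a rigid transform: rotation @ point + translation."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def deform_center(
    canonical: Vec3,
    weights: Sequence[float],
    bones: Sequence[Rigid],
    offset: Vec3 = (0.0, 0.0, 0.0),
) -> Vec3:
    """Move one Gaussian center from canonical space to posed space.

    Coarse step: linear blend skinning (weighted sum of the point as
    transformed by each bone). Fine step: an additive non-rigid offset,
    which is an assumption of this sketch and is hand-supplied here.
    """
    posed = [0.0, 0.0, 0.0]
    for weight, (rotation, translation) in zip(weights, bones):
        moved = apply_rigid(rotation, translation, canonical)
        for i in range(3):
            posed[i] += weight * moved[i]
    return tuple(p + o for p, o in zip(posed, offset))


IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Two bones: one stays in place, one translates by 2 units along y.
bones = [(IDENTITY, (0.0, 0.0, 0.0)), (IDENTITY, (0.0, 2.0, 0.0))]

# A center influenced equally by both bones, plus a small offset along z.
coarse = deform_center((1.0, 0.0, 0.0), [0.5, 0.5], bones)
refined = deform_center((1.0, 0.0, 0.0), [0.5, 0.5], bones, offset=(0.0, 0.0, 0.1))
```

With equal weights the center moves halfway between the two bone transforms, and the offset then shifts it slightly. In a learned model, both the skinning weights and the offset would typically be optimized from the multi-view observations.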