# DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation

Recent advancements in 2D/3D generative techniques have facilitated the generation of dynamic 3D objects from monocular videos. Previous methods mainly rely on implicit neural radiance fields (NeRF) or explicit Gaussian splatting as the underlying representation and struggle to achieve satisfactory spatio-temporal consistency and surface appearance. Drawing inspiration from modern 3D animation pipelines, we introduce DreamMesh4D, a novel framework that combines a mesh representation with geometric skinning techniques to generate a high-quality 4D object from a monocular video. Instead of using a classical texture map for appearance, we bind Gaussian splats to the triangle faces of the mesh for differentiable optimization of both the texture and the mesh vertices. In particular, DreamMesh4D begins with a coarse mesh obtained through an image-to-3D generation procedure. Sparse points are then uniformly sampled across the mesh surface and used to build a deformation graph that drives the motion of the 3D object, improving computational efficiency and providing additional constraints. At each step, the transformations of the sparse control points are predicted by a deformation network, and the mesh vertices as well as the surface Gaussians are deformed via a novel geometric skinning algorithm, a hybrid of LBS (linear blend skinning) and DQS (dual-quaternion skinning) that mitigates the drawbacks of both approaches. The static surface Gaussians and mesh vertices, together with the deformation network, are learned via a reference-view photometric loss, a score distillation loss, and other regularizers in a two-stage manner. Extensive experiments demonstrate the superior performance of our method. Furthermore, our method is compatible with modern graphics pipelines, showcasing its potential in the 3D gaming and film industries.
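The central algorithmic component named in the abstract is the hybrid skinning step: each vertex is deformed once by LBS and once by DQS, and the two results are combined. Below is a minimal NumPy sketch of that idea, assuming an embedded-deformation-style setup where each sparse control node carries a unit-quaternion rotation acting about the node position plus a translation. All names (`skin_vertex_hybrid`, `alpha`, etc.) are illustrative, and the constant blend factor `alpha` is a placeholder; the abstract does not specify how the paper actually combines the two skinning results.

```python
# Minimal sketch of sparse-controlled hybrid LBS/DQS skinning.
# Inputs (all assumed shapes): control node positions `nodes` (J, 3),
# per-node rotations as unit quaternions `quats` (J, 4) in (w, x, y, z)
# order, per-node translations `trans` (J, 3), and per-vertex skinning
# weights `w` (J,) that sum to 1.
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qrot(q, v):
    """Rotate 3-vector v by unit quaternion q: q * (0, v) * conj(q)."""
    qv = np.concatenate(([0.0], v))
    qc = q * np.array([1.0, -1.0, -1.0, -1.0])  # conjugate of a unit quaternion
    return qmul(qmul(q, qv), qc)[1:]

def skin_vertex_lbs(v, nodes, quats, trans, w):
    """LBS: weighted average of the per-node transformed positions."""
    out = np.zeros(3)
    for g, q, t, wj in zip(nodes, quats, trans, w):
        out += wj * (qrot(q, v - g) + g + t)  # rotate about node g, then shift
    return out

def skin_vertex_dqs(v, nodes, quats, trans, w):
    """DQS: blend unit dual quaternions, renormalize, then apply once."""
    qr_sum, qd_sum = np.zeros(4), np.zeros(4)
    pivot = quats[0]
    for g, q, t, wj in zip(nodes, quats, trans, w):
        # Equivalent affine translation of "rotate about g, then add t".
        t_total = g + t - qrot(q, g)
        # Dual part of the dual quaternion: q_d = 0.5 * (0, t_total) * q_r.
        qd = 0.5 * qmul(np.concatenate(([0.0], t_total)), q)
        # Keep all quaternions in the same hemisphere before averaging.
        sign = 1.0 if np.dot(q, pivot) >= 0.0 else -1.0
        qr_sum += wj * sign * q
        qd_sum += wj * sign * qd
    n = np.linalg.norm(qr_sum)
    qr, qd = qr_sum / n, qd_sum / n
    # Recover the translation: t = 2 * q_d * conj(q_r) (vector part).
    qc = qr * np.array([1.0, -1.0, -1.0, -1.0])
    t = 2.0 * qmul(qd, qc)[1:]
    return qrot(qr, v) + t

def skin_vertex_hybrid(v, nodes, quats, trans, w, alpha=0.5):
    """Blend the two skinned positions; `alpha` is a placeholder constant."""
    return (1.0 - alpha) * skin_vertex_lbs(v, nodes, quats, trans, w) \
         + alpha * skin_vertex_dqs(v, nodes, quats, trans, w)
```

A fixed `alpha` is the simplest possible combination and is used here only for illustration; the rationale for any hybrid is that LBS tends to collapse volume under large twists while DQS tends to bulge near joints, so blending the two positions trades off the artifacts of each. In the paper's pipeline the same blended transformation would also carry along the Gaussians bound to each triangle face.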
