Drag-based editing has become popular in 2D content creation, driven by the capabilities of image generative models. However, extending this technique to 3D remains a challenge. Existing 3D drag-based editing methods, whether employing explicit spatial transformations or relying on implicit latent optimization within limited-capacity 3D generative models, fall short in handling significant topology changes or generating new textures across diverse object categories. To overcome these limitations, we introduce MVDrag3D, a novel framework for more flexible and creative drag-based 3D editing that leverages multi-view generation and reconstruction priors. At the core of our approach is the use of a multi-view diffusion model as a strong generative prior to perform consistent drag editing over multiple rendered views, followed by a reconstruction model that recovers 3D Gaussians of the edited object. Because the initial 3D Gaussians may be misaligned across views, we introduce view-specific deformation networks that adjust the positions of the Gaussians until the views agree. In addition, we propose a multi-view score function that distills generative priors from multiple views to further enhance view consistency and visual quality. Extensive experiments demonstrate that MVDrag3D provides a precise, generative, and flexible solution for 3D drag-based editing, supporting more versatile editing effects across various object categories and 3D representations.
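To make the last point concrete, the following is a minimal sketch of what such a multi-view score function can look like, written in standard score-distillation (SDS) notation; the timestep weighting $w(t)$, noise predictor $\boldsymbol{\epsilon}_{\phi}$, and per-view camera conditioning $c_v$ are illustrative assumptions rather than the exact formulation used in the paper:

\[
\nabla_{\theta}\mathcal{L}_{\mathrm{MV}} = \mathbb{E}_{t,\boldsymbol{\epsilon}}\!\left[ \frac{w(t)}{V} \sum_{v=1}^{V} \bigl( \boldsymbol{\epsilon}_{\phi}\!\left(\mathbf{x}_t^{v};\, t,\, c_v\right) - \boldsymbol{\epsilon} \bigr) \frac{\partial \mathbf{x}^{v}}{\partial \theta} \right],
\]

where $\theta$ denotes the 3D Gaussian parameters, $\mathbf{x}^{v}$ is the rendering from view $v$ of $V$ views, and $\mathbf{x}_t^{v}$ is its noised version at diffusion timestep $t$. Averaging the per-view gradients couples all views through a single update on the shared 3D representation, which is what drives the improved cross-view consistency.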