3D scene generation is in high demand across various domains, including virtual reality, gaming, and film production. Owing to the powerful generative capabilities of text-to-image diffusion models, which provide reliable priors, creating 3D scenes from text prompts alone has become viable, significantly advancing research in text-driven 3D scene generation. To obtain multi-view supervision from 2D diffusion models, prevailing methods typically use a diffusion model to generate an initial local image and then iteratively outpaint it to gradually build up the scene. However, these outpainting-based approaches are prone to producing globally inconsistent results with a low degree of completeness, restricting their broader application. To tackle these problems, we introduce HoloDreamer, a framework that first generates a high-definition panorama as a holistic initialization of the full 3D scene and then leverages 3D Gaussian Splatting (3D-GS) to rapidly reconstruct the scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes. Specifically, we propose Stylized Equirectangular Panorama Generation, a pipeline that combines multiple diffusion models to produce stylized and detailed equirectangular panoramas from complex text prompts. We then introduce Enhanced Two-Stage Panorama Reconstruction, which performs a two-stage optimization of 3D-GS to inpaint missing regions and enhance the integrity of the scene. Comprehensive experiments demonstrate that, when generating fully enclosed scenes, our method outperforms prior work in overall visual consistency and harmony as well as reconstruction quality and rendering robustness.
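To make the panorama-to-scene step concrete, the sketch below illustrates one plausible way to lift an equirectangular panorama into a colored point cloud that could seed a 3D-GS reconstruction: each pixel is mapped to a viewing ray on the sphere and scaled by a per-pixel depth estimate. This is a minimal illustration under assumed conventions (the function name, the y-up axis convention, and the monocular depth input are all assumptions), not the paper's actual two-stage reconstruction.

```python
import numpy as np

def unproject_equirectangular(rgb, depth):
    """Lift an equirectangular panorama to a colored 3D point cloud.

    rgb:   (H, W, 3) panorama colors in [0, 1]
    depth: (H, W) per-pixel depth along each viewing ray
    Returns (H*W, 3) points and (H*W, 3) colors.
    """
    H, W = depth.shape
    # Pixel centers -> spherical angles: longitude in [-pi, pi), latitude in [-pi/2, pi/2]
    u = (np.arange(W) + 0.5) / W
    v = (np.arange(H) + 0.5) / H
    lon = (u * 2.0 - 1.0) * np.pi
    lat = (0.5 - v) * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both (H, W)
    # Unit viewing directions on the sphere (y-up convention, assumed here)
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)  # (H, W, 3)
    points = dirs * depth[..., None]  # scale each ray by its depth
    return points.reshape(-1, 3), rgb.reshape(-1, 3)

# Toy usage with synthetic inputs; a real pipeline would pass a generated
# panorama and a monocular depth estimate for it.
H, W = 256, 512
rgb = np.random.rand(H, W, 3)
depth = np.full((H, W), 3.0)  # all points on a 3 m sphere around the viewer
pts, cols = unproject_equirectangular(rgb, depth)
print(pts.shape, cols.shape)  # (131072, 3) (131072, 3)
```

Such a point cloud is one natural initialization for the Gaussians; the missing regions that remain after unprojection motivate the inpainting performed in the second optimization stage described above.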