*(Teaser video: `teaser.mp4`)*
This repository implements the training and testing tools for Free3D by Chuanxia Zheng and Andrea Vedaldi at VGG, University of Oxford. Given a single-view image, Free3D synthesizes correct novel views without requiring an explicit 3D representation.
```bash
# create the environment
conda create --name free3d python=3.9
conda activate free3d
# install pytorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# other dependencies
pip install -r requirements.txt
```
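After installation, a quick check (not a script from this repository) can confirm that PyTorch sees the CUDA runtime before any training or testing is launched:

```python
# quick sanity check of the environment (illustrative, not part of the repository)
import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```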
- Objaverse: For training / evaluating on Objaverse (7,729 instances for testing), please download the rendered dataset from zero-1-to-3. The original command they provided is:

  ```bash
  wget https://tri-ml-public.s3.amazonaws.com/datasets/views_release.tar.gz
  ```

  Unzip the data file and change `root_dir` in `configs/objaverse.yaml` (a small path sanity check is sketched after this list).
- OmniObject3D: For evaluating on OmniObject3D (5,275 instances), please refer to the OmniObject3D GitHub, and change `root_dir` in `configs/omniobject3d`. Since we do not train the model on this dataset, we directly evaluate on its training set.
- GSO: For evaluating on Google Scanned Objects (GSO, 1,030 instances), please download the full 3D models, and use the rendering code from zero-1-to-3 to get 25 views for each scene. Then, change `root_dir` in `configs/googlescan.yaml` to the corresponding location. Our rendered files are available on Google Drive.
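Since each dataset only needs `root_dir` to point at its extracted renderings, a small check like the one below can catch path mistakes early. It is a hypothetical helper, not part of the repository; it only assumes the config is plain YAML with a `root_dir` entry somewhere in its hierarchy.

```python
# hypothetical helper (not part of the repository): confirm root_dir points at the extracted data
import os
import yaml

def find_key(node, key="root_dir"):
    """Recursively search a nested dict/list loaded from the YAML config."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                return v
            hit = find_key(v, key)
            if hit is not None:
                return hit
    elif isinstance(node, list):
        for v in node:
            hit = find_key(v, key)
            if hit is not None:
                return hit
    return None

with open("configs/objaverse.yaml") as f:
    cfg = yaml.safe_load(f)

root_dir = find_key(cfg)
print("root_dir:", root_dir)
print("directory exists:", isinstance(root_dir, str) and os.path.isdir(root_dir))
```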
- batch testing for quantitative results
```bash
python batch_test.py \
    --resume [model directory path] \
    --config [configs/*.yaml] \
    --save_path [save directory path]
```
- single image testing for qualitative results
```bash
# for real examples, please download the segment anything checkpoint
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# run the single image test command
python test.py \
    --resume [model directory path] \
    --sam_path [sam checkpoint path] \
    --img_path [image path] \
    --gen_type ['image' or 'video'] \
    --save_path [save directory path]
```
- the general metrics are evaluated with:
```bash
cd evaluations
python evaluation.py --gt_path [ground truth images path] --g_path [generated NVS images path]
```
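For reference, the PSNR part of such an evaluation can be sketched as below; the exact metric set (PSNR, SSIM, and LPIPS are typical for novel view synthesis) is defined in `evaluation.py`, and the file paths here are placeholders:

```python
# minimal sketch of a PSNR computation between paired images (paths are placeholders)
import numpy as np
from PIL import Image

def psnr(gt: np.ndarray, pred: np.ndarray) -> float:
    """PSNR in dB for images scaled to [0, 1]."""
    mse = np.mean((gt - pred) ** 2)
    return float("inf") if mse == 0 else float(-10.0 * np.log10(mse))

gt = np.asarray(Image.open("gt/000.png").convert("RGB"), dtype=np.float32) / 255.0
pred = np.asarray(Image.open("nvs/000.png").convert("RGB"), dtype=np.float32) / 255.0
print(f"PSNR: {psnr(gt, pred):.2f} dB")
```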
- The Ray Conditioning Normalization (RCN), which enhances pose accuracy, is trained with the following command:
```bash
# download the image-conditional stable diffusion checkpoint released by Lambda Labs
# (this training takes around 9 days on 4x A6000 (48G))
wget https://cv.cs.columbia.edu/zero123/assets/sd-image-conditioned-v2.ckpt
# or download the checkpoint released by zero-1-to-3
# (this training takes around 2 days on 4x A6000 (48G))
wget https://cv.cs.columbia.edu/zero123/assets/105000.ckpt
# change finetune_from in train.sh, and run the command
sh train.sh
```
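Conceptually, RCN conditions the denoising network on a per-pixel ray embedding of the target camera by modulating normalized features. The sketch below is illustrative only and does not reproduce the repository's module: the layer sizes, the 6-channel ray map, and the GroupNorm choice are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RayConditioningNorm(nn.Module):
    """Illustrative sketch: per-pixel ray embeddings modulate normalized U-Net features."""

    def __init__(self, feat_channels: int, ray_channels: int = 6, hidden: int = 128):
        super().__init__()
        # normalization without its own affine parameters; the ray map supplies them instead
        # (assumes feat_channels is divisible by the 32 groups)
        self.norm = nn.GroupNorm(32, feat_channels, affine=False)
        # tiny conv-MLP mapping the ray map to per-pixel scale and shift
        self.to_scale_shift = nn.Sequential(
            nn.Conv2d(ray_channels, hidden, kernel_size=1),
            nn.SiLU(),
            nn.Conv2d(hidden, feat_channels * 2, kernel_size=1),
        )

    def forward(self, feat: torch.Tensor, rays: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) features; rays: (B, 6, H, W) per-pixel ray embedding of the target view
        if rays.shape[-2:] != feat.shape[-2:]:
            rays = F.interpolate(rays, size=feat.shape[-2:], mode="bilinear", align_corners=False)
        scale, shift = self.to_scale_shift(rays).chunk(2, dim=1)
        return self.norm(feat) * (1.0 + scale) + shift
```

Because the conditioning stays spatially aligned with the features, the pose signal acts per pixel rather than as a single global embedding.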
- The pseudo-3D attention, which improves multi-view consistency, is trained with the same command (1 day on 4x A6000), but with different parameters:
```bash
# modify configs/objaverse.yaml as follows
views: 4
use_3d_transformer: True
# modify finetune_from in train.sh to point to your first-stage model
finetune_from [RCN trained model]
```
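The pseudo-3D attention can be thought of as letting tokens from all views of the same scene attend to one another inside the 2D transformer blocks. The sketch below illustrates only that reshaping idea; the head count, module placement, and use of `nn.MultiheadAttention` are assumptions rather than the repository's code.

```python
import torch
import torch.nn as nn

class PseudoViewAttention(nn.Module):
    """Illustrative sketch: joint attention over all V views of a scene."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, views: int) -> torch.Tensor:
        # x: (B*V, N, C) token sequences produced per view by the 2D transformer
        bv, n, c = x.shape
        b = bv // views
        # fold the view axis into the token axis so attention spans every view jointly
        x = x.reshape(b, views * n, c)
        out, _ = self.attn(x, x, x, need_weights=False)
        return out.reshape(bv, n, c)
```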
- The RCN model without pseudo-3D attention is available on Hugging Face.
- Stable Video Diffusion fine-tunes an image-to-video diffusion model for multi-view generation.
- Efficient-3DiM fine-tunes the stable diffusion model with a stronger vision transformer, DINOv2.
- Consistent-1-to-3 applies epipolar attention to extract coarse results for the diffusion model.
- One-2-3-45 and One-2-3-45++ train an additional 3D network on the outputs of the 2D generator.
- MVDream, Consistent123, and Wonder3D also train multi-view diffusion models, yet still require post-processing for video rendering.
- SyncDreamer and ConsistNet incorporate a 3D representation into the latent diffusion model.
If you find our code helpful, please cite our paper:
```bibtex
@article{zheng2023free3D,
    author  = {Zheng, Chuanxia and Vedaldi, Andrea},
    title   = {Free3D: Consistent Novel View Synthesis without 3D Representation},
    journal = {arXiv},
    year    = {2023},
}
```
Many thanks to Stanislaw Szymanowicz, Edgar Sucar, and Luke Melas-Kyriazi of VGG for insightful discussions, and to Ruining Li, Eldar Insafutdinov, and Yash Bhalgat of VGG for their helpful feedback. We would also like to thank the authors of Zero-1-to-3 and Objaverse-XL for their helpful discussions.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.