*(Teaser video: `teaser.mp4`)*
This repository implements the training and testing tools for Free3D by Chuanxia Zheng and Andrea Vedaldi at VGG, University of Oxford. Given a single-view image, Free3D synthesizes correct novel views without requiring an explicit 3D representation.
```bash
# create the environment
conda create --name free3d python=3.9
conda activate free3d
# install pytorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# other dependencies
pip install -r requirements.txt
```
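After installation, a quick check (not a script from this repository) can confirm that PyTorch sees the CUDA runtime before any training or testing is launched:

```python
# quick sanity check of the environment (illustrative, not part of the repository)
import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```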
- Objaverse: For training / evaluating on Objaverse (7,729 instances for testing), please download the rendered dataset from zero-1-to-3. The original command they provided is:

  ```bash
  wget https://tri-ml-public.s3.amazonaws.com/datasets/views_release.tar.gz
  ```

  Unzip the data file and change `root_dir` in `configs/objaverse.yaml` (a small path sanity check is sketched after this list).
- OmniObject3D: For evaluating on OmniObject3D (5,275 instances), please refer to the OmniObject3D GitHub, and change `root_dir` in `configs/omniobject3d`. Since we do not train the model on this dataset, we directly evaluate on its training set.
- GSO: For evaluating on Google Scanned Objects (GSO, 1,030 instances), please download the full 3D models, and use the rendering code from zero-1-to-3 to get 25 views for each scene. Then, change `root_dir` in `configs/googlescan.yaml` to the corresponding location. Our rendered files are available on Google Drive.
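Since each dataset only needs `root_dir` to point at its extracted renderings, a small check like the one below can catch path mistakes early. It is a hypothetical helper, not part of the repository; it only assumes the config is plain YAML with a `root_dir` entry somewhere in its hierarchy.

```python
# hypothetical helper (not part of the repository): confirm root_dir points at the extracted data
import os
import yaml

def find_key(node, key="root_dir"):
    """Recursively search a nested dict/list loaded from the YAML config."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                return v
            hit = find_key(v, key)
            if hit is not None:
                return hit
    elif isinstance(node, list):
        for v in node:
            hit = find_key(v, key)
            if hit is not None:
                return hit
    return None

with open("configs/objaverse.yaml") as f:
    cfg = yaml.safe_load(f)

root_dir = find_key(cfg)
print("root_dir:", root_dir)
print("directory exists:", isinstance(root_dir, str) and os.path.isdir(root_dir))
```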
- batch testing for quantitative results
```bash
python batch_test.py \
    --resume [model directory path] \
    --config [configs/*.yaml] \
    --save_path [save directory path]
```
- single image testing for qualitative results
```bash
# for real examples, please download the segment anything checkpoint
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# run the single image test command
python test.py \
    --resume [model directory path] \
    --sam_path [sam checkpoint path] \
    --img_path [image path] \
    --gen_type ['image' or 'video'] \
    --save_path [save directory path]
```
- the general metrics are evaluated with:
```bash
cd evaluations
python evaluation.py --gt_path [ground truth images path] --g_path [generated NVS images path]
```
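For reference, the PSNR part of such an evaluation can be sketched as below; the exact metric set (PSNR, SSIM, and LPIPS are typical for novel view synthesis) is defined in `evaluation.py`, and the file paths here are placeholders:

```python
# minimal sketch of a PSNR computation between paired images (paths are placeholders)
import numpy as np
from PIL import Image

def psnr(gt: np.ndarray, pred: np.ndarray) -> float:
    """PSNR in dB for images scaled to [0, 1]."""
    mse = np.mean((gt - pred) ** 2)
    return float("inf") if mse == 0 else float(-10.0 * np.log10(mse))

gt = np.asarray(Image.open("gt/000.png").convert("RGB"), dtype=np.float32) / 255.0
pred = np.asarray(Image.open("nvs/000.png").convert("RGB"), dtype=np.float32) / 255.0
print(f"PSNR: {psnr(gt, pred):.2f} dB")
```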
- The Ray Conditioning Normalization (RCN), which enhances pose accuracy, is trained with the following command:
```bash
# download the image-conditional stable diffusion checkpoint released by Lambda Labs
# (this training takes around 9 days on 4x A6000 (48G))
wget https://cv.cs.columbia.edu/zero123/assets/sd-image-conditioned-v2.ckpt
# or download the checkpoint released by zero-1-to-3
# (this training takes around 2 days on 4x A6000 (48G))
wget https://cv.cs.columbia.edu/zero123/assets/105000.ckpt
# change finetune_from in train.sh, and run the command
sh train.sh
```
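Conceptually, RCN conditions the denoising network on a per-pixel ray embedding of the target camera by modulating normalized features. The sketch below is illustrative only and does not reproduce the repository's module: the layer sizes, the 6-channel ray map, and the GroupNorm choice are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RayConditioningNorm(nn.Module):
    """Illustrative sketch: per-pixel ray embeddings modulate normalized U-Net features."""

    def __init__(self, feat_channels: int, ray_channels: int = 6, hidden: int = 128):
        super().__init__()
        # normalization without its own affine parameters; the ray map supplies them instead
        # (assumes feat_channels is divisible by the 32 groups)
        self.norm = nn.GroupNorm(32, feat_channels, affine=False)
        # tiny conv-MLP mapping the ray map to per-pixel scale and shift
        self.to_scale_shift = nn.Sequential(
            nn.Conv2d(ray_channels, hidden, kernel_size=1),
            nn.SiLU(),
            nn.Conv2d(hidden, feat_channels * 2, kernel_size=1),
        )

    def forward(self, feat: torch.Tensor, rays: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) features; rays: (B, 6, H, W) per-pixel ray embedding of the target view
        if rays.shape[-2:] != feat.shape[-2:]:
            rays = F.interpolate(rays, size=feat.shape[-2:], mode="bilinear", align_corners=False)
        scale, shift = self.to_scale_shift(rays).chunk(2, dim=1)
        return self.norm(feat) * (1.0 + scale) + shift
```

Because the conditioning stays spatially aligned with the features, the pose signal acts per pixel rather than as a single global embedding.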
- The pseudo-3D attention, which improves multi-view consistency, is trained with the same command (1 day on 4x A6000), but with different parameters:
```bash
# modify configs/objaverse.yaml as follows
views: 4
use_3d_transformer: True
# modify finetune_from in train.sh to point to your first-stage model
finetune_from [RCN trained model]
```
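The pseudo-3D attention can be thought of as letting tokens from all views of the same scene attend to one another inside the 2D transformer blocks. The sketch below illustrates only that reshaping idea; the head count, module placement, and use of `nn.MultiheadAttention` are assumptions rather than the repository's code.

```python
import torch
import torch.nn as nn

class PseudoViewAttention(nn.Module):
    """Illustrative sketch: joint attention over all V views of a scene."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, views: int) -> torch.Tensor:
        # x: (B*V, N, C) token sequences produced per view by the 2D transformer
        bv, n, c = x.shape
        b = bv // views
        # fold the view axis into the token axis so attention spans every view jointly
        x = x.reshape(b, views * n, c)
        out, _ = self.attn(x, x, x, need_weights=False)
        return out.reshape(bv, n, c)
```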
- The RCN model without pseudo-3D attention is available on Hugging Face.
- Stable Video Diffusion fine-tunes an image-to-video diffusion model for multi-view generation.
- Efficient-3DiM fine-tunes the stable diffusion model with a stronger vision transformer, DINOv2.
- Consistent-1-to-3 applies epipolar attention to extract coarse results for the diffusion model.
- One-2-3-45 and One-2-3-45++ train an additional 3D network on the outputs of the 2D generator.
- MVDream, Consistent123, and Wonder3D also train multi-view diffusion models, yet still require post-processing for video rendering.
- SyncDreamer and ConsistNet incorporate a 3D representation into the latent diffusion model.
If you find our code helpful, please cite our paper:
```bibtex
@article{zheng2023free3D,
    author  = {Zheng, Chuanxia and Vedaldi, Andrea},
    title   = {Free3D: Consistent Novel View Synthesis without 3D Representation},
    journal = {arXiv},
    year    = {2023},
}
```
Many thanks to Stanislaw Szymanowicz, Edgar Sucar, and Luke Melas-Kyriazi of VGG for insightful discussions, and to Ruining Li, Eldar Insafutdinov, and Yash Bhalgat of VGG for their helpful feedback. We would also like to thank the authors of Zero-1-to-3 and Objaverse-XL for their helpful discussions.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.