This repository represents the official implementation of the paper titled "Video Depth without Video Models".
Bingxin Ke¹, Dominik Narnhofer¹, Shengyu Huang¹, Lei Ke², Torben Peters¹, Katerina Fragkiadaki², Anton Obukhov¹, Konrad Schindler¹
¹ETH Zurich, ²Carnegie Mellon University
2024-12-02: Paper is on arXiv.
2024-11-28: Inference code is released.
The inference code was tested on: Debian 12, Python 3.12.7 (venv), CUDA 12.4, GeForce RTX 3090
git clone https://github.com/prs-eth/RollingDepth.git
cd RollingDepth
Create a Python environment:
# with venv
python -m venv venv/rollingdepth
source venv/rollingdepth/bin/activate
# or with conda
conda create --name rollingdepth python=3.12
conda activate rollingdepth
Install dependencies:
pip install -r requirements.txt
bash script/install_diffusers_dev.sh # Install modified diffusers with cross-frame self-attention
We use pyav for video I/O, which relies on ffmpeg (tested with version 5.1.6-0+deb12u1).
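An optional, minimal sanity check that ffmpeg and pyav are available in the active environment:

```bash
# Print the installed ffmpeg and pyav versions
ffmpeg -version | head -n 1
python -c "import av; print(av.__version__)"
```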
To see the modification in diffusers, search for comments "Modified in RollingDepth".
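One convenient way to locate these comments is to grep the installed package; the sketch below resolves the package path from the active environment rather than assuming a fixed install location:

```bash
# List all lines marked as modified in the patched diffusers package
grep -rn "Modified in RollingDepth" \
    "$(python -c 'import diffusers, os; print(os.path.dirname(diffusers.__file__))')"
```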
All scripts are designed to run from the project root directory.
- Use sample videos:
bash script/download_sample_data.sh
These example videos are to be used only as debug/demo input together with the code and should not be distributed outside of the repo.
- Or place your videos in a directory, for example, under `data/samples`.
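For example (the video file name here is hypothetical):

```bash
# Copy your own clip into the input directory
mkdir -p data/samples
cp /path/to/my_clip.mp4 data/samples/
```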
python run_video.py \
-i data/samples \
-o output/samples_fast \
-p fast \
--verbose
- `-p` or `--preset`: preset options
    - `fast` for fast inference, with dilations [1, 25] (flexible), fp16, without refinement, at max. resolution 768.
    - `fast1024` for fast inference at resolution 1024.
    - `full` for better details, with dilations [1, 10, 25] (flexible), fp16, with 10 refinement steps, at max. resolution 1024.
    - `paper` for reproducing paper numbers, with (fixed) dilations [1, 10, 25], fp32, with 10 refinement steps, at max. resolution 768.
- `-i` or `--input-video`: path to input data, which can be a single video file, a text file with video paths, or a directory of videos.
- `-o` or `--output-dir`: output directory.
- `--res` or `--processing-resolution`: the maximum resolution (in pixels) at which image processing will be performed. If set to 0, processes at the original input image resolution.
- `--refine-step`: number of refinement iterations to improve accuracy and details. Set to 0 to disable refinement.
- `--snip-len` or `--snippet-lengths`: number of frames to analyze in each snippet.
- `-d` or `--dilations`: spacing between frames for temporal analysis; can take multiple values, e.g. `-d 1 10 25`.
- `--from` or `--start-frame`: the starting frame index for processing, defaults to 0.
- `--frames` or `--frame-count`: number of frames to process after the starting frame. Set to 0 (default) to process until the end of the video.
- `--fps` or `--output-fps`: frame rate (FPS) for the output video. Set to 0 (default) to match the input video's frame rate.
- `--restore-res` or `--restore-resolution`: whether to restore the output to the original input resolution after processing. Default: False.
- `--save-sbs` or `--save-side-by-side`: whether to save side-by-side videos of RGB and colored depth. Default: True.
- `--save-npy`: whether to save depth maps as .npy files. Default: True.
- `--save-snippets`: whether to save initial snippets. Default: False.
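For instance, several of the options above can be combined in a single call; this is only a sketch, and `my_clip.mp4`, the output directory name, and the frame count are arbitrary placeholders:

```bash
# Run the "full" preset on a single clip, processing 200 frames from the start
python run_video.py \
    -i data/samples/my_clip.mp4 \
    -o output/my_clip_full \
    -p full \
    --from 0 \
    --frames 200 \
    --verbose
```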
- Please run `python run_video.py --help` to get details for other arguments.
- For a low GPU memory footprint: pass `--max-vae-bs 1 --unload-snippet true` and use a smaller resolution, e.g. `--res 512`, as sketched below.
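A minimal low-memory invocation, reusing the sample videos from above (the output directory name is arbitrary):

```bash
# Reduce memory usage: small VAE batch, snippet unloading, lower resolution
python run_video.py \
    -i data/samples \
    -o output/samples_lowmem \
    -p fast \
    --max-vae-bs 1 \
    --unload-snippet true \
    --res 512
```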
By default, the checkpoint is stored in the Hugging Face cache. The HF_HOME environment variable defines its location and can be overridden, e.g.:
export HF_HOME=$(pwd)/cache
Alternatively, use the following script to download the checkpoint weights locally and specify the checkpoint path with `-c checkpoint/rollingdepth-v1-0`:
bash script/download_weight.sh
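For example, after downloading, the locally stored weights can be used by reusing the sample command from above and adding the checkpoint path:

```bash
# Run inference with the locally downloaded checkpoint
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    -c checkpoint/rollingdepth-v1-0
```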
Coming soon
@misc{ke2024rollingdepth,
title={Video Depth without Video Models},
author={Bingxin Ke and Dominik Narnhofer and Shengyu Huang and Lei Ke and Torben Peters and Katerina Fragkiadaki and Anton Obukhov and Konrad Schindler},
year={2024},
eprint={2411.19189},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.19189},
}
We thank Yue Pan, Shuchang Liu, Nando Metzger, and Nikolai Kalischek for fruitful discussions.
We are grateful to redmond.ai ([email protected]) for providing GPU resources.
The code of this work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).
The model is licensed under the RAIL++-M License (as defined in the LICENSE-MODEL).
By downloading and using the code and model you agree to the terms in LICENSE and LICENSE-MODEL respectively.