harskish/tlgan: Time-Lapse Disentanglement With Conditional GANs [SIGGRAPH 2022]

Disentangling Random and Cyclic Effects in Time-Lapse Sequences

[Teaser image: valley]

Disentangling Random and Cyclic Effects in Time-Lapse Sequences
Erik Härkönen¹, Miika Aittala², Tuomas Kynkäänniemi¹, Samuli Laine², Timo Aila², Jaakko Lehtinen¹,²
¹Aalto University, ²NVIDIA
https://doi.org/10.1145/3528223.3530170

Abstract: Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to "re-render" the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth over many months, under selectable, consistent weather.
Our approach is based on Generative Adversarial Networks (GANs) that are conditioned on the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN's latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies.
We show that our models are robust to defects in the training data, enabling us to amend some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.
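To make the conditioning mechanism concrete, the sketch below embeds a scalar time coordinate as sine/cosine Fourier features at fixed frequencies. The function name, frequency choices (one cycle per day for the day-night cycle, one per year for seasonal trends), and feature dimensions are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch of a Fourier-feature time embedding (illustrative only;
# the paper's exact frequencies and feature dimensions may differ).
import numpy as np

def fourier_features(t, freqs):
    # t: time in days; freqs: cycles per day, e.g. 1.0 for the day-night
    # cycle and 1/365.25 for the yearly cycle.
    t = np.atleast_1d(np.asarray(t, dtype=np.float64))
    phase = 2.0 * np.pi * t[:, None] * np.asarray(freqs)[None, :]
    return np.concatenate([np.sin(phase), np.cos(phase)], axis=1)

# Example: embed noon of day 10; the generator would receive a vector
# like this as conditioning input alongside the latent code.
emb = fourier_features(10.5, freqs=[1.0, 1.0 / 365.25])
print(emb.shape)  # (1, 4): one sin/cos pair per frequency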

Video: https://youtu.be/UrQ3tOfpjuA

Setup

See the setup instructions.

Dataset preparation

See the dataset preprocessing instructions.

Usage

Training a model
First, go through the dataset preparation instructions above to produce a dataset zip.

# Print available options
python train.py --help

# Train TLGAN on Valley using 4 GPUs.
python train.py --outdir=~/training-runs --data=~/datasets/valley_1024x1024_2225hz.zip --gpus=4 --batch=32

# Train an unconditional StyleGAN2 on Teton using 2 GPUs.
python train.py --outdir=~/training-runs --data=~/datasets/teton_512x512_2225hz.zip --gpus=2 --batch=32 --cond=none --metrics=fid50k_full

Model visualizer
The interactive model visualizer can be used to explore the effects of the conditioning inputs and the latent space.

# Visualize a trained model
python visualize.py path/to/model.pkl

The UI can be scaled with the button in the top-right corner and made fullscreen by pressing F11.

Grid visualizer
The input grid visualizer can be used to create 2D image grids, time-lapse images (stacked strips), and videos. All exported files (jpg, png, mp4) contain embedded metadata with all UI element states. This enables previously exported data to be loaded back into the UI via drag-and-drop.

# Open trained model pickle in grid visualizer
python grid_viz.py /path/to/model.pkl

# Reopen UI and load state from previously exported image
python grid_viz.py /path/to/image.png
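As an illustration of the embedded-metadata mechanism above, exported PNGs can be inspected with Pillow, which exposes PNG text chunks through the image's text attribute. The file name below is hypothetical, and the exact metadata key used by grid_viz.py is not documented here, so the sketch simply lists every chunk it finds.

# List the text chunks of an exported PNG (Pillow). The UI state is
# stored in one of these entries; the key name is an assumption here.
from PIL import Image

im = Image.open('export.png')  # hypothetical exported file
for key, value in im.text.items():
    print(key, ':', str(value)[:80])  # truncate long state strings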

Dataset visualization
Both visualizers can display dataset frames that most closely match the current conditioning variables. Set the environment variable TLGAN_DATASET_ROOT or pass the --dataset_root argument to specify the directory in which the datasets are stored.
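For example (the dataset path below is illustrative):

# Point the visualizer at the directory containing dataset zips
python visualize.py path/to/model.pkl --dataset_root=~/datasets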

Downloads

Known issues

  • NVJPEG does not work correctly with CUDA 11.0–11.5. CPU decoding will be used instead, leading to reduced performance. Affects preproc/process_sequence.py, grid_viz.py, and visualize.py.
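A quick way to check whether an environment falls in the affected range is to inspect the CUDA runtime version reported by PyTorch (assuming the code runs under PyTorch, as StyleGAN3-based projects do):

# Print the CUDA version PyTorch was built against; values from
# '11.0' through '11.5' fall in the range affected by the NVJPEG issue.
import torch
print(torch.version.cuda)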

Citation

@article{harkonen2022tlgan,
  author    = {Erik Härkönen and Miika Aittala and Tuomas Kynkäänniemi and Samuli Laine and Timo Aila and Jaakko Lehtinen},
  title     = {Disentangling Random and Cyclic Effects in Time-Lapse Sequences},
  journal   = {{ACM} Trans. Graph.},
  volume    = {41},
  number    = {4},
  year      = {2022},
}

License

The code of this repository is based on StyleGAN3, which is released under the NVIDIA License.
All modified source files are marked separately and released under the CC BY-NC-SA 4.0 license.
The files in ./ext are provided under the MIT license.
The file mutable_zipfile.py is released under the Python License.
The included Roboto Mono font is licensed under the Apache 2.0 license.