Commit cd00af4 (parent 8550b88): 81 changed files, 8,734 additions, 3 deletions.
# Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs

🚀 The paper was accepted to the [ECCV 2024](https://eccv.ecva.net/Conferences/2024) conference. The preprint is available on [arXiv](https://arxiv.org/abs/2404.10700). 🚀

### Authors
Georgy Perevozchikov, Nancy Mehta, Mahmoud Afifi, Radu Timofte

![Rawformer](figures/main.png)

### Abstract
*Modern smartphone camera quality heavily relies on the image signal processor (ISP) to enhance captured raw images, utilizing carefully designed modules to produce final output images encoded in a standard color space (e.g., sRGB). Neural-based end-to-end learnable ISPs offer promising advancements, potentially replacing traditional ISPs with their ability to adapt without requiring extensive tuning for each new camera model, as is often the case for nearly every module in traditional ISPs. However, the key challenge with the recent learning-based ISPs is the need to collect large paired datasets for each distinct camera model due to the influence of intrinsic camera characteristics on the formation of input raw images. This paper tackles this challenge by introducing a novel method for unpaired learning of raw-to-raw translation across diverse cameras. Specifically, we propose Rawformer, an unsupervised Transformer-based encoder-decoder method for raw-to-raw translation. It accurately maps raw images captured by a certain camera to the target camera, facilitating the generalization of learnable ISPs to new unseen cameras. Our method demonstrates superior performance on real camera datasets, achieving higher accuracy compared to previous state-of-the-art techniques, and preserving a more robust correlation between the original and translated raw images.*

## To Do

*This is the first version of the code (0.1.0). We are working on the following tasks:*

- [x] Release the Rawformer code
- [ ] Upload the prepared datasets
- [ ] Upload the pre-trained models
- [ ] Port the code to PyTorch Lightning
- [ ] Add ONNX export scripts

## Datasets

### Data Structure
```
- data
    - <dataset_name>
        - trainA
            - img1.jpg, img2.jpg, ...
        - trainB
            - imgA.jpg, imgB.jpg, ...
        - testA
            - img1.jpg, img2.jpg, ...
        - testB
            - imgA.jpg, imgB.jpg, ...
```
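As a rough illustration of how this layout can be consumed, here is a minimal unpaired two-domain dataset sketch. It is an assumption for illustration only: the class name, the PIL-based loading, and the random pairing are not part of the repository's actual data pipeline, which may differ.

```python
import os
import random

from PIL import Image
from torch.utils.data import Dataset


class UnpairedFolderDataset(Dataset):
    """Serves random (domain A, domain B) pairs from trainA/trainB-style folders."""

    def __init__(self, root, split="train", transform=None):
        self.dir_a = os.path.join(root, f"{split}A")
        self.dir_b = os.path.join(root, f"{split}B")
        self.names_a = sorted(os.listdir(self.dir_a))
        self.names_b = sorted(os.listdir(self.dir_b))
        self.transform = transform

    def __len__(self):
        return max(len(self.names_a), len(self.names_b))

    def __getitem__(self, index):
        # The domains are unpaired, so the B sample is drawn independently of A.
        path_a = os.path.join(self.dir_a, self.names_a[index % len(self.names_a)])
        path_b = os.path.join(self.dir_b, random.choice(self.names_b))
        img_a, img_b = Image.open(path_a), Image.open(path_b)
        if self.transform is not None:
            img_a, img_b = self.transform(img_a), self.transform(img_b)
        return img_a, img_b
```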
### Download Datasets

*Under construction*

## Pretrained Models

*Under construction*

## How To Use

### Set Up the Environment
```bash
git clone https://github.com/gosha20777/rawformer.git
cd rawformer
conda env create -n rawformer -f environment.yaml
conda activate rawformer
python setup.py install
```
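After activation, an optional sanity check (not part of the official instructions) confirms that the pinned PyTorch build can see the GPU:

```python
# Run inside the activated `rawformer` conda environment.
import torch

print(torch.__version__)           # environment.yaml pins pytorch 2.3.1
print(torch.cuda.is_available())   # should be True if the CUDA 12.1 runtime is visible
```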
### Train Rawformer

#### Pretrain Generator
```bash
cd experiments/<experiment_name>
python pretrain.py --batch-size 8
```

#### Train Rawformer
```bash
cd experiments/<experiment_name>
python train.py
```

#### Test Rawformer
```bash
cd experiments/<experiment_name>
python predict.py <model_path> --split test
```

## Citation
```BibTeX
@article{perevozchikov2024rawformer,
  title={Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs},
  author={Perevozchikov, Georgy and Mehta, Nancy and Afifi, Mahmoud and Timofte, Radu},
  journal={arXiv preprint arXiv:2404.10700},
  year={2024}
}
```
name: rawformer
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - blas=1.0=mkl
  - bottleneck=1.3.7=py312ha883a20_0
  - brotli=1.0.9=h5eee18b_8
  - brotli-bin=1.0.9=h5eee18b_8
  - brotli-python=1.0.9=py312h6a678d5_8
  - bzip2=1.0.8=h5eee18b_6
  - ca-certificates=2024.3.11=h06a4308_0
  - certifi=2024.6.2=py312h06a4308_0
  - charset-normalizer=2.0.4=pyhd3eb1b0_0
  - contourpy=1.2.0=py312hdb19cb5_0
  - cuda-cudart=12.1.105=0
  - cuda-cupti=12.1.105=0
  - cuda-libraries=12.1.0=0
  - cuda-nvrtc=12.1.105=0
  - cuda-nvtx=12.1.105=0
  - cuda-opencl=12.5.39=0
  - cuda-runtime=12.1.0=0
  - cuda-version=12.5=3
  - cycler=0.11.0=pyhd3eb1b0_0
  - cyrus-sasl=2.1.28=h52b45da_1
  - dbus=1.13.18=hb2f20db_0
  - expat=2.6.2=h6a678d5_0
  - ffmpeg=4.3=hf484d3e_0
  - filelock=3.13.1=py312h06a4308_0
  - fontconfig=2.14.1=h4c34cd2_2
  - fonttools=4.51.0=py312h5eee18b_0
  - freetype=2.12.1=h4a9f257_0
  - glib=2.78.4=h6a678d5_0
  - glib-tools=2.78.4=h6a678d5_0
  - gmp=6.2.1=h295c915_3
  - gnutls=3.6.15=he1e5248_0
  - gst-plugins-base=1.14.1=h6a678d5_1
  - gstreamer=1.14.1=h5eee18b_1
  - icu=73.1=h6a678d5_0
  - idna=3.7=py312h06a4308_0
  - intel-openmp=2023.1.0=hdb19cb5_46306
  - jinja2=3.1.4=py312h06a4308_0
  - jpeg=9e=h5eee18b_1
  - kiwisolver=1.4.4=py312h6a678d5_0
  - krb5=1.20.1=h143b758_1
  - lame=3.100=h7b6447c_0
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.38=h1181459_1
  - lerc=3.0=h295c915_0
  - libbrotlicommon=1.0.9=h5eee18b_8
  - libbrotlidec=1.0.9=h5eee18b_8
  - libbrotlienc=1.0.9=h5eee18b_8
  - libclang=14.0.6=default_hc6dbbc7_1
  - libclang13=14.0.6=default_he11475f_1
  - libcublas=12.1.0.26=0
  - libcufft=11.0.2.4=0
  - libcufile=1.10.0.4=0
  - libcups=2.4.2=h2d74bed_1
  - libcurand=10.3.6.39=0
  - libcusolver=11.4.4.55=0
  - libcusparse=12.0.2.55=0
  - libdeflate=1.17=h5eee18b_1
  - libedit=3.1.20230828=h5eee18b_0
  - libffi=3.4.4=h6a678d5_1
  - libgcc-ng=11.2.0=h1234567_1
  - libglib=2.78.4=hdc74915_0
  - libgomp=11.2.0=h1234567_1
  - libiconv=1.16=h5eee18b_3
  - libidn2=2.3.4=h5eee18b_0
  - libjpeg-turbo=2.0.0=h9bf148f_0
  - libllvm14=14.0.6=hdb19cb5_3
  - libnpp=12.0.2.50=0
  - libnvjitlink=12.1.105=0
  - libnvjpeg=12.1.1.14=0
  - libpng=1.6.39=h5eee18b_0
  - libpq=12.17=hdbd6064_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libtasn1=4.19.0=h5eee18b_0
  - libtiff=4.5.1=h6a678d5_0
  - libunistring=0.9.10=h27cfd23_0
  - libuuid=1.41.5=h5eee18b_0
  - libwebp-base=1.3.2=h5eee18b_0
  - libxcb=1.15=h7f8727e_0
  - libxkbcommon=1.0.1=h5eee18b_1
  - libxml2=2.10.4=hfdd30dd_2
  - llvm-openmp=14.0.6=h9e868ea_0
  - lz4-c=1.9.4=h6a678d5_1
  - markupsafe=2.1.3=py312h5eee18b_0
  - matplotlib=3.8.4=py312h06a4308_0
  - matplotlib-base=3.8.4=py312h526ad5a_0
  - mkl=2023.1.0=h213fc3f_46344
  - mkl-service=2.4.0=py312h5eee18b_1
  - mkl_fft=1.3.8=py312h5eee18b_0
  - mkl_random=1.2.4=py312hdb19cb5_0
  - mpmath=1.3.0=py312h06a4308_0
  - mysql=5.7.24=h721c034_2
  - ncurses=6.4=h6a678d5_0
  - nettle=3.7.3=hbbd107a_1
  - networkx=3.2.1=py312h06a4308_0
  - numexpr=2.8.7=py312hf827012_0
  - numpy=1.26.4=py312hc5e2394_0
  - numpy-base=1.26.4=py312h0da6c21_0
  - openh264=2.1.1=h4ff587b_0
  - openjpeg=2.4.0=h3ad879b_0
  - openssl=3.0.14=h5eee18b_0
  - packaging=23.2=py312h06a4308_0
  - pandas=2.2.2=py312h526ad5a_0
  - pcre2=10.42=hebb0a14_1
  - pillow=10.3.0=py312h5eee18b_0
  - pip=24.0=py312h06a4308_0
  - ply=3.11=py312h06a4308_1
  - pyparsing=3.0.9=py312h06a4308_0
  - pyqt=5.15.10=py312h6a678d5_0
  - pyqt5-sip=12.13.0=py312h5eee18b_0
  - pysocks=1.7.1=py312h06a4308_0
  - python=3.12.4=h5148396_1
  - python-dateutil=2.9.0post0=py312h06a4308_2
  - python-tzdata=2023.3=pyhd3eb1b0_0
  - pytorch=2.3.1=py3.12_cuda12.1_cudnn8.9.2_0
  - pytorch-cuda=12.1=ha16c6d3_5
  - pytorch-mutex=1.0=cuda
  - pytz=2024.1=py312h06a4308_0
  - pyyaml=6.0.1=py312h5eee18b_0
  - qt-main=5.15.2=h53bd1ea_10
  - readline=8.2=h5eee18b_0
  - requests=2.32.2=py312h06a4308_0
  - setuptools=69.5.1=py312h06a4308_0
  - sip=6.7.12=py312h6a678d5_0
  - six=1.16.0=pyhd3eb1b0_1
  - sqlite=3.45.3=h5eee18b_0
  - sympy=1.12=py312h06a4308_0
  - tbb=2021.8.0=hdb19cb5_0
  - tk=8.6.14=h39e8969_0
  - torchaudio=2.3.1=py312_cu121
  - torchvision=0.18.1=py312_cu121
  - tornado=6.4.1=py312h5eee18b_0
  - typing_extensions=4.11.0=py312h06a4308_0
  - tzdata=2024a=h04d1e81_0
  - unicodedata2=15.1.0=py312h5eee18b_0
  - urllib3=2.2.2=py312h06a4308_0
  - wheel=0.43.0=py312h06a4308_0
  - xz=5.4.6=h5eee18b_1
  - yaml=0.2.5=h7b6447c_0
  - zlib=1.2.13=h5eee18b_1
  - zstd=1.5.5=hc292b87_2
  - pip:
    - tqdm==4.66.4
    - einops==0.8.0
# Re-export the package's public entry points at the top level.
from .consts import CONFIG_NAME, ROOT_DATA, ROOT_OUTDIR
from .utils.funcs import join_dicts
from .train.train import train
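These re-exports make the package's main helpers importable from the package root. As a quick illustration of the import surface only (the call signatures are not documented in this commit, so none are shown):

```python
# Names available from the package root, per the re-exports above.
from rawformer import CONFIG_NAME, ROOT_DATA, ROOT_OUTDIR, join_dicts, train
```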
Copyright (c) 2021-2023, The LS4GAN Project Developers
Copyright (c) 2017, Jun-Yan Zhu and Taesung Park
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


--------------------------- LICENSE FOR pix2pix --------------------------------
BSD License

For pix2pix software
Copyright (c) 2016, Phillip Isola and Jun-Yan Zhu
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

----------------------------- LICENSE FOR DCGAN --------------------------------
BSD License

For dcgan.torch software

Copyright (c) 2015, Facebook, Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name Facebook nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# LICENSE
# This file was extracted from
# https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
# Please see `rawformer/base/LICENSE` for copyright attribution and LICENSE

# pylint: disable=line-too-long

import random
import torch


class ImagePool:
    """An image buffer that stores previously generated images.

    The buffer lets us update discriminators using a history of generated images
    rather than only the ones produced by the latest generators.
    """

    def __init__(self, pool_size):
        """Initialize the ImagePool class.

        Parameters:
            pool_size (int) -- size of the image buffer; if pool_size == 0, no buffer is created
        """
        self.pool_size = pool_size
        if self.pool_size > 0:  # create an empty pool
            self.num_imgs = 0
            self.images = []

    def query(self, images):
        """Return images from the pool.

        Parameters:
            images: the latest generated images from the generator

        For each image, with probability 0.5 the buffer returns the input image itself;
        with probability 0.5 it returns an image previously stored in the buffer and
        inserts the current image into the buffer in its place.
        """
        if self.pool_size == 0:  # if the buffer size is 0, do nothing
            return images
        return_images = []
        for image in images:
            image = torch.unsqueeze(image.data, 0)
            if self.num_imgs < self.pool_size:  # if the buffer is not full, keep inserting current images
                self.num_imgs = self.num_imgs + 1
                self.images.append(image)
                return_images.append(image)
            else:
                p = random.uniform(0, 1)
                if p > 0.5:  # with 50% chance, return a previously stored image and insert the current image into the buffer
                    random_id = random.randint(0, self.pool_size - 1)  # randint is inclusive
                    tmp = self.images[random_id].clone()
                    self.images[random_id] = image
                    return_images.append(tmp)
                else:  # otherwise, return the current image
                    return_images.append(image)
        return_images = torch.cat(return_images, 0)  # collect all the images and return
        return return_images
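For context, here is a minimal sketch of how such a buffer is typically plugged into adversarial training. The toy generator, discriminator, tensor shapes, and loss below are placeholders for illustration, not the repository's actual models or training loop:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs end to end; the real Rawformer networks differ.
generator = nn.Conv2d(4, 4, kernel_size=3, padding=1)
discriminator = nn.Conv2d(4, 1, kernel_size=3, padding=1)

pool = ImagePool(pool_size=50)
real_a = torch.rand(2, 4, 64, 64)    # a dummy batch of 4-channel raw patches

fake_b = generator(real_a)           # freshly generated images
fake_b_hist = pool.query(fake_b)     # per-image 50/50 mix of current and historical fakes

# Update the discriminator on the (possibly historical) fakes; detach() keeps the
# discriminator step from backpropagating into the generator.
pred_fake = discriminator(fake_b_hist.detach())
loss_d_fake = torch.nn.functional.mse_loss(pred_fake, torch.zeros_like(pred_fake))
```

Drawing fakes from the pool decorrelates discriminator updates from the latest generator outputs, which is a common trick for stabilizing unpaired image-to-image training.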