MindEditing is an open-source toolkit based on MindSpore that contains advanced image and video models from the open-source community and Huawei Technologies Co., Ltd., such as IPT, FSRCNN, and BasicVSR. These models are mainly used for low-level vision tasks such as super-resolution, denoising, deraining, and inpainting. MindEditing supports multiple platforms, including CPU, GPU, and Ascend; you will get the best experience on Ascend.
Some Demos:
- Video super-resolution demo
Video_SR_Demo-1.-.Trim.mp4
- Video frame interpolation demo
Video_frame_Interpolation_Demo.mp4
Main features
- Easy to use

  We use a unified entry point: just specify a supported model name and configure the parameters in its yaml file to start your task.

- Support for multiple tasks

  MindEditing supports a variety of popular and contemporary tasks such as deblurring, denoising, super-resolution, and inpainting.

- SOTA

  MindEditing provides state-of-the-art algorithms for deblurring, denoising, super-resolution, and inpainting tasks.
With so many tasks, is there a single model that can handle several of them? Yes: the pre-trained image processing transformer (IPT). IPT is a new pre-trained model trained with multiple heads and multiple tails, one pair per image processing task, around a shared body. In addition, contrastive learning is introduced so that the model adapts well to different image processing tasks. The pre-trained model can therefore be employed efficiently on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks.
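As a rough illustration only (not the actual MindEditing implementation), the multi-head/multi-tail idea can be sketched as follows. The class names, layer sizes, and the plain convolutional body standing in for IPT's transformer body are all simplified assumptions:

```python
# Toy sketch of a multi-head / multi-tail network: one head and one tail per
# task around a shared body. Purely illustrative; the real IPT uses a
# transformer body and is pre-trained at scale.
import numpy as np
import mindspore as ms
from mindspore import nn, context

context.set_context(mode=context.PYNATIVE_MODE)  # eager mode keeps the demo simple

class SharedBody(nn.Cell):
    """Stand-in for the shared (transformer) body."""
    def __init__(self, channels=64):
        super().__init__()
        self.block = nn.SequentialCell(
            nn.Conv2d(channels, channels, 3, pad_mode="same"),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, pad_mode="same"),
        )

    def construct(self, x):
        return self.block(x) + x  # residual connection

class MultiHeadTailNet(nn.Cell):
    """One head/tail pair per task; the body is shared across tasks."""
    def __init__(self, num_tasks=3, channels=64):
        super().__init__()
        self.heads = nn.CellList(
            [nn.Conv2d(3, channels, 3, pad_mode="same") for _ in range(num_tasks)])
        self.body = SharedBody(channels)
        self.tails = nn.CellList(
            [nn.Conv2d(channels, 3, 3, pad_mode="same") for _ in range(num_tasks)])

    def construct(self, x, task_id):
        feat = self.heads[task_id](x)     # task-specific head
        feat = self.body(feat)            # shared body
        return self.tails[task_id](feat)  # task-specific tail

net = MultiHeadTailNet()
x = ms.Tensor(np.ones((1, 3, 48, 48), np.float32))
print(net(x, 0).shape)  # run the head/tail pair of task 0 -> (1, 3, 48, 48)
```

At fine-tuning time, only the head/tail pair of the target task needs to be used together with the shared body.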
Excellent performance

- Compared with state-of-the-art image processing models on different tasks, the IPT model performs better.

- Handles multiple low-level vision tasks

  Compared with state-of-the-art methods, the IPT model achieves the best performance.

- Generalization ability

  Generalization ability (table 4) of the IPT model on color image denoising with different noise levels.

- Performance of the CNN and IPT models with different percentages of pre-training data

  When the pre-training data is limited, the CNN model obtains better performance. As the data volume increases, the Transformer-based IPT model gains a significant performance improvement, and the curve (table 5) trend also shows the promising potential of the IPT model.

- Impressive inference results on real images

  - Image super-resolution

    The figure below shows super-resolution results with bicubic downsampling (×4) on Urban100. The IPT model recovers more details.

  - Image denoising

    It is worth pointing out that IPT won the CVPR 2023 NTIRE Image Denoising track championship. The figure below shows color image denoising results with noise level σ = 50.

  - Image deraining

    The figure below shows image deraining results on the Rain100L dataset.
Dependency

- mindspore >= 1.9
- numpy = 1.19.5
- scikit-image = 0.19.3
- pyyaml = 5.1
- pillow = 9.3.0
- lmdb = 1.3.0
- h5py = 3.7.0
- imageio = 2.25.1
- munch = 2.5.0
Python can be installed via Conda.
Install Miniconda:
cd /tmp
curl -O https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py37_4.10.3-Linux-$(arch).sh
bash Miniconda3-py37_4.10.3-Linux-$(arch).sh -b
cd -
. ~/miniconda3/etc/profile.d/conda.sh
conda init bash
Create a virtual environment, taking Python 3.7.5 as an example:
conda create -n mindspore_py37 python=3.7.5 -y
conda activate mindspore_py37
Check the Python version.
python --version
To install the dependency, please run:
pip install -r requirements.txt
MindSpore (>= 1.9) can be easily installed by following the official instructions, where you can select the package that best fits your hardware platform. To run in distributed mode, OpenMPI is required.
We provide entry scripts for training and validation; choose a different model config to start. Please see the documentation for more basic usage of MindEditing.
python3 train.py --config_path ./configs/basicvsr/train.yaml
# or
python3 val.py --config_path ./configs/basicvsr/val.yaml
- Graph mode and PyNative mode

  Graph mode is optimized for efficiency and parallel computing with a compiled static graph, while PyNative mode is optimized for flexibility and easy development. You may change the parameter system.context_mode in the model config file to switch to pure PyNative mode for development purposes; a minimal sketch of the underlying switch follows.
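For quick experiments outside the yaml-driven entry point, the same switch can also be made directly through MindSpore's standard context API:

```python
# Minimal sketch: toggling between graph mode and PyNative mode with the
# MindSpore context API (the yaml parameter system.context_mode controls the
# same behavior inside MindEditing's entry scripts).
from mindspore import context

# compiled static graph: better throughput, used for efficient/parallel training
context.set_context(mode=context.GRAPH_MODE)

# eager execution: easier debugging during development
context.set_context(mode=context.PYNATIVE_MODE)
```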
MindEditing currently has a 0.x branch; a 1.x branch will be added in the future. You'll find more features in the 1.x branch, so stay tuned.
- April 6, 2023

  The MPFER model from Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations is coming soon; stay tuned.

- March 15, 2023

  The inference code and demos of Tunable Conv have been added as test cases; you can find them in ./tests/. The training code is coming soon. Tunable Conv provides four demo models: NAFNet for modulated image denoising, SwinIR for modulated image denoising and perceptual super-resolution, EDSR for modulated joint image denoising and deblurring, and StyleNet for modulated style transfer.
Increasing the number of parallel workers can speed up training. The following is an experiment with an example model on a 16-core CPU and 2x P100 GPUs (a short usage sketch follows the logs):
num_parallel_workers: 8
epoch 1/100 step 1/133, loss = 0.045729052, duration_time = 00:01:07, step_time_avg = 0.00 secs, eta = 00:00:00
epoch 1/100 step 2/133, loss = 0.027709303, duration_time = 00:01:20, step_time_avg = 6.66 secs, eta = 1 day(s) 00:36:02
epoch 1/100 step 3/133, loss = 0.027135072, duration_time = 00:01:33, step_time_avg = 8.74 secs, eta = 1 day(s) 08:17:56
num_parallel_workers: 16
epoch 1/100 step 1/133, loss = 0.04535071, duration_time = 00:00:47, step_time_avg = 0.00 secs, eta = 00:00:00
epoch 1/100 step 2/133, loss = 0.032363698, duration_time = 00:01:00, step_time_avg = 6.74 secs, eta = 1 day(s) 00:54:38
epoch 1/100 step 3/133, loss = 0.02718924, duration_time = 00:01:13, step_time_avg = 8.83 secs, eta = 1 day(s) 08:36:07
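For reference, here is a minimal sketch of where such a setting lands in a MindSpore data pipeline; the dataset class, tensor shapes, and column names below are illustrative placeholders rather than MindEditing's actual loaders:

```python
# Illustrative sketch: num_parallel_workers controls how many workers the
# MindSpore dataset pipeline uses, letting data loading overlap with training.
import numpy as np
import mindspore.dataset as ds

class ToySRDataset:
    """Random-access placeholder dataset returning (lr, hr) pairs."""
    def __len__(self):
        return 133

    def __getitem__(self, idx):
        lr = np.random.rand(3, 64, 64).astype(np.float32)
        hr = np.random.rand(3, 256, 256).astype(np.float32)
        return lr, hr

dataset = ds.GeneratorDataset(
    ToySRDataset(),
    column_names=["lr", "hr"],
    num_parallel_workers=16,  # the knob varied in the logs above
    shuffle=True,
)
dataset = dataset.batch(8)
```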
The following tutorials are provided to help users learn to use MindEditing.
Model | Task | Conference | Supported platform | Download |
---|---|---|---|---|
IPT | Multi-Task | CVPR 2021 | Ascend/GPU | ckpt |
BasicVSR | Video Super Resolution | CVPR 2021 | Ascend/GPU | ckpt |
BasicVSR++Light | Video Super Resolution | CVPR 2022 | Ascend/GPU | ckpt |
NOAHTCV | Image DeNoise | CVPR 2021(MAI Challenge) | Ascend/GPU | ckpt |
RRDB | Image Super Resolution | ECCVW, 2018 | Ascend/GPU | ckpt |
FSRCNN | Image Super Resolution | ECCV 2016 | Ascend/GPU | ckpt |
SRDiff | Image Super Resolution | Neurocomputing 2022 | Ascend/GPU | ckpt |
VRT | Multi-Task | arXiv(2022.01) | Ascend/GPU | ckpt |
RVRT | Multi-Task | arXiv(2022.06) | Ascend/GPU | ckpt |
TTVSR | Video Super Resolution | CVPR 2022 | Ascend/GPU | ckpt |
MIMO-Unet | Image DeBlur | ICCV 2021 | Ascend/GPU | ckpt |
NAFNet | Image DeBlur | arXiv(2022.04) | Ascend/GPU | ckpt |
CTSDG | Image InPainting | ICCV 2021 | Ascend/GPU | ckpt |
EMVD | Video Denoise | CVPR 2021 | Ascend/GPU | ckpt |
Tunable_Conv | Tunable Tasks (image processing) | arXiv(2023.04) | Ascend/GPU | ckpt |
IFR+ | Video Frame Interpolation | CVPR 2022 | Ascend/GPU | ckpt |
MPFER | 3D-based Multi-Frame Denoising (coming soon) | arXiv(2023.04) | GPU | ckpt |
Download: the model files are available in .ckpt and .om formats; you can download the corresponding files to carry out your research work.

- The .ckpt files can be downloaded by clicking the corresponding links in the download column of the table above (see the loading sketch after this list).
- The .om files can be found here. For details about how to use an .om file, see deploy.
- The multi-task models are downloaded per task; to select the right model file, refer to the yaml files of the different models in the configs folder.
- For models that require SPyNet or VGG pretrained weights, those weights can also be downloaded from the corresponding model links.
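As a rough illustration of how a downloaded .ckpt file is typically loaded with MindSpore (the tiny placeholder network and the checkpoint path below are assumptions; in practice, build the model defined by your chosen config):

```python
# Illustrative sketch of loading a downloaded .ckpt into a network.
# TinyNet and "path/to/model.ckpt" are placeholders only.
import mindspore as ms
from mindspore import nn

class TinyNet(nn.Cell):
    """Placeholder network; build the real model named in your config instead."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, pad_mode="same")

    def construct(self, x):
        return self.conv(x)

net = TinyNet()
param_dict = ms.load_checkpoint("path/to/model.ckpt")  # downloaded .ckpt file
ms.load_param_into_net(net, param_dict)                # copy weights into the network
net.set_train(False)                                   # switch to inference mode
```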
Please refer to the ModelZoo Homepage or the documentation under the docs folder for more details on the models.
This project follows the Apache License 2.0 open-source license.
This version is still under active development; if you find any issue or have an idea for a new feature, please don't hesitate to contact us via an issue.
MindSpore is an open-source project that welcomes any contribution and feedback. We hope the toolbox and benchmark can serve the growing research community by providing a flexible and standardized toolkit to reimplement existing methods and develop new computer vision methods.
If you find MindEditing useful in your research, please consider citing the following related papers:
@misc{mindediting2022,
    title={{MindEditing}: MindEditing for low-level vision tasks},
    author={MindEditing},
    howpublished={\url{https://github.com/mindspore-lab/mindediting}},
    year={2022}
}
- MindCV: A toolbox of vision models and algorithms based on MindSpore.
- MindNLP: An open-source NLP library based on MindSpore.
- MindDiffusion: A collection of diffusion models based on MindSpore.
- MindFace: An open-source toolkit based on MindSpore, containing advanced face recognition and detection models such as ArcFace and RetinaFace.
- MindAudio: An open-source all-in-one toolkit for the audio domain based on MindSpore.
- MindOCR: A toolbox of OCR models, algorithms, and pipelines based on MindSpore.
- MindRL: A high-performance, scalable MindSpore reinforcement learning framework.
- MindREC: A MindSpore large-scale recommender system library.
- MindPose: An open-source toolbox for pose estimation based on MindSpore.