CUGAN

Real-CUGAN is a super-resolution neural network for anime-style art. It is based on the waifu2x-cunet network and was trained by bilibili on millions of anime images using a RealESRGANv2-like approach.

Link:

(stable) https://github.com/AmusementClub/vs-mlrt/releases/download/model-20211209/cugan_v2.7z

Models

The models support upscaling by 2x, 3x, or 4x, optionally combined with denoising.

scale: 2, 3, or 4
noise: -1, 0, 1, 2, or 3 (like waifu2x); noise levels 1 and 2 are only supported with scale=2.
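The scale/noise constraints above can be checked before a script is built. The sketch below is a hypothetical helper (`check_cugan_args` and `supported_combinations` are illustrative names, not part of vsmlrt) that encodes only the rules listed here; the wrapper performs its own validation, which may differ in detail.

```python
# Hypothetical helper, not part of vsmlrt: encodes the scale/noise rules above.
VALID_SCALES = {2, 3, 4}
VALID_NOISE = {-1, 0, 1, 2, 3}


def check_cugan_args(noise: int, scale: int) -> None:
    """Raise ValueError if (noise, scale) is not a supported combination."""
    if scale not in VALID_SCALES:
        raise ValueError(f"scale must be one of {sorted(VALID_SCALES)}, got {scale}")
    if noise not in VALID_NOISE:
        raise ValueError(f"noise must be one of {sorted(VALID_NOISE)}, got {noise}")
    # noise levels 1 and 2 exist only for the 2x models
    if noise in (1, 2) and scale != 2:
        raise ValueError(f"noise={noise} requires scale=2, got scale={scale}")


def supported_combinations() -> list[tuple[int, int]]:
    """Enumerate every (noise, scale) pair accepted by check_cugan_args."""
    combos = []
    for scale in sorted(VALID_SCALES):
        for noise in sorted(VALID_NOISE):
            try:
                check_cugan_args(noise, scale)
            except ValueError:
                continue
            combos.append((noise, scale))
    return combos
```

Enumerating the combinations gives 11 valid pairs: all five noise levels at 2x, and -1, 0, or 3 at 3x and 4x.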

`vsmlrt.py` wrapper Usage

To simplify usage, we provide a Python wrapper module, vsmlrt (release v7 or later).

import vapoursynth as vs
from vapoursynth import core

from vsmlrt import CUGAN, Backend

src = core.std.BlankClip(format=vs.RGBS)  # CUGAN only accepts RGBS input

# clamp src to [0, 1] to be safe, as out-of-range values will produce large negative output.
# (akarin.Expr requires the akarin plugin.)
src = core.akarin.Expr(src, "x 0 1 clamp")

# backend could be:
#  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT_CPU.
#  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort CPU backend.
#  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#     - use device_id to select the device
#     - set cudnn_benchmark=False to reduce script reload latency when debugging, at a slight throughput penalty.
#  - GPU Backend.TRT(fp16=True, device_id=0, num_streams=1): TensorRT runtime, the fastest NV GPU runtime.
flt = CUGAN(src, noise=-1, scale=2, backend=Backend.ORT_CUDA())
flt.set_output()

Notes

Make sure the RGBS input to CUGAN is within the [0, 1] range. Out-of-range values will cause the network to produce large negative values.
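The `x 0 1 clamp` expression in the wrapper example maps each sample to the nearest value in [0, 1]. As a rough per-sample illustration, here is a plain-Python equivalent; the sample values are made up and stand in for the small undershoot or overshoot that, for example, a sharp resampling kernel can introduce when a source is converted to RGBS.

```python
def clamp01(x: float) -> float:
    """Per-sample equivalent of the Expr string "x 0 1 clamp"."""
    return min(max(x, 0.0), 1.0)


# Illustrative values only: in-range samples pass through unchanged,
# while undershoot and overshoot are pinned to the range limits.
samples = [-0.03, 0.0, 0.5, 1.0, 1.02]
clamped = [clamp01(v) for v in samples]
print(clamped)  # [0.0, 0.0, 0.5, 1.0, 1.0]
```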

Benchmarking

Measurements: FPS / Device Memory (MB)

Device memory:

CPU: private memory including VapourSynth
GPU: device memory including context
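Because each benchmark cell pairs a frame rate with a memory figure, a cell can be turned into a rough wall-clock estimate. The sketch below assumes a hypothetical 24-minute clip at 24000/1001 fps and a steady throughput equal to the benchmarked FPS, using the RTX 3090 FP32 2x [1] ort-cuda cell from the tables below. Real runs also spend time on decoding, other filters, and encoding, so treat the result as a lower bound. `parse_cell` and `estimated_runtime` are illustrative helpers.

```python
from datetime import timedelta


def parse_cell(cell: str) -> tuple[float, int]:
    """Split a benchmark cell such as "3.30 / 10445" into (fps, memory_mb)."""
    fps, mem = (part.strip() for part in cell.split("/"))
    return float(fps), int(mem)


def estimated_runtime(fps: float, frames: int) -> timedelta:
    """Wall-clock time to process `frames` frames at a steady `fps`."""
    return timedelta(seconds=frames / fps)


# Assumption: a 24-minute clip at 24000/1001 fps (about 23.976 fps).
frames = round(24 * 60 * 24000 / 1001)
fps, mem_mb = parse_cell("3.30 / 10445")  # RTX 3090, FP32 2x, [1] ort-cuda
print(frames, mem_mb, estimated_runtime(fps, frames))
```

For that cell (3.30 FPS) the estimate is about 10,462 seconds, just under 3 hours.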

RTX 3090

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

[1] vs-mlrt v7
[2] Real-CUGAN 7e77b85
[3] vs-mlrt v8 (driver 511.79)

Performance

FP32

Model	[1] ort-cuda	[2] pytorch	[3] ort-cuda
2x	3.30 / 10445	2.36 / 20076	3.24 / 10251
3x (540p patch)	1.52 / 9978	0.77 / 19304
4x	1.96 / 18377	1.25 / 22353	1.93 / 18183
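To read the FP32 table as relative numbers, the snippet below copies the cells verbatim and divides the [1] ort-cuda column by the [2] pytorch column. `ratios` is an illustrative helper; the 3x cell missing from column [3] is simply absent from the data.

```python
# FP32 cells copied from the RTX 3090 table: (fps, memory_mb) per column.
FP32 = {
    "2x": {"[1] ort-cuda": (3.30, 10445), "[2] pytorch": (2.36, 20076),
           "[3] ort-cuda": (3.24, 10251)},
    "3x (540p patch)": {"[1] ort-cuda": (1.52, 9978), "[2] pytorch": (0.77, 19304)},
    "4x": {"[1] ort-cuda": (1.96, 18377), "[2] pytorch": (1.25, 22353),
           "[3] ort-cuda": (1.93, 18183)},
}


def ratios(table, a: str, b: str) -> dict[str, tuple[float, float]]:
    """Per row with both columns present: (fps a / fps b, memory a / memory b)."""
    out = {}
    for model, cells in table.items():
        if a in cells and b in cells:
            (fps_a, mem_a), (fps_b, mem_b) = cells[a], cells[b]
            out[model] = (fps_a / fps_b, mem_a / mem_b)
    return out


for model, (speedup, mem_ratio) in ratios(FP32, "[1] ort-cuda", "[2] pytorch").items():
    print(f"{model}: {speedup:.2f}x fps, {mem_ratio:.2f}x memory")
# 2x: 1.40x fps, 0.52x memory
# 3x (540p patch): 1.97x fps, 0.52x memory
# 4x: 1.57x fps, 0.82x memory
```

On these measurements, ort-cuda runs the 2x and 3x models at roughly 1.4x to 2x the PyTorch frame rate while using about half the device memory.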

FP16

Model	[1] ort-cuda	[2] pytorch	[3] ort-cuda
2x	4.27 / 10185	3.29 / 12258	4.40 / 9991
3x	1.61 / 19007	1.55 / 21816	1.62 / 23442
4x	2.30 / 10181	1.43 / 13616	2.40 / 9987
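Comparing the two RTX 3090 tables shows what FP16 buys for the [1] ort-cuda backend. The sketch copies those cells and divides FP16 by FP32; 3x is left out because the FP32 row is labelled "540p patch" and the FP16 row is not, so the two are not like-for-like. `fp16_gain` is an illustrative helper.

```python
# [1] ort-cuda (vs-mlrt v7) cells from the RTX 3090 tables: (fps, memory_mb).
# 3x is left out: only the FP32 row is labelled "540p patch".
ORT_FP32 = {"2x": (3.30, 10445), "4x": (1.96, 18377)}
ORT_FP16 = {"2x": (4.27, 10185), "4x": (2.30, 10181)}


def fp16_gain(model: str) -> tuple[float, float]:
    """Return (FP16 fps / FP32 fps, FP16 memory / FP32 memory) for one row."""
    (fps32, mem32), (fps16, mem16) = ORT_FP32[model], ORT_FP16[model]
    return fps16 / fps32, mem16 / mem32


for model in ORT_FP32:
    speed, mem = fp16_gain(model)
    print(f"{model}: {speed:.2f}x fps, {mem:.2f}x memory")
# 2x: 1.29x fps, 0.98x memory
# 4x: 1.17x fps, 0.55x memory
```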

Tesla A100 (SXM4, 80 GB)

Software: VapourSynth R57-A4, Windows Server 2022, Graphics Driver 516.94.

Input size: 1920x1080

Backends

[1] vs-mlrt v9

Performance

FP16

Model	[1] trt	[1] trt (2 streams)
2x	19.4 / 4647	26.9 / 8558
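The A100 results show the trade-off behind the `num_streams` option that appears in the wrapper example's backend list. Dividing the 2-stream cell by the 1-stream cell (values copied from the table; `stream_scaling` is an illustrative helper):

```python
def stream_scaling(one: tuple[float, int], two: tuple[float, int]) -> tuple[float, float]:
    """Return (throughput gain, memory growth) going from 1 to 2 streams."""
    (fps1, mem1), (fps2, mem2) = one, two
    return fps2 / fps1, mem2 / mem1


# Tesla A100, FP16 2x, [1] trt: (fps, memory_mb)
gain, growth = stream_scaling((19.4, 4647), (26.9, 8558))
print(f"{gain:.2f}x fps for {growth:.2f}x memory")  # 1.39x fps for 1.84x memory
```

In this measurement, the second stream adds about 39% throughput while nearly doubling device memory.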

EPYC Milan

Hardware: EPYC Milan 32C64T @2.55 GHz

Software: VapourSynth R57, Windows Server 2019.

Input size: 1920x1080

Backends

[1] vs-mlrt v7

Performance

FP32

Model	[1] ov-cpu
2x	0.20 / 22627
3x	0.094 / 40358
4x	0.18 / 53174
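At CPU speeds it is often easier to think in seconds per frame than in FPS. This converts the ov-cpu column (values copied from the table) using seconds per frame = 1 / FPS; `seconds_per_frame` is an illustrative helper.

```python
# (fps, memory_mb) for [1] ov-cpu on EPYC Milan, copied from the table above.
OV_CPU = {"2x": (0.20, 22627), "3x": (0.094, 40358), "4x": (0.18, 53174)}


def seconds_per_frame(model: str) -> float:
    """Average time spent on one frame at the benchmarked throughput."""
    fps, _mem_mb = OV_CPU[model]
    return 1 / fps


for model in OV_CPU:
    print(f"{model}: {seconds_per_frame(model):.1f} s/frame")
# 2x: 5.0 s/frame
# 3x: 10.6 s/frame
# 4x: 5.6 s/frame
```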
