High resolution images #2

Open
vladradishevsky opened this issue May 24, 2021 · 9 comments

@vladradishevsky

Hello!

Have you tested the model with high-resolution images (512, 1024, ...)? Are the resulting images also drawn with high quality?

What parameters do you recommend changing to train on 512x512 and 1024x1024 images?

@JunlinHan
Owner

JunlinHan commented May 25, 2021

Hi!
DCLGAN is quite memory-sensitive, since it needs to fit two generators and two discriminators on a single GPU. With a 16 GB GPU you can train at 512^2 resolution, but probably not at 1024^2.
Testing is quite flexible: you can train your model at 256 res and then test it at 512^2, 512 x 1024, or 1024^2.
You may try this setting to train a 1024-res translation (you will probably need to train it for longer than usual):
--load_size 800 --crop_size 512
And this for 512 res:
--load_size 572 --crop_size 512
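
For reference, a full training command at 512 res might look like the line below. This is only a sketch: the dataset path and experiment name are placeholders, and --model dcl assumes the repository's standard train.py entry point and default model name.
python train.py --dataroot ./datasets/your_dataset --name your_experiment_512 --model dcl --load_size 572 --crop_size 512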

You may also check the papers below; they are designed for high-res translation:
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Pix2pixHD

@itsMorteza

Hi,
Why are bigger patches needed for 512*512-res images?
What is the best iteration-per-image ratio? That is, how many epochs are needed for large datasets such as 60k images?
Thanks.

@JunlinHan
Owner

JunlinHan commented May 26, 2021

Hi,
Usually, training at a higher resolution gives better performance at test time. I tried 1680*800-resolution images before, and training with 512^2 patches gave a much better result than 256^2 patches.
About 1 million iterations is enough for any kind of one-to-one image translation model. With 60k images you may train for 20 epochs; the setting --n_epochs 10 --n_epochs_decay 10 should work well.
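
To make the arithmetic behind that suggestion explicit, here is a small sketch (numbers taken from this thread, assuming batch size 1 so each image is one iteration):

```python
# Rough iteration-budget check for a 60k-image dataset
# (batch size 1, so one iteration per image per epoch).
dataset_size = 60_000
epochs = 10 + 10                 # --n_epochs 10 --n_epochs_decay 10
iterations = dataset_size * epochs
print(iterations)                # 1_200_000, i.e. above the ~1M suggested budget
```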

@zrt791521360

Hi, thank you for your excellent work!

If the width and height of the images I want to use are different, how should I train? Do I need to change the network structure, or just modify the parameters for reading the images?

@JunlinHan
Owner

JunlinHan commented Jun 2, 2021

Hi!
You may try these two choices.
1: Just crop the images into square patches for training. You can still test your images at their original resolution, since the network is fully convolutional (but the width and height must be divisible by 4).
2: Set --preprocess none during training/testing. But you need to test it a little to see whether this setting fits your GPU memory.
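
As an illustration of the divisible-by-4 constraint in choice 1, here is a minimal standalone sketch (not code from this repository) that pads a test image up to the nearest multiple of 4 before it is fed to the network:

```python
from PIL import Image, ImageOps

def pad_to_multiple_of_4(img: Image.Image) -> Image.Image:
    """Pad on the right/bottom so both width and height are divisible by 4."""
    w, h = img.size
    new_w = (w + 3) // 4 * 4
    new_h = (h + 3) // 4 * 4
    # border = (left, top, right, bottom)
    return ImageOps.expand(img, border=(0, 0, new_w - w, new_h - h), fill=0)

img = Image.open("test_image.jpg")   # hypothetical input path
padded = pad_to_multiple_of_4(img)
print(padded.size)                   # both dimensions now divisible by 4
```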

@itsMorteza

Hi, thanks for your reply about the crop size and load size. How does the network react to changing the generator's capacity?
When we face an application with a large domain gap, which do you think is more effective:

  • Using more blocks in the generator
  • Increasing the number of filters in the generator

I tried a weaker discriminator and it worked out well, although the changes weren't dramatic. Do you have any suggestions on changing the training procedure?
Thanks

@JunlinHan
Owner

Hi,
1: Using more blocks in the generator.
It may not be very helpful. Sometimes 6 blocks even give better results than 9.
2: Increasing the number of filters in the generator
Increasing the channels may also not be very helpful.
3: Weaker discriminator.
Yes, this should be somewhat helpful, but it is usual that the changes are not very obvious.

I would think about changing the architecture of the generator, or going to a paired setting. If the domain gap is very large, a supervised (paired) setting might be more effective, provided paired data is available. Also, the generator architecture in this paper, as in some recent papers, is ResNet with 9 blocks, which was proposed in 2016. More recent papers are trying StyleGAN-based and transformer-based generators. These newer generators might provide better capacity.
Cheers
Cheers

@itsMorteza

Regarding the generator architecture: do you mean that using more blocks (such as 12-18 blocks) or even more filters (ngf = 80 or 128) in the ResNet doesn't reduce the chance of overfitting or of saturating on the color domain (instead of changing texture and producing a more visible transformation)?
I take your recommendation about an adaptive discriminator or a StyleGAN-based generator, but those changes tend to damage the semantic consistency that CycleGAN-based models preserve.

@JunlinHan
Owner

Hi,
Yes, simply adding more blocks or filters may not be very effective; the improvement can be very limited. Also, adding more filters can slow down the training a lot, whereas adding more blocks may not impact the training speed much, so that is the option worth trying first. If you are doing something like semantic segmentation, where you need low-level features for the deconvolution, you may also consider adding some skip connections, or employing a multi-head/multi-branch design followed by a fusion, so that you capture both low-level and high-level features for upsampling.
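
As a rough illustration of the skip-connection/fusion idea (not code from this repository; the module name and channel sizes are made up for the example), a decoder stage could fuse a low-level encoder feature map with the high-level features before upsampling:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionUpBlock(nn.Module):
    """Illustrative decoder block: fuse a low-level (skip) feature map with
    high-level features, then upsample. Channel sizes are arbitrary here."""

    def __init__(self, high_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Conv2d(high_ch + skip_ch, out_ch, kernel_size=3, padding=1)
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=4, stride=2, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, high: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Match the high-level features to the skip map's spatial size,
        # concatenate along channels, fuse with a conv, then upsample by 2x.
        high = F.interpolate(high, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        x = self.act(self.fuse(torch.cat([high, skip], dim=1)))
        return self.act(self.up(x))

# Quick shape check with dummy tensors.
block = FusionUpBlock(high_ch=256, skip_ch=64, out_ch=128)
high = torch.randn(1, 256, 32, 32)   # high-level (bottleneck) features
skip = torch.randn(1, 64, 64, 64)    # low-level encoder features
print(block(high, skip).shape)       # torch.Size([1, 128, 128, 128])
```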

As for the architecture: OK, if StyleGAN-based models are not a good fit, how about the architecture of SPADE [1], or even an implicit neural representation [2]?

[1] https://github.com/NVlabs/SPADE
[2] Anycost GANs for Interactive Image Synthesis and Editing
