I loaded pre-trained weights to resume training, and the resolution matches my training set, but train.py reports an error. Training works fine without the pre-trained weights. Which file do I need to change? #39

Open
999789 opened this issue Apr 9, 2024 · 3 comments

999789 commented Apr 9, 2024

Traceback (most recent call last):
  File "train.py", line 369, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/root/miniconda3/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/root/miniconda3/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/miniconda3/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 362, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 94, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/root/autodl-tmp/stylegan3-fun-main/train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/root/autodl-tmp/stylegan3-fun-main/training/training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "/root/autodl-tmp/stylegan3-fun-main/torch_utils/misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
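The RuntimeError is raised by `tensor.copy_()` inside `copy_params_and_buffers`: a tensor in the freshly built network and its counterpart in the resumed snapshot disagree in size along dimension 1 (4 against 3). One way to find out which tensors disagree is to compare the two networks' shapes by name before copying. The sketch below is plain Python over name-to-shape dictionaries; `compare_shapes` and the example tensor names and shapes are hypothetical, not part of stylegan3-fun.

```python
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, ...]


def compare_shapes(
    src_shapes: Dict[str, Sequence[int]],
    dst_shapes: Dict[str, Sequence[int]],
) -> Tuple[List[str], List[Tuple[str, Shape, Shape]], List[str]]:
    """Sort destination tensor names into copyable, shape-mismatched and missing.

    src_shapes: name -> shape for the resumed snapshot (the copy source).
    dst_shapes: name -> shape for the newly constructed network (the copy target).
    """
    copyable: List[str] = []
    mismatched: List[Tuple[str, Shape, Shape]] = []
    missing: List[str] = []
    for name, dst_shape in dst_shapes.items():
        if name not in src_shapes:
            # Absent from the snapshot; require_all=False tolerates these.
            missing.append(name)
        elif tuple(src_shapes[name]) != tuple(dst_shape):
            # Same name, different shape: copying these is what raises.
            mismatched.append((name, tuple(src_shapes[name]), tuple(dst_shape)))
        else:
            copyable.append(name)
    return copyable, mismatched, missing


# Hypothetical shapes: a first-layer weight whose dimension 1 differs (3 vs 4),
# plus one tensor that only exists in the new network.
snapshot = {"b256.fromrgb.weight": (128, 3, 1, 1), "b256.fromrgb.bias": (128,)}
new_net = {"b256.fromrgb.weight": (128, 4, 1, 1), "b256.fromrgb.bias": (128,),
           "b256.extra.weight": (8,)}
ok, bad, absent = compare_shapes(snapshot, new_net)
for name, src, dst in bad:
    print(f"{name}: snapshot {src} vs new network {dst}")
```

With PyTorch, each dictionary could be built as `{n: tuple(t.shape) for n, t in misc.named_params_and_buffers(module)}` for `G`, `D` and `G_ema` and their counterparts in `resume_data`, assuming `torch_utils/misc.py` keeps upstream StyleGAN3's `named_params_and_buffers` helper. Restricting `tensor.copy_()` to the `copyable` names would avoid this particular error, but the skipped tensors would keep their fresh initialization rather than the pre-trained values.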

999789 (issue author) commented Apr 9, 2024

python train.py --outdir=training-runs --cfg=stylegan3-t --data=/root/autodl-tmp/stylegan3-fun-main/hechengtupianrgba.zip --gpus=4 --batch=16 --gamma=6 --mirror=1 --kimg=5000 --snap=25 --batch-gpu=4 --metrics=none --resume=/root/autodl-tmp/stylegan3-fun-main/network-snapshot-011000.pkl
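The 4-vs-3 mismatch could plausibly be an image channel count: the archive name `hechengtupianrgba.zip` suggests RGBA images (4 channels), and if `network-snapshot-011000.pkl` was trained on RGB data its networks would expect 3. That is only a guess. A quick way to test it is to read the PNG headers inside the dataset zip, which also confirms the resolution. The sketch below uses only the standard library and assumes the archive holds PNG files (as StyleGAN3's `dataset_tool.py` normally produces); `png_info` and `summarize_zip` are hypothetical helpers.

```python
import struct
import sys
import zipfile
from collections import Counter
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Channels implied by the IHDR colour type: 0 grey, 2 RGB, 3 palette
# (one index channel), 4 grey+alpha, 6 RGBA.
COLOR_TYPE_CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def png_info(header: bytes) -> Tuple[int, int, int]:
    """Return (width, height, channels) from the first 26 bytes of a PNG file."""
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # IHDR data starts at byte 16: width and height are big-endian uint32.
    width, height = struct.unpack(">II", header[16:24])
    # Byte 24 is the bit depth, byte 25 the colour type.
    return width, height, COLOR_TYPE_CHANNELS[header[25]]


def summarize_zip(zip_path: str) -> Counter:
    """Count (width, height, channels) combinations over the PNGs in a zip."""
    counts: Counter = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".png"):
                with zf.open(name) as f:
                    counts[png_info(f.read(26))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_channels.py path/to/dataset.zip
    for (w, h, c), n in summarize_zip(sys.argv[1]).most_common():
        print(f"{n} images at {w}x{h} with {c} channel(s)")
```

Running it on `/root/autodl-tmp/stylegan3-fun-main/hechengtupianrgba.zip` would show whether the images are 4-channel; if they are, resuming from an RGB snapshot would need either RGB data or networks whose channel-dependent layers are not copied from the snapshot.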

PDillis (repository owner) commented Apr 10, 2024

Basically, the mismatch happens when loading the pre-trained .pkl into the newly constructed stylegan3-t configuration. I'll try to fix it: it also failed for me with a pre-trained StyleGAN3-T model, so perhaps the new networks are being constructed incorrectly. I'll update this issue once I have a fix.

999789 (issue author) commented Apr 11, 2024

Thanks for the reply.
