Weights trained using pre-training script produce NaN values when running fine-tuning script #64

Open
RienkF opened this issue Jan 15, 2024 · 3 comments


RienkF commented Jan 15, 2024

Hi all,

I ran the FCMAE pre-trainer on a custom dataset, but after loading the checkpoints generated by the pre-training script into the fine-tuning script, the network produces NaN as output for every image.

Using no weights, or the provided ImageNet self-supervised pre-trained weights, does not produce this issue.

I inspected the weights I got from pre-training and the values don't seem off: there are no NaNs or zeroes in there, nor any large outliers.
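For reference, a minimal sketch of that kind of checkpoint scan (the file name and the "model" key layout are assumptions; adjust to however your checkpoint is saved):

```python
import torch

# Hypothetical checkpoint path; the pre-training script may store the
# weights under a key other than "model".
ckpt = torch.load("checkpoint-best.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

for name, tensor in state_dict.items():
    if not torch.is_floating_point(tensor):
        continue
    if torch.isnan(tensor).any() or torch.isinf(tensor).any():
        print(f"{name}: contains NaN/Inf")
    elif tensor.abs().max() > 1e3:  # crude threshold for "large outlier"
        print(f"{name}: large max magnitude {tensor.abs().max():.3e}")
```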

Has anybody had this issue as well, or does anyone know of something that might cause or solve it? Thanks in advance.

garg-aayush commented

Hi @RienkF
I encountered the same NaN values when running the fine-tuning script with weights pre-trained on a custom dataset. Any luck resolving this issue?

RienkF (Author) commented Aug 26, 2024

Hi @garg-aayush, in my case the problem turned out to be a double initialisation of the encoder weights: they were initialised both in the ConvNeXtV2 class and in the MAE class. I'd check whether that is what happened for you too; otherwise there could be a range of other causes.
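A minimal sketch of the pattern (the class names follow the ConvNeXt-V2 repo, but the bodies here are simplified assumptions): both `__init__` methods call `self.apply(self._init_weights)`, so the encoder gets initialised twice; removing one of the two passes, or guarding it so encoder submodules are skipped, fixes it.

```python
import torch.nn as nn

class ConvNeXtV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 96, kernel_size=4, stride=4)
        self.apply(self._init_weights)  # initialisation pass #1

    def _init_weights(self, m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.trunc_normal_(m.weight, std=0.02)
            nn.init.zeros_(m.bias)

class FCMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = ConvNeXtV2()
        # Initialisation pass #2: re-initialises the already-initialised
        # encoder. Dropping this apply() (or skipping encoder submodules
        # inside _init_weights) avoids the double initialisation.
        self.apply(self._init_weights)

    def _init_weights(self, m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.trunc_normal_(m.weight, std=0.02)
            nn.init.zeros_(m.bias)
```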

garg-aayush commented

@RienkF, in my case it was an autocast issue. Currently, I am running it in fp32 and it works fine. I might need to update some packages later to run it in fp16/bf16.
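In case it helps anyone, a minimal sketch of forcing a run into full fp32 by disabling autocast (the toy model, optimizer, and data here are stand-ins for whatever the fine-tuning script builds):

```python
import torch
import torch.nn as nn

# Toy stand-ins; in practice these come from the fine-tuning script.
model = nn.Linear(8, 2).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data_loader = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(2)]

use_amp = False  # fp16/bf16 autocast disabled; everything runs in fp32

for images, targets in data_loader:
    images, targets = images.cuda(), targets.cuda()
    # With enabled=False this context is a no-op, so the forward pass
    # and the loss computation stay in fp32.
    with torch.autocast(device_type="cuda", enabled=use_amp):
        loss = criterion(model(images), targets)
    optimizer.zero_grad()
    loss.backward()  # no GradScaler needed when autocast is off
    optimizer.step()
```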
