Weights trained using pre-training script produce NaN values when running fine-tuning script #64

Open
RienkF opened this issue Jan 15, 2024 · 3 comments


RienkF commented Jan 15, 2024

Hi all,

I ran the FCMAE pre-trainer on a custom dataset, but after loading the checkpoints generated by the pre-training script into the fine-tuning script, the network produces NaN as output for every image.

Using no weights, or the provided ImageNet self-supervised pre-trained weights, does not produce this issue.

I inspected the weights I got from pre-training and the values don't seem off: there are no NaNs or zeroes in there, nor any large outliers.
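For reference, a minimal sketch of that kind of checkpoint scan (the file name and the "model" key layout are assumptions; adjust to however your checkpoint is saved):

```python
import torch

# Hypothetical checkpoint path; the pre-training script may store the
# weights under a key other than "model".
ckpt = torch.load("checkpoint-best.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

for name, tensor in state_dict.items():
    if not torch.is_floating_point(tensor):
        continue
    if torch.isnan(tensor).any() or torch.isinf(tensor).any():
        print(f"{name}: contains NaN/Inf")
    elif tensor.abs().max() > 1e3:  # crude threshold for "large outlier"
        print(f"{name}: large max magnitude {tensor.abs().max():.3e}")
```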

Has anybody had this issue as well, or does anyone know of something that might cause or solve it? Thanks in advance.

garg-aayush commented

Hi @RienkF
I encountered the same NaN values when running the fine-tuning script with weights pre-trained on a custom dataset. Any luck resolving this issue?

RienkF (Author) commented Aug 26, 2024

Hi @garg-aayush, in my case the problem turned out to be a double initialisation of the encoder weights: they were initialised both in the ConvNeXtV2 class and in the MAE class. I'd check whether that is what happened for you too; otherwise there could be a range of other causes.
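A minimal sketch of the pattern (the class names follow the ConvNeXt-V2 repo, but the bodies here are simplified assumptions): both `__init__` methods call `self.apply(self._init_weights)`, so the encoder gets initialised twice; removing one of the two passes, or guarding it so encoder submodules are skipped, fixes it.

```python
import torch.nn as nn

class ConvNeXtV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 96, kernel_size=4, stride=4)
        self.apply(self._init_weights)  # initialisation pass #1

    def _init_weights(self, m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.trunc_normal_(m.weight, std=0.02)
            nn.init.zeros_(m.bias)

class FCMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = ConvNeXtV2()
        # Initialisation pass #2: re-initialises the already-initialised
        # encoder. Dropping this apply() (or skipping encoder submodules
        # inside _init_weights) avoids the double initialisation.
        self.apply(self._init_weights)

    def _init_weights(self, m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.trunc_normal_(m.weight, std=0.02)
            nn.init.zeros_(m.bias)
```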

garg-aayush commented

@RienkF, in my case it was an autocast issue. Currently, I am running it in fp32 and it works fine. I might need to update some packages later to run it in fp16/bf16.
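In case it helps anyone, a minimal sketch of forcing a run into full fp32 by disabling autocast (the toy model, optimizer, and data here are stand-ins for whatever the fine-tuning script builds):

```python
import torch
import torch.nn as nn

# Toy stand-ins; in practice these come from the fine-tuning script.
model = nn.Linear(8, 2).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data_loader = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(2)]

use_amp = False  # fp16/bf16 autocast disabled; everything runs in fp32

for images, targets in data_loader:
    images, targets = images.cuda(), targets.cuda()
    # With enabled=False this context is a no-op, so the forward pass
    # and the loss computation stay in fp32.
    with torch.autocast(device_type="cuda", enabled=use_amp):
        loss = criterion(model(images), targets)
    optimizer.zero_grad()
    loss.backward()  # no GradScaler needed when autocast is off
    optimizer.step()
```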
