
validation acc in the pretrain phase in pytorch #34

Open
Sword-keeper opened this issue Jun 17, 2020 · 10 comments

@Sword-keeper

I remember you provided your best pre-trained model in an issue, and its validation accuracy is 64%. I want to modify your backbone, but the best validation accuracy in my pre-training phase is 41%. I also re-ran your pre-training code and found that the best validation accuracy was 48%. Did you use any tricks when you pre-trained the model?

@yaoyao-liu
Owner

That model was trained using exactly the same code as in this GitHub repository.

Please provide me with more information so that I can give you further suggestions, e.g., how you process the dataset and which PyTorch version you use.

@Sword-keeper
Author

[Screenshot: modified pre-validation forward code]
Firstly, when I used your provided 'max_acc.pth' to run the meta phase, it ran out of memory in the validation phase. When I added `with torch.no_grad()`, it ran smoothly, and the test accuracy matched your reported result. The same out-of-memory error also occurred in the pre-validation step of the pre-training phase, so I changed your pre-validation forward code as in the screenshot.
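For reference, here is a simplified sketch of the kind of change I made (illustrative names, not the exact code from my screenshot):

```python
import torch

def validate(model, val_loader, device="cuda"):
    # torch.no_grad() stops autograd from saving activations for a backward
    # pass, which is what was exhausting the 8 GB GPU during validation.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total
```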

@yaoyao-liu
Owner

You should not add `with torch.no_grad()`, as we need to calculate the gradients with `torch.autograd.grad`.
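To make this concrete, here is a minimal sketch (toy shapes and names, not our repository code) of why the meta-validation inner loop breaks under `no_grad`:

```python
import torch
import torch.nn.functional as F

# Toy base-learner weights for a 5-way task (illustrative shapes only).
fc1_w = torch.randn(5, 64, requires_grad=True)
fc1_b = torch.zeros(5, requires_grad=True)
support_x = torch.randn(25, 64)
support_y = torch.randint(0, 5, (25,))

loss = F.cross_entropy(F.linear(support_x, fc1_w, fc1_b), support_y)

# The inner loop needs these gradients at validation time too. If the forward
# pass above had run inside torch.no_grad(), no graph would exist and this
# call would raise a RuntimeError.
grad_w, grad_b = torch.autograd.grad(loss, [fc1_w, fc1_b])
fast_w = fc1_w - 0.01 * grad_w  # one inner-loop update step
fast_b = fc1_b - 0.01 * grad_b
```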

May I know what GPU you’re using?

@Sword-keeper
Author

My PyTorch version is 1.3.1, and my data preprocessing is the same as yours.

@Sword-keeper
Author

My GPU is an RTX 2080 with 8 GB of memory.
I put that part (calculating the gradients with `torch.autograd.grad`) in `optimize_base()`, before the `with torch.no_grad()` block. Did I do something wrong?

@yaoyao-liu
Owner

In your screenshot, you use a function named `self.base`. I guess it is a function you added. Could you please provide me with the details of that function?

The other parts of your code look correct. If you cannot run meta-validation during the pre-training phase, you may use normal validation over the 64 base classes instead. You may also try the pre-training code in DeepEMD and FEAT; we use the same pre-training strategy.
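A minimal sketch of that normal-validation alternative (hypothetical names; it evaluates the 64-way pre-training classifier on held-out images instead of sampling few-shot episodes, so no inner-loop gradients are needed):

```python
import torch

@torch.no_grad()  # plain classification needs no inner-loop gradients
def plain_val_acc(encoder, fc_64way, val_loader, device="cuda"):
    encoder.eval()
    fc_64way.eval()
    correct, total = 0, 0
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        logits = fc_64way(encoder(images))  # 64-way head from pre-training
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return correct / total
```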

@Sword-keeper
Author

Oh, `self.base` is the base learner in your code. I will re-run the code and try the other ways. Thank you!

@yaoyao-liu
Owner

It seems your change is correct. I am not sure what makes your pre-training accuracy lower than expected; it should be around 60% for meta-validation after pre-training. I'll check the related code to see whether there is any bug.

I also suggest running exactly the same code with our configuration (PyTorch 0.4.0) if possible. You may also try the other two methods I mentioned; they both provide pre-training code.

@Sword-keeper
Author

When I ran your code with PyTorch 0.4.0 on the RTX 2080, there were some bugs in the base learner.
At `net = F.linear(input_x, fc1_w, fc1_b)`, it raised `cublas runtime error: the GPU program failed to execute`.
I tried to fix it in many ways but failed. However, when I ran your code on a GTX 1060, it succeeded. So I updated PyTorch, and it ran again. There may be an incompatibility between the RTX 2080, the CUDA version, and the PyTorch version. If someone else hits this problem, tell them to change the GPU, the PyTorch version, or the CUDA version.
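If it helps others diagnose the same thing, here are quick checks (a sketch; the likely cause is that an RTX 2080 is compute capability 7.5, which needs CUDA 10+, while old wheels such as PyTorch 0.4.0 were built against older CUDA toolkits without sm_75 kernels):

```python
import torch

print(torch.__version__)       # installed PyTorch version
print(torch.version.cuda)      # CUDA toolkit the wheel was built against
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))        # GPU model
    print(torch.cuda.get_device_capability(0))  # (7, 5) on an RTX 2080
```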

@yaoyao-liu
Owner

Thanks for reporting this issue.
