
Why do we need this code snippet for training MAE? #44

Open
redagavin opened this issue Aug 25, 2024 · 1 comment


@redagavin commented Aug 25, 2024

[Screenshot of the MAE training code snippet in question]
Hi,
could you please explain why we need this code snippet when training MAE? Why isn't apply_patch from timm.py enough?
Thank you!

@dbolya (Contributor) commented Aug 26, 2024

MAE models are trained with global average pooling at the end instead of a class token. Since we're changing the size of each token by merging them together, we need to perform this global average pool with a weight proportional to the size of each token.
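A minimal sketch of that size-weighted pool (PyTorch; the names and shapes here are illustrative, not the repo's exact code):

```python
import torch

def weighted_global_pool(x: torch.Tensor, size: torch.Tensor) -> torch.Tensor:
    """
    Size-weighted global average pool, replacing the plain mean an
    MAE fine-tuning head would otherwise apply over tokens.

    x:    [B, N, C] token features after the ToMe-patched encoder
    size: [B, N, 1] how many original tokens each remaining token represents
    """
    # Each merged token contributes in proportion to how many original
    # tokens it absorbed, so the result matches an unmerged global average.
    return (x * size).sum(dim=1) / size.sum(dim=1)
```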

This is also what merge_wavg does when merging tokens together (just this time it's global).
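For reference, merge_wavg in tome/merge.py does roughly the following (a paraphrase, so check the repo for the exact code):

```python
import torch

# Rough paraphrase of merge_wavg: merge tokens as a size-weighted average.
def merge_wavg(merge, x, size=None):
    # Track how many original tokens each current token represents.
    if size is None:
        size = torch.ones_like(x[..., 0, None])

    # Sum the size-weighted features and the sizes themselves, then
    # renormalize, so each merged token is a weighted average of its sources.
    x = merge(x * size, mode="sum")
    size = merge(size, mode="sum")
    x = x / size
    return x, size
```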
