MAE models are trained with global average pooling at the end instead of a class token. Since we're changing the size of each token by merging them together, we need to perform this global average pool with a weight proportional to the size of each token.
This is also what merge_wavg does when merging tokens together (just this time it's global).
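As a rough sketch of the idea (not the repo's exact code): assuming `x` is the `(B, N, C)` token tensor and `size` is the `(B, N, 1)` per-token size that ToMe tracks, the size-weighted global pool would look something like this:

```python
import torch

def weighted_global_pool(x: torch.Tensor, size: torch.Tensor) -> torch.Tensor:
    # x:    (B, N, C) token features after merging
    # size: (B, N, 1) number of original patches each merged token represents
    # Each token contributes in proportion to how many patches it covers,
    # so the pooled feature matches an average over the original patches.
    return (x * size).sum(dim=1) / size.sum(dim=1)
```

Without the size weighting, a merged token that covers many patches would count the same as one covering a single patch, which skews the pooled feature the classifier head sees.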
Hi,
could you please explain why we need this code snippet when training MAE? Why isn't apply_patch from timm.py enough?
Thank you!