No benefit for Deit-S #4
When I applied re-attention to DeiT-S (https://github.com/facebookresearch/deit), no accuracy gain was observed. Could you give some advice?

Hi, thanks for trying it out! Based on our observation, re-attention's benefits are proportional to the number of "similar blocks" as defined in the paper. The number of similar blocks is typically small when the depth of the model is small, as shown in Fig. 1 of the paper. However, you can try the cosine-similarity regularization described in the updated paper. Besides, as the model is shallow, it is not necessary to apply re-attention to all blocks; you could refer to Fig. 9 in the appendix.
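For readers unfamiliar with the mechanism being discussed: re-attention mixes the per-head attention maps with a learnable head-to-head matrix before they are applied to the values. Below is a minimal NumPy sketch under stated assumptions — the function name `re_attention`, the simple row renormalization (the paper uses a norm layer), and the near-identity initialization of `theta` are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def re_attention(attn, theta):
    """Sketch of re-attention.

    attn:  (H, N, N) per-head attention maps (rows sum to 1).
    theta: (H, H) learnable matrix that mixes information across heads.
    """
    # Mix the H attention maps across the head dimension.
    mixed = np.einsum('hg,gnm->hnm', theta, attn)
    # Renormalize each row back to a distribution (stand-in for the
    # paper's norm layer; illustrative simplification).
    mixed = mixed / mixed.sum(axis=-1, keepdims=True)
    return mixed

# Toy usage: 4 heads, sequence length 8.
rng = np.random.default_rng(0)
H, N = 4, 8
attn = softmax(rng.standard_normal((H, N, N)))
theta = np.eye(H) + 0.01 * rng.standard_normal((H, H))  # near-identity init (assumed)
out = re_attention(attn, theta)
print(out.shape)
```

Applying this only to a subset of blocks (as suggested above for shallow models) amounts to leaving `theta` as the identity in the remaining blocks.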