inconsistency with the original paper #24
The gamma issue is a minor thing, but I can have a look at it. The channels share the same mask in the paper.
“We experimented with a shared DropBlock mask across different feature channels or each feature
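The two options in the quoted passage (one mask shared by all channels, or an independent mask per channel) can be sketched in plain Python. This is a minimal illustration, not this repository's code. The helper names `sample_block_mask` and `dropblock_masks` are hypothetical, feature maps are nested lists rather than tensors, and each seed is treated as the top-left corner of its block over the region where a full block fits. For an odd `block_size` this is equivalent to sampling block centres only in the central part of the map.

```python
import random


def sample_block_mask(height, width, block_size, gamma, rng):
    """Sample one H x W DropBlock-style mask (1 = keep, 0 = drop).

    Hypothetical helper. Seeds are drawn only where a full
    block_size x block_size square fits, so every dropped region
    is a complete square.
    """
    if not 1 <= block_size <= min(height, width):
        raise ValueError("block_size must fit inside the feature map")
    mask = [[1] * width for _ in range(height)]
    for i in range(height - block_size + 1):
        for j in range(width - block_size + 1):
            if rng.random() < gamma:
                # Zero the square whose top-left corner is the seed (i, j).
                for di in range(block_size):
                    for dj in range(block_size):
                        mask[i + di][j + dj] = 0
    return mask


def dropblock_masks(channels, height, width, block_size, gamma, shared, seed=0):
    """Return one mask per channel, either shared or sampled independently."""
    rng = random.Random(seed)
    if shared:
        # Shared variant: sample once, then copy the same mask to every channel.
        base = sample_block_mask(height, width, block_size, gamma, rng)
        return [[row[:] for row in base] for _ in range(channels)]
    # Per-channel variant: a fresh, independent mask for each channel.
    return [sample_block_mask(height, width, block_size, gamma, rng)
            for _ in range(channels)]
```

For example, `dropblock_masks(4, 8, 8, 3, 0.1, shared=False)` returns four independently sampled 8 x 8 masks, while `shared=True` returns four identical copies of one mask.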
Sure, that is easily fixable. Expect it soon.
Edit: you can also do a PR if you want.
Hi,
I haven't had much free time to deal with this, but I will review and accept merge requests.
I also found some differences between the paper and the code.
To solve this issue, you could have a look at this fork (only for DropBlock2D).
I would encourage you to do a pull request.
If you do look at the code linked above, note that `mask_center` is not initialized on the device, so the part where `nn.ZeroPad2d` is called will run on the CPU by default. Since I was training on a GPU, this slowed down a single forward call of my model (which uses many DropBlock layers) from 0.15 seconds to 3 seconds.
Hello, thanks for your nice code!
I found two inconsistencies with the original paper, and they are very easy to fix:
1. `gamma`: in the original paper, every `block_mask` is a complete square (or cube), since its `mask` is only sampled in the central part of the feature map, and `gamma` is computed to account for this smaller sampling region.
2. `mask`s: in the paper, different feature channels use different `mask`s, while in your implementation they all use the same one.

I just figured these out. I don't actually know whether they are effective tricks, since the paper does not discuss them in enough detail :)
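The `gamma` point can be made concrete with the seed-probability formula from the DropBlock paper, γ = (1 − keep_prob) / block_size² · feat_size² / (feat_size − block_size + 1)². The last factor scales γ up because seeds are drawn only from the (feat_size − block_size + 1)² positions where a whole block fits. The sketch below is a hedged illustration, not this repository's code. The function names are hypothetical, `drop_prob` stands in for 1 − keep_prob, and extending the paper's square-map formula to a rectangular map is an assumption.

```python
def gamma_paper(drop_prob, block_size, feat_h, feat_w):
    """Seed probability following the DropBlock paper's formula.

    drop_prob plays the role of (1 - keep_prob). The factor
    feat_area / valid_area compensates for seeds being sampled only
    where a full block fits. The paper states the formula for square
    maps; the rectangular form used here is an assumption.
    """
    valid_h = feat_h - block_size + 1
    valid_w = feat_w - block_size + 1
    if valid_h <= 0 or valid_w <= 0:
        raise ValueError("block_size must fit inside the feature map")
    return drop_prob / block_size ** 2 * (feat_h * feat_w) / (valid_h * valid_w)


def gamma_naive(drop_prob, block_size):
    """Seed probability without the valid-region correction, for comparison."""
    return drop_prob / block_size ** 2


if __name__ == "__main__":
    # 14 x 14 feature map, 7 x 7 blocks, 10% target drop rate.
    print(gamma_naive(0.1, 7))          # about 0.00204
    print(gamma_paper(0.1, 7, 14, 14))  # about 0.00625
```

With a large block on a small map, the correction roughly triples γ here (196 / 64 ≈ 3.06). Because blocks can overlap, the resulting fraction of dropped units is still only approximately `drop_prob`.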