Hello, in your DETR code, the output you get from the transformer has shape [bs, hidden_dim, feature_dim]. The code is:
The transformer code is:
Based on my understanding, your code uses only the first decoder layer's output as the feature for predicting the action. However, in the original DETR code, the transformer output is:
The original DETR code uses the same feature-processing code.
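To make the question concrete, here is a minimal sketch in plain Python of how the same `[0]` index can select different things depending on what the transformer returns. Everything in it is hypothetical: the function names, the string placeholders standing in for tensors, and the count of seven decoder layers are assumptions for illustration, not code taken from either repository.

```python
# Hypothetical stand-in for stacked decoder outputs: one entry per decoder layer.
# Real tensors would carry batch, query, and hidden dimensions; each layer's
# output is a labeled string here so the effect of indexing is visible.
NUM_LAYERS = 7  # assumption: seven decoder layers


def decoder_outputs(num_layers=NUM_LAYERS):
    """Return a list with one placeholder output per decoder layer."""
    return [f"layer_{i}_output" for i in range(1, num_layers + 1)]


def transformer_returning_tuple():
    """Case A (assumed): the transformer returns (hs, memory)."""
    hs = decoder_outputs()
    memory = "encoder_memory"
    return hs, memory


def transformer_returning_stack():
    """Case B (assumed): the transformer returns only the per-layer stack."""
    return decoder_outputs()


# In case A, [0] unpacks the tuple and keeps every layer's output.
hs_a = transformer_returning_tuple()[0]

# In case B, the same [0] selects the first decoder layer only.
hs_b = transformer_returning_stack()[0]

# Selecting the final layer in case B would use [-1] instead.
last_layer = transformer_returning_stack()[-1]

print(len(hs_a), hs_b, last_layer)
```

If the two code bases differ in this way, identical feature-processing code would read the last layer in one and the first layer in the other, which is the behavior I am asking about.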
I would like to ask why only the first-layer output is chosen as the feature. Would selecting the seventh layer be a better choice? Thank you!!!