Hi, in your custom/layers.py there are both EncoderLayer and DecoderLayer.
You actually use EncoderLayer, which contains only self-attention.
In DecoderLayer, the forward function receives encode_out, so I think DecoderLayer follows the vanilla transformer structure.
As I recall, the Music Transformer paper uses only masked self-attention in the decoder.
Did you try implementing the vanilla transformer structure with relative attention?
If so, how did the result compare to using only masked self-attention?
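
For reference, here is a minimal sketch of the structural difference I mean (not the repo's actual code, and using standard rather than relative attention): a masked self-attention-only layer versus a vanilla decoder layer whose forward also takes encode_out for cross-attention. The class names and hyperparameters are just illustrative.

```python
# Minimal sketch, not the repo's implementation; standard attention is used
# here only to show the structural difference between the two layer types.
import torch
import torch.nn as nn

class SelfAttentionOnlyLayer(nn.Module):
    """Decoder-only style layer: masked self-attention + feed-forward."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        t = x.size(1)
        # look-ahead (causal) mask: True marks positions that may not be attended
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        a, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + a)
        return self.norm2(x + self.ff(x))

class VanillaDecoderLayer(nn.Module):
    """Vanilla transformer decoder layer: adds cross-attention over encoder output."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, encode_out):
        t = x.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        a, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + a)
        # this extra cross-attention over encode_out is what makes it "vanilla" encoder-decoder
        c, _ = self.cross_attn(x, encode_out, encode_out)
        x = self.norm2(x + c)
        return self.norm3(x + self.ff(x))
```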