Hi, in your custom/layers.py there are both EncoderLayer and DecoderLayer.
You actually use EncoderLayer, which contains only self-attention.
In DecoderLayer, the forward function receives encode_out, so I think DecoderLayer follows the vanilla transformer structure.
As I recall, the Music Transformer paper uses only masked self-attention in the decoder.
Did you try implementing the vanilla transformer structure with relative attention?
If so, how did the result compare to using only masked self-attention?
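
For reference, here is a minimal sketch of the structural difference I mean (not the repo's actual code, and using standard rather than relative attention): a masked self-attention-only layer versus a vanilla decoder layer whose forward also takes encode_out for cross-attention. The class names and hyperparameters are just illustrative.

```python
# Minimal sketch, not the repo's implementation; standard attention is used
# here only to show the structural difference between the two layer types.
import torch
import torch.nn as nn

class SelfAttentionOnlyLayer(nn.Module):
    """Decoder-only style layer: masked self-attention + feed-forward."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        t = x.size(1)
        # look-ahead (causal) mask: True marks positions that may not be attended
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        a, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + a)
        return self.norm2(x + self.ff(x))

class VanillaDecoderLayer(nn.Module):
    """Vanilla transformer decoder layer: adds cross-attention over encoder output."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, encode_out):
        t = x.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        a, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + a)
        # this extra cross-attention over encode_out is what makes it "vanilla" encoder-decoder
        c, _ = self.cross_attn(x, encode_out, encode_out)
        x = self.norm2(x + c)
        return self.norm3(x + self.ff(x))
```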