Attention Layers
By default, `DALLE` will use full attention for all layers, but you can specify the attention type per layer as follows:
- `full`: full attention
- `axial_row`: axial attention, along the rows of the image feature map
- `axial_col`: axial attention, along the columns of the image feature map
- `conv_like`: convolution-like attention, for the image feature map
```python
dalle = DALLE(
    # ...
    attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')  # cycles between these four types of attention
)
```
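The tuple is repeated across the transformer's layers, so a `depth` larger than the tuple's length simply restarts the pattern. A minimal sketch of that cycling behavior (the `depth` value here is hypothetical, for illustration only):

```python
from itertools import cycle, islice

attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')
depth = 8  # hypothetical layer count

# Each layer takes the next entry, wrapping around when the tuple is exhausted.
schedule = list(islice(cycle(attn_types), depth))
print(schedule)
# ['full', 'axial_row', 'axial_col', 'conv_like',
#  'full', 'axial_row', 'axial_col', 'conv_like']
```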
Each type is an attempt at replicating the scant details OpenAI has published on the subject.
When in doubt, and if you don't need the VRAM/runtime savings, train with:

```python
attn_types = ('full',)  # trailing comma needed: ('full') is just a string, not a one-element tuple
```
You can also mix in DeepSpeed's sparse attention, which requires DeepSpeed built with sparse attention support plus the `triton` package. If you can meet those requirements, it is worth the install:
```python
dalle = DALLE(
    # ...
    attn_types = ('full', 'sparse')  # cycles between full and sparse attention
)
```
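For reference, a fuller sketch of a complete construction. The constructor arguments mirror the project README, but every value below (VAE sizes, vocabulary size, depth, heads) is an illustrative assumption, not a recommendation:

```python
from dalle_pytorch import DiscreteVAE, DALLE

# A (trained) discrete VAE supplies the image tokens.
vae = DiscreteVAE(
    image_size = 256,
    num_layers = 3,
    num_tokens = 8192,
    codebook_dim = 512,
    hidden_dim = 64
)

dalle = DALLE(
    dim = 512,
    vae = vae,
    num_text_tokens = 10000,   # illustrative text vocabulary size
    text_seq_len = 256,
    depth = 16,                # the attn_types tuple cycles over all 16 layers
    heads = 8,
    attn_types = ('full', 'sparse')
)
```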