Attention Layers
By default, `DALLE` will use full attention for all layers, but you can specify the attention type per layer as follows:
- `full`: full attention
- `axial_row`: axial attention, along the rows of the image feature map
- `axial_col`: axial attention, along the columns of the image feature map
- `conv_like`: convolution-like attention, for the image feature map
```python
dalle = DALLE(
    # ...
    attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')  # cycles between these four types of attention
)
```
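The tuple is repeated across the transformer's layers, so a `depth` larger than the tuple's length simply restarts the pattern. A minimal sketch of that cycling behavior (the `depth` value here is hypothetical, for illustration only):

```python
from itertools import cycle, islice

attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')
depth = 8  # hypothetical layer count

# Each layer takes the next entry, wrapping around when the tuple is exhausted.
schedule = list(islice(cycle(attn_types), depth))
print(schedule)
# ['full', 'axial_row', 'axial_col', 'conv_like',
#  'full', 'axial_row', 'axial_col', 'conv_like']
```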
Each type is an attempt at replicating the scant details OpenAI has published on the subject.
When in doubt, and if you don't need the VRAM/runtime savings, train with:

```python
attn_types = ('full',)  # trailing comma needed: ('full') is just a string, not a one-element tuple
```
You can also mix in DeepSpeed's sparse attention, which requires DeepSpeed built with sparse attention support plus the `triton` package. If you can meet those requirements, it is worth the install:
```python
dalle = DALLE(
    # ...
    attn_types = ('full', 'sparse')  # cycles between full and sparse attention
)
```
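For reference, a fuller sketch of a complete construction. The constructor arguments mirror the project README, but every value below (VAE sizes, vocabulary size, depth, heads) is an illustrative assumption, not a recommendation:

```python
from dalle_pytorch import DiscreteVAE, DALLE

# A (trained) discrete VAE supplies the image tokens.
vae = DiscreteVAE(
    image_size = 256,
    num_layers = 3,
    num_tokens = 8192,
    codebook_dim = 512,
    hidden_dim = 64
)

dalle = DALLE(
    dim = 512,
    vae = vae,
    num_text_tokens = 10000,   # illustrative text vocabulary size
    text_seq_len = 256,
    depth = 16,                # the attn_types tuple cycles over all 16 layers
    heads = 8,
    attn_types = ('full', 'sparse')
)
```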