Per @punkduckable, multihead attention should be implemented as a layer, not as an activation function. The current implementation simply applies multihead attention as if it were an activation function, which also disrupts the overall structure of `MultiLayerPerceptron`.

Multihead attention should therefore be removed from the set of activation functions and, most likely, implemented as a derived class of the latent space.
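As a rough illustration of what "a derived class of latent space" could look like, here is a minimal sketch. The `LatentSpace` base class below is a stand-in for the repo's actual latent-space abstraction (its real name, constructor, and interface may differ), and `MultiheadAttentionLatentSpace` is a hypothetical name; the point is just that attention becomes a self-contained `nn.Module` with its own parameters rather than an element-wise activation:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's latent-space base class;
# the real class name and interface may differ.
class LatentSpace(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim

class MultiheadAttentionLatentSpace(LatentSpace):
    """Multihead attention wrapped as a proper layer, not an activation."""
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__(dim)
        # Self-attention over the latent sequence; batch_first keeps the
        # usual (batch, seq, feature) layout.
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Unlike an activation function, attention mixes information
        # across the sequence dimension, so it needs the full tensor
        # rather than acting on each entry independently.
        out, _ = self.attn(z, z, z)
        return out

# Usage sketch: latent dim 32 must be divisible by n_heads.
layer = MultiheadAttentionLatentSpace(dim=32, n_heads=4)
z = torch.randn(8, 10, 32)   # (batch, seq, latent dim)
print(layer(z).shape)        # torch.Size([8, 10, 32])
```

This also makes the mismatch concrete: an activation function is a stateless element-wise map, while attention carries learnable projection weights and operates on whole sequences, so it belongs in the layer/module hierarchy.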