Flax implementation of gMLP from "Pay Attention to MLPs" #1410
-
It's no news that transformers have dominated the field of deep learning ever since 2017. But Hanxiao Liu, Zihang Dai, David R. So and Quoc V. Le, in their recent work titled "Pay Attention to MLPs", propose a new architecture, gMLP (essentially MLPs with gating), that performs as well as Transformers in key language and vision applications. Based on the comparisons in the paper, the authors argue that self-attention is not critical for Vision Transformers, since gMLP can reach comparable accuracy, which calls into question how essential attention really is. My repository includes an implementation of gMLP written in Flax. Most of the codebase is inspired by Phil Wang's implementations in PyTorch and Haiku.
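For anyone curious what the core of the architecture looks like, here is a minimal sketch of a gMLP block with its Spatial Gating Unit in Flax. The class and parameter names below are illustrative and may not match what is in the repository:

```python
import jax.numpy as jnp
import flax.linen as nn


class SpatialGatingUnit(nn.Module):
    """Splits channels in half and gates one half with a learned
    projection along the sequence (spatial) dimension."""
    seq_len: int

    @nn.compact
    def __call__(self, x):
        u, v = jnp.split(x, 2, axis=-1)
        v = nn.LayerNorm()(v)
        # Spatial projection: mixes information across token positions.
        # Near-zero weights and unit bias make the unit close to identity
        # at initialization, as suggested in the paper.
        v = jnp.swapaxes(v, -1, -2)                # (batch, channels, seq_len)
        v = nn.Dense(self.seq_len,
                     kernel_init=nn.initializers.normal(stddev=1e-3),
                     bias_init=nn.initializers.ones)(v)
        v = jnp.swapaxes(v, -1, -2)                # back to (batch, seq_len, channels)
        return u * v


class gMLPBlock(nn.Module):
    d_model: int
    d_ffn: int
    seq_len: int

    @nn.compact
    def __call__(self, x):
        shortcut = x
        x = nn.LayerNorm()(x)
        x = nn.Dense(self.d_ffn)(x)
        x = nn.gelu(x)
        x = SpatialGatingUnit(self.seq_len)(x)
        x = nn.Dense(self.d_model)(x)
        return x + shortcut
```

A full model is just a stack of these blocks between an embedding (or patch projection) layer and an output head; no attention layers are involved.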
Replies: 2 comments
-
Awesome, thanks for sharing!
-
Marking as answered