
Commit

Add suggestions.
samuelburbulla committed Apr 22, 2024
1 parent e3ca605 commit c6683b5
Showing 1 changed file with 5 additions and 4 deletions.
9 changes: 5 additions & 4 deletions src/continuiti/networks/multi_head_attention.py
@@ -12,13 +12,14 @@
 class MultiHeadAttention(nn.Module):
     r"""Multi-Head Attention module.
-    Module as described in the paper "Attention is All you Need"
-    (https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) with
-    optional bias for the projections.
+    Module as described in the paper [Attention is All you Need](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)
+    with optional bias for the projections.
     $$MultiHead(Q,K,V)=Concat(head_1,\dots,head_n)W^O + b^O$$
     where
-    $$head_i=Attention(QW_i^Q+b_i^Q, KW_i^K+b_i^K, VW_i^V+b_i^V)$$.
+    $$head_i=Attention(QW_i^Q+b_i^Q, KW_i^K+b_i^K, VW_i^V+b_i^V).$$
     Args:
         hidden_dim: dimension of the hidden layers (embedding dimension).
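For illustration only (not part of the commit): the docstring above describes the standard multi-head attention of Vaswani et al. with optional bias on the projections. A minimal sketch of the same formula using PyTorch's built-in `nn.MultiheadAttention`, which bundles the projections $W^Q, W^K, W^V, W^O$ and their optional biases; the dimensions `hidden_dim=32` and `n_heads=4` are arbitrary assumptions, not values from the repository.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (assumed, not from the commit).
hidden_dim, n_heads = 32, 4

attn = nn.MultiheadAttention(
    embed_dim=hidden_dim,
    num_heads=n_heads,
    bias=True,          # optional bias b^Q, b^K, b^V, b^O for the projections
    batch_first=True,   # inputs of shape (batch, sequence, hidden_dim)
)

# Query, key, and value tensors.
q = torch.rand(2, 10, hidden_dim)
k = torch.rand(2, 20, hidden_dim)
v = torch.rand(2, 20, hidden_dim)

out, weights = attn(q, k, v)
print(out.shape)  # torch.Size([2, 10, 32])
```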
