Create the labs guide for the Transformers day under `pages/transformers/`. This should include:

- Introduction and explanation of the Transformer architecture (both encoder and decoder)
  - Add a draft of the Transformer formulation (feel free to modify; a starting-point sketch follows after this list) @ramon-astudillo (expected today)
  - Check other days as reference
- Explanation in detail of the attention mechanism, maybe with some plots of attention, in particular causal attention (a plotting sketch follows after this list). Attention is the most important part of Transformers, so it is worth expanding in detail. Positional embeddings probably also deserve some love.
- The day will be centered on the decoder; we can leave the encoder-decoder as a final section with no exercises for this year.
- Exercise code blocks copying the code from Transformer Day: Create Transformer Exercises lxmls-toolkit#178 (see other days as examples)
- Explanations (context for the exercises)
- More advanced information: open to suggestions @tmynn @lhaausing. If we complete fine-tuning in Transformer Day: Get miniGPT up and running lxmls-toolkit#177, we could add some explanation about this (instruction tuning?)
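As a concrete starting point for the formulation item above, here is standard scaled dot-product and multi-head attention in the usual notation from the literature. This is only a sketch for @ramon-astudillo to adapt, not the guide's final notation:

```latex
% Scaled dot-product attention with an additive causal mask M (decoder case).
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} =
\begin{cases}
0 & \text{if } j \le i,\\
-\infty & \text{if } j > i.
\end{cases}

% Multi-head attention: h heads with separate learned projections.
\mathrm{head}_i = \mathrm{Attention}(X W_i^Q,\, X W_i^K,\, X W_i^V),
\qquad
\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O
```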
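For the causal-attention plots, a minimal NumPy/Matplotlib sketch that could seed the guide's figures. All names, shapes, and the random toy data below are illustrative assumptions, not code from the toolkit:

```python
# Minimal sketch: scaled dot-product attention with a causal mask,
# plotted as a heatmap. Toy data only; not toolkit code.
import numpy as np
import matplotlib.pyplot as plt

def causal_attention_weights(Q, K):
    """Return the (T, T) attention matrix for queries Q and keys K,
    with future positions masked out (causal attention)."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # raw similarity scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                        # block attention to future tokens
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, d = 8, 16                                      # toy sequence length and head size
Q, K = rng.normal(size=(T, d)), rng.normal(size=(T, d))

A = causal_attention_weights(Q, K)
plt.imshow(A, cmap="viridis")
plt.xlabel("key position")
plt.ylabel("query position")
plt.title("Causal self-attention weights")
plt.colorbar()
plt.show()
```

The resulting heatmap is lower triangular, which is exactly the property worth pointing out in the guide: each query position can only attend to itself and earlier key positions.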
Branch: https://github.com/LxMLS/lxmls-guide/tree/transformer-day

Expected finishing date: