Commit
training script
Kye committed Oct 2, 2023
1 parent b933890 commit 957219f
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -106,6 +106,7 @@ Eprint = {arXiv:2302.05442},

# Todo
- [ ] Add flash attention, with layernorm before attention and then layernorm for the q/k values (see the attention sketch below),
- [ ] Basic training script on CIFAR,
- [ ] When using ViT-22B, as with any large-scale model, it is difficult to understand how the model arrived at a specific decision, which could lead to a lack of trust and accountability. Add a mechanism to backtrack the model's decisions,
- [ ] Create logic to train the decoder for 300k steps with a batch size of 64 using Adam (Kingma and Ba, 2015), clipping gradients to a global norm of 0.05 to stabilize training. Linearly increase the learning rate from 0 to 0.0002 over the first 2,500 steps, then decay it back to 0 with a cosine schedule (Loshchilov and Hutter, 2017); see the training sketch below.
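
A minimal sketch of what the first item could look like in PyTorch: pre-attention layernorm plus layernorm on the query/key projections, with `F.scaled_dot_product_attention` dispatching to a flash kernel when one is available. `QKNormAttention` and its defaults are illustrative assumptions, not the repo's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Multi-head attention with layernorm before attention and
    layernorm applied to the q/k values (hypothetical sketch)."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.head_dim = dim // heads
        self.pre_norm = nn.LayerNorm(dim)          # layernorm before attention
        self.q_norm = nn.LayerNorm(self.head_dim)  # layernorm on q values
        self.k_norm = nn.LayerNorm(self.head_dim)  # layernorm on k values
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        qkv = self.to_qkv(self.pre_norm(x)).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, self.head_dim).transpose(1, 2)
                   for t in qkv)
        q, k = self.q_norm(q), self.k_norm(k)
        # Uses a fused flash-attention kernel when the backend supports it.
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, d)
        return x + self.to_out(out)  # residual connection
```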
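
And a minimal sketch of the training recipe from the last item: Adam, gradient clipping at a global norm of 0.05, linear warmup to 0.0002 over 2,500 steps, and cosine decay back to 0 over 300k steps. It assumes a model that returns its own loss and a DataLoader yielding `(image, label)` batches; `lr_at`, `train`, and the constants are illustrative names, not the repo's API.

```python
import math
import torch

TOTAL_STEPS, WARMUP_STEPS = 300_000, 2_500
PEAK_LR, BATCH_SIZE, CLIP_NORM = 2e-4, 64, 0.05

def lr_at(step):
    # Linear warmup from 0 to the peak LR, then cosine decay back to 0.
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

def train(model, loader, device="cuda"):
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=PEAK_LR)
    # LambdaLR scales the base LR, so divide out PEAK_LR here.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda s: lr_at(s) / PEAK_LR)
    step = 0
    while step < TOTAL_STEPS:
        for images, _ in loader:  # batches of BATCH_SIZE
            loss = model(images.to(device))  # assumes model returns its loss
            optimizer.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
            optimizer.step()
            scheduler.step()
            step += 1
            if step >= TOTAL_STEPS:
                break
```

For the CIFAR item above, a `torchvision.datasets.CIFAR10` dataset wrapped in a `DataLoader` with `batch_size=64` would slot in as `loader`.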
