- Experiment - Longformer @Yuetian
- Combine Longformer with XL (code needs review; see the local-window + memory mask sketch below)
- Upgrade DDPG to TD3 (code needs review; see the TD3 target sketch below)
- Block Recurrent Transformer from scratch (code needs review)
- Unlimiformer from scratch
- Modify ReplayBuffer for any recurrent states (see the per-block buffer sketch after the notes below)
- Finish BERT Evaluation for Gallery (perplexity, etc.; see the pseudo-perplexity sketch below)
- Finish Recurrent Training Trainer for any transformer (see the segment-level training sketch below)
- Include stock indicators and time in model inputs (volume, earnings, etc.)
- Transformer-XL states can't fit into the ReplayBuffer
- Block Recurrent Transformer states only fit if seqlen is < 50
- The ReplayBuffer can only save states per block, not per timestep (see the per-block buffer sketch below)
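Given the constraints noted above, one workable layout for the recurrent ReplayBuffer is to store transitions per timestep but keep only one recurrent state per block (the state at the block's first timestep), recomputing intermediate states forward when replaying. A minimal numpy sketch under that assumption; the `RecurrentReplayBuffer` class, field names, and shapes are illustrative, not the project's actual API:

```python
import numpy as np

class RecurrentReplayBuffer:
    """Stores per-timestep transitions but only one recurrent state per block
    (the state at the block's first timestep), so large Transformer-XL /
    Block Recurrent memories don't dominate storage."""
    def __init__(self, capacity_blocks, block_len, obs_dim, act_dim, state_shape):
        self.capacity = capacity_blocks
        self.block_len = block_len
        self.obs = np.zeros((capacity_blocks, block_len, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity_blocks, block_len, act_dim), dtype=np.float32)
        self.rew = np.zeros((capacity_blocks, block_len), dtype=np.float32)
        self.state = np.zeros((capacity_blocks, *state_shape), dtype=np.float32)
        self.ptr, self.full = 0, False

    def add_block(self, obs, act, rew, init_state):
        """obs/act/rew cover one block of timesteps; init_state is the recurrent
        state at the start of that block only."""
        self.obs[self.ptr], self.act[self.ptr], self.rew[self.ptr] = obs, act, rew
        self.state[self.ptr] = init_state
        self.ptr = (self.ptr + 1) % self.capacity
        self.full = self.full or self.ptr == 0

    def sample(self, batch_size):
        hi = self.capacity if self.full else self.ptr
        idx = np.random.randint(0, hi, size=batch_size)
        return self.obs[idx], self.act[idx], self.rew[idx], self.state[idx]

# Illustrative sizes only: 50-step blocks, a (layers, mem_len, d_model) memory.
buf = RecurrentReplayBuffer(capacity_blocks=256, block_len=50, obs_dim=32,
                            act_dim=4, state_shape=(4, 50, 256))
```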
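For combining Longformer with XL, the usual framing is that each query in the current segment attends to a local causal window (Longformer-style sliding window) plus every cached memory slot carried over from the previous segment (XL-style recurrence). A minimal mask-construction sketch under that assumption; it is not the code that is pending review:

```python
import torch

def local_plus_memory_mask(seg_len, mem_len, window):
    """Boolean attention mask for one segment: each query attends to all cached
    XL memory slots plus a causal local window over the current segment
    (Longformer-style). True = attention allowed. Shape: (seg_len, mem_len + seg_len)."""
    mask = torch.zeros(seg_len, mem_len + seg_len, dtype=torch.bool)
    mask[:, :mem_len] = True                              # full attention to XL memory
    for q in range(seg_len):
        lo = max(0, q - window)
        mask[q, mem_len + lo : mem_len + q + 1] = True    # causal sliding window
    return mask

print(local_plus_memory_mask(seg_len=6, mem_len=3, window=2).int())
```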
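The DDPG-to-TD3 upgrade (Fujimoto et al., linked below) amounts to three changes: clipped double-Q learning with twin critics, target policy smoothing, and delayed actor/target updates. A minimal PyTorch sketch of the target computation; network sizes, action bounds, and hyperparameters are placeholders, not the project's actual values:

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

obs_dim, act_dim, max_act = 16, 4, 1.0
actor_targ = mlp(obs_dim, act_dim)
q1_targ, q2_targ = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
gamma, policy_noise, noise_clip, policy_delay = 0.99, 0.2, 0.5, 2

def td3_target(obs2, rew, done):
    """Clipped double-Q target with target policy smoothing (two of the three
    TD3 changes); the third change is delaying actor/target updates."""
    with torch.no_grad():
        a2 = torch.tanh(actor_targ(obs2)) * max_act
        noise = (torch.randn_like(a2) * policy_noise).clamp(-noise_clip, noise_clip)
        a2 = (a2 + noise).clamp(-max_act, max_act)
        q_in = torch.cat([obs2, a2], dim=-1)
        q_next = torch.min(q1_targ(q_in), q2_targ(q_in)).squeeze(-1)
        return rew + gamma * (1.0 - done) * q_next

# Example: both critics are regressed onto this target every step, while the
# actor and the Polyak-averaged target networks update every `policy_delay` steps.
obs2, rew, done = torch.randn(32, obs_dim), torch.zeros(32), torch.zeros(32)
print(td3_target(obs2, rew, done).shape)   # torch.Size([32])
```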
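For the BERT evaluation item, perplexity for a masked LM is typically reported as pseudo-perplexity: mask each position in turn, score the true token, and exponentiate the mean negative log-likelihood. A minimal sketch assuming Huggingface `transformers` and `bert-base-uncased` as a stand-in for the Gallery checkpoint:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def pseudo_perplexity(text, model_name="bert-base-uncased"):
    """Mask each token in turn, score the original token, and exponentiate the
    mean negative log-likelihood (pseudo-perplexity for masked LMs)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name).eval()
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, input_ids.size(0) - 1):            # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        nlls.append(-torch.log_softmax(logits, dim=-1)[input_ids[i]])
    return torch.exp(torch.stack(nlls).mean()).item()

print(pseudo_perplexity("The market closed higher on strong volume."))
```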
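For the recurrent training Trainer, the core loop is segment-level recurrence: split long sequences into fixed-length segments, pass memories from one segment to the next, and detach them so backprop stays within a segment. A minimal sketch; the `model(inp, mems=...) -> (logits, new_mems)` interface is an assumption, not the actual Trainer API:

```python
import torch
import torch.nn.functional as F

def train_long_sequence(model, optimizer, tokens, seg_len=128):
    """tokens: (batch, total_len) token ids. Carries recurrent memories across
    segments (Transformer-XL style) with truncated backprop per segment."""
    mems = None
    total_len = tokens.size(1)
    for start in range(0, total_len - 1, seg_len):
        inp = tokens[:, start:start + seg_len]
        tgt = tokens[:, start + 1:start + 1 + inp.size(1)]
        inp = inp[:, :tgt.size(1)]                 # align lengths at the tail
        logits, mems = model(inp, mems=mems)       # assumed model interface
        mems = [m.detach() for m in mems]          # stop gradients at segment boundary
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```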
- arXiv Paper (Unlimiformer): https://arxiv.org/abs/2305.01625
- arXiv Paper (Transformer-XL): https://arxiv.org/abs/1901.02860
- arXiv Paper (Longformer): https://arxiv.org/abs/2004.05150
- Longformer: The Long-Document Transformer (Yannic Kilcher): https://www.youtube.com/watch?v=_8KNb5iqblE
- How much memory does Longformer use? (Yannic Kilcher): https://www.youtube.com/watch?v=gJR28onlqzs
- Longformer Blog: https://towardsdatascience.com/longformer-the-long-document-transformer-cdfeefe81e89
- Variants of attention (Huggingface BigBird blog): https://huggingface.co/blog/big-bird
- arXiv Paper (Block-Recurrent Transformers): https://arxiv.org/abs/2203.07852
- arXiv Paper (TD3): https://arxiv.org/pdf/1802.09477.pdf
- OpenAI Spinning Up: https://spinningup.openai.com/en/latest/algorithms/td3.html
- One Write-Head is All You Need: https://arxiv.org/abs/1911.02150
- Flash Attention: https://arxiv.org/abs/2205.14135
- Diagonal State Space Models for long sequences: https://arxiv.org/abs/2206.11893
- SCROLLS: https://arxiv.org/pdf/2201.03533.pdf
- COLT5: https://arxiv.org/pdf/2303.09752.pdf
- Huggingface Datasets API: https://github.com/huggingface/datasets
- Huggingface Evaluate API: https://github.com/huggingface/evaluate
- Sliding Encoder and Decoder (SLED): https://arxiv.org/pdf/2208.00748.pdf