Welcome to my Natural Language Processing (NLP) diary ^_^.
Transformers
- Transformers are a type of neural network architecture that allows for parallelization across the sequence: the network can process all of the tokens in a sequence at the same time, rather than having to process them one after another. This is a huge advantage over RNNs, which must process tokens sequentially (a minimal sketch of the difference follows this list).
- The architecture was introduced in the 2017 paper "Attention Is All You Need" by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin, published in the Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017).
- Below is a diagram of the Transformer architecture:
- Sebastian Raschka sums it up well here
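Here is a minimal sketch of that parallel-vs-sequential difference (PyTorch is my assumption for the framework; the layer sizes are arbitrary). A self-attention layer consumes the whole sequence in one call, while an RNN cell has to be stepped through the sequence one token at a time:

```python
import torch
import torch.nn as nn

batch, seq_len, d_model = 2, 10, 64
x = torch.randn(batch, seq_len, d_model)  # a batch of token embeddings

# Transformer-style: self-attention sees the entire sequence in a single call,
# so every position is processed in parallel.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
out_parallel = encoder_layer(x)                # shape: (batch, seq_len, d_model)

# RNN-style: the hidden state is updated one token at a time,
# so position t cannot be computed before position t-1.
rnn_cell = nn.GRUCell(d_model, d_model)
h = torch.zeros(batch, d_model)
outputs = []
for t in range(seq_len):                       # explicit sequential loop
    h = rnn_cell(x[:, t, :], h)
    outputs.append(h)
out_sequential = torch.stack(outputs, dim=1)   # shape: (batch, seq_len, d_model)
```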
Fave paper so far:
- This paper presents a compelling case that purported emergent abilities in LLMs depend heavily on the metrics employed, challenging the community to reassess its understanding of how LLM capabilities evolve with scale.
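To make the metric point concrete, here is a toy illustration of my own (not code from the paper): if per-token accuracy improves smoothly with scale, a continuous metric shows a smooth curve, while a discontinuous metric like exact match over a multi-token answer stays near zero and then appears to "jump".

```python
import numpy as np

seq_len = 5
# Assume per-token accuracy improves smoothly as models scale up.
per_token_acc = np.linspace(0.5, 0.99, num=8)

# Continuous metric: average per-token accuracy -> smooth improvement.
token_level = per_token_acc

# Discontinuous metric: the whole 5-token answer must be exactly right.
# Treating tokens as independent, exact match is per-token accuracy ** seq_len,
# which hugs zero for a while and then rises sharply.
exact_match = per_token_acc ** seq_len

for acc, em in zip(token_level, exact_match):
    print(f"per-token acc = {acc:.2f}  ->  exact-match rate = {em:.2f}")
```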
First favourites in research:
- Can large language models (LLMs) train themselves? Credits: Cameron Wolfe, found through this Twitter thread
- Alignment
- AI Safety (particularly interested in red-teaming)
- "Hallucination" problem
- Interpretability