Implementation of the research paper “Attention Is All You Need”, which introduced the Transformer architecture that all modern LLMs build on and has been cited more than 100k times. The transformer-based encoder-decoder model from the paper is implemented and trained on Colab.
This project trains a Transformer model on a Colab GPU to translate English sentences into German. The model is trained on a dataset of 15,000 English-German sentence pairs. Dataset link - Download Dataset
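As a rough orientation, the encoder-decoder can be wired up in a few lines of PyTorch. This is a minimal sketch rather than the exact notebook code: the vocabulary sizes and `max_len` are placeholders, the positional embeddings are learned rather than the paper's sinusoidal ones, and the remaining hyperparameters follow the paper's "base" configuration.

```python
import math
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Minimal encoder-decoder Transformer for EN->DE translation.
    Hyperparameters follow the paper's base model; the vocabulary
    sizes and max_len below are placeholders for this sketch."""
    def __init__(self, src_vocab=10_000, tgt_vocab=10_000,
                 d_model=512, nhead=8, num_layers=6,
                 dim_feedforward=2048, dropout=0.1, max_len=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        # Learned positional embeddings (the paper uses sinusoidal ones).
        self.pos_embed = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, dropout=dropout,
            batch_first=True)
        self.generator = nn.Linear(d_model, tgt_vocab)
        self.scale = math.sqrt(d_model)

    def _add_pos(self, emb):
        positions = torch.arange(emb.size(1), device=emb.device)
        return emb * self.scale + self.pos_embed(positions)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each target position only attends to the past.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(src_ids.device)
        src = self._add_pos(self.src_embed(src_ids))
        tgt = self._add_pos(self.tgt_embed(tgt_ids))
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.generator(out)  # (batch, tgt_len, tgt_vocab) logits
```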
- The model is overfitting; the test error needs to be reduced as well.
- Add an inference pipeline to translate English sentences into German (see the greedy-decoding sketch after this list).
- Experiment with pretrained tokenizers such as SentencePiece and tiktoken (see the SentencePiece sketch after this list).
- Train the model on multiple GPUs in parallel.
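For the inference TODO, a minimal greedy-decoding sketch could look like the following. It assumes the `Seq2SeqTransformer` sketched above; `bos_id` and `eos_id` are whatever begin/end-of-sentence ids the tokenizer assigns, and batch size 1 is assumed for simplicity.

```python
import torch

@torch.no_grad()
def greedy_translate(model, src_ids, bos_id, eos_id, max_len=64):
    """Greedy decoding: encode the source, then extend the target
    one token at a time until EOS or max_len is reached."""
    model.eval()
    tgt_ids = torch.tensor([[bos_id]], device=src_ids.device)
    for _ in range(max_len):
        logits = model(src_ids, tgt_ids)        # (1, tgt_len, vocab)
        next_id = logits[:, -1].argmax(dim=-1)  # most likely next token
        tgt_ids = torch.cat([tgt_ids, next_id.unsqueeze(0)], dim=1)
        if next_id.item() == eos_id:
            break
    return tgt_ids.squeeze(0).tolist()
```

Beam search would likely give better translations, but greedy decoding is the simplest way to verify the model end to end.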
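For the tokenizer TODO, SentencePiece can train a subword vocabulary directly on raw text. A minimal sketch, where the input file name, model prefix, and vocabulary size are placeholders:

```python
import sentencepiece as spm

# Train a shared BPE vocabulary on the raw English/German text
# (file name and vocab size are placeholders for this sketch).
spm.SentencePieceTrainer.train(
    input="train_en_de.txt", model_prefix="ende_bpe",
    vocab_size=10_000, model_type="bpe")

sp = spm.SentencePieceProcessor(model_file="ende_bpe.model")
ids = sp.encode("Attention is all you need.", out_type=int)
print(ids)             # subword token ids
print(sp.decode(ids))  # round-trips back to the sentence
```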
For a better understanding, you can refer to the blogs: Blog Link - Intro blog