
mishra-kunal1/Attention-is-all-you-need


Training of Transformer architecture on a Free GPU for English to German Translation

An implementation of the research paper "Attention Is All You Need", which introduced the Transformer architecture that all LLMs follow and has been cited more than 100k times. This repo implements the transformer-based encoder-decoder model and trains it on Colab.

This project involves training a Transformer model on Colab GPU to translate English sentences into German. The model is trained on a dataset containing 15,000 pairs of English and German sentences. Dataset link - Download Dataset
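The encoder-decoder setup described above can be sketched with PyTorch's built-in `nn.Transformer`. This is a minimal illustration, not the repository's actual model: the dimensions, layer counts, and class name are assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn

class TranslationModel(nn.Module):
    """Minimal encoder-decoder Transformer for sequence-to-sequence translation."""

    def __init__(self, src_vocab, tgt_vocab, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # Causal mask so each target position only attends to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=tgt_mask)
        return self.out(h)  # (batch, tgt_len, tgt_vocab) logits
```

During training, the decoder input is the German sentence shifted right (teacher forcing), and the loss is cross-entropy against the unshifted target tokens.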


TODO

  • The model is overfitting; reduce the test error as well.
  • Add inference to translate English sentences into German.
  • Experiment with pretrained tokenizers such as SentencePiece and tiktoken.
  • Train the model on parallel GPUs.
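The inference item in the TODO list above is typically handled with a greedy decoding loop: feed the source sentence through the model, then generate target tokens one at a time until an end-of-sequence token appears. The sketch below is a hypothetical helper (the model interface and BOS/EOS token ids are assumptions, not code from this repo).

```python
import torch

@torch.no_grad()
def greedy_decode(model, src_ids, bos_id, eos_id, max_len=50):
    """Greedily generate target token ids for a single source sentence.

    src_ids: (1, src_len) tensor of source token ids.
    Returns the generated sequence including BOS (and EOS if produced).
    """
    tgt = torch.tensor([[bos_id]])
    for _ in range(max_len):
        logits = model(src_ids, tgt)            # (1, tgt_len, vocab)
        next_id = logits[0, -1].argmax().item() # most likely next token
        tgt = torch.cat([tgt, torch.tensor([[next_id]])], dim=1)
        if next_id == eos_id:
            break
    return tgt[0].tolist()
```

Beam search would usually translate better than greedy decoding, at the cost of a more involved loop.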

For a better understanding, you can refer to the blog: Blog Link - Intro blog
