Implementation of the research paper “Attention Is All You Need”, which introduced the Transformer architecture that all modern LLMs build on and has been cited more than 100k times. The transformer-based encoder-decoder model from the paper is implemented and trained on Colab.
This project trains a Transformer model on a Colab GPU to translate English sentences into German. The model is trained on a dataset of 15,000 English-German sentence pairs. Dataset link - Download Dataset
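As a rough orientation, the encoder-decoder can be wired up in a few lines of PyTorch. This is a minimal sketch rather than the exact notebook code: the vocabulary sizes and `max_len` are placeholders, the positional embeddings are learned rather than the paper's sinusoidal ones, and the remaining hyperparameters follow the paper's "base" configuration.

```python
import math
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Minimal encoder-decoder Transformer for EN->DE translation.
    Hyperparameters follow the paper's base model; the vocabulary
    sizes and max_len below are placeholders for this sketch."""
    def __init__(self, src_vocab=10_000, tgt_vocab=10_000,
                 d_model=512, nhead=8, num_layers=6,
                 dim_feedforward=2048, dropout=0.1, max_len=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        # Learned positional embeddings (the paper uses sinusoidal ones).
        self.pos_embed = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, dropout=dropout,
            batch_first=True)
        self.generator = nn.Linear(d_model, tgt_vocab)
        self.scale = math.sqrt(d_model)

    def _add_pos(self, emb):
        positions = torch.arange(emb.size(1), device=emb.device)
        return emb * self.scale + self.pos_embed(positions)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each target position only attends to the past.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(src_ids.device)
        src = self._add_pos(self.src_embed(src_ids))
        tgt = self._add_pos(self.tgt_embed(tgt_ids))
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.generator(out)  # (batch, tgt_len, tgt_vocab) logits
```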
- The model is overfitting; the test error needs to be reduced as well.
- Add an inference pipeline to translate English sentences into German (see the greedy-decoding sketch after this list).
- Experiment with pretrained tokenizers such as SentencePiece and tiktoken (see the SentencePiece sketch after this list).
- Train the model on multiple GPUs in parallel.
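For the inference TODO, a minimal greedy-decoding sketch could look like the following. It assumes the `Seq2SeqTransformer` sketched above; `bos_id` and `eos_id` are whatever begin/end-of-sentence ids the tokenizer assigns, and batch size 1 is assumed for simplicity.

```python
import torch

@torch.no_grad()
def greedy_translate(model, src_ids, bos_id, eos_id, max_len=64):
    """Greedy decoding: encode the source, then extend the target
    one token at a time until EOS or max_len is reached."""
    model.eval()
    tgt_ids = torch.tensor([[bos_id]], device=src_ids.device)
    for _ in range(max_len):
        logits = model(src_ids, tgt_ids)        # (1, tgt_len, vocab)
        next_id = logits[:, -1].argmax(dim=-1)  # most likely next token
        tgt_ids = torch.cat([tgt_ids, next_id.unsqueeze(0)], dim=1)
        if next_id.item() == eos_id:
            break
    return tgt_ids.squeeze(0).tolist()
```

Beam search would likely give better translations, but greedy decoding is the simplest way to verify the model end to end.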
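For the tokenizer TODO, SentencePiece can train a subword vocabulary directly on raw text. A minimal sketch, where the input file name, model prefix, and vocabulary size are placeholders:

```python
import sentencepiece as spm

# Train a shared BPE vocabulary on the raw English/German text
# (file name and vocab size are placeholders for this sketch).
spm.SentencePieceTrainer.train(
    input="train_en_de.txt", model_prefix="ende_bpe",
    vocab_size=10_000, model_type="bpe")

sp = spm.SentencePieceProcessor(model_file="ende_bpe.model")
ids = sp.encode("Attention is all you need.", out_type=int)
print(ids)             # subword token ids
print(sp.decode(ids))  # round-trips back to the sentence
```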
For a better understanding, you can refer to the blogs: Blog Link - Intro blog