This is a submission to Babel Hack: https://babel.tilda.ws
We use language models as initialization for the Transformer network to improve MT results on limited parallel data.
Our model achieves up to a +2 BLEU improvement on the 20k dataset.
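As a minimal sketch of the initialization idea (PyTorch, toy dimensions; the actual architecture and pretrained weights are described in the paper): a small decoder-style language model is trained first, and its embedding and self-attention weights are copied into the MT model in place of random initialization. All names and sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy dimensions for illustration only
VOCAB, D_MODEL, N_HEAD, N_LAYERS = 1000, 64, 4, 2

class TinyLM(nn.Module):
    """Stand-in for a language model pretrained on monolingual data."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, N_LAYERS)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, x):
        return self.head(self.blocks(self.embed(x)))

class MTEncoder(nn.Module):
    """MT encoder whose weights start from the pretrained LM, not random init."""
    def __init__(self, lm: TinyLM):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, N_LAYERS)
        # Copy the LM's embedding and self-attention weights as initialization
        self.embed.load_state_dict(lm.embed.state_dict())
        self.blocks.load_state_dict(lm.blocks.state_dict())

    def forward(self, x):
        return self.blocks(self.embed(x))

lm = TinyLM()          # pretend this was pretrained on monolingual text
enc = MTEncoder(lm)    # MT training then fine-tunes these weights
tokens = torch.randint(0, VOCAB, (2, 8))
out = enc(tokens)
print(out.shape)  # torch.Size([2, 8, 64])
```

The encoder is then fine-tuned on the limited parallel data, so the scarce supervised signal only has to adapt already-useful representations rather than learn them from scratch.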
[TODO]
- Add links to our paper
- Add links to connected research / papers