This repository is the result of our participation in the shared task. We went through the full process of building, analyzing, and improving a neural machine translation system.
Poster: link
The shared task was for the Estonian-English language pair. It involved working with ~19 million sentence pairs.
Shared task main page: link
Shared task on course page: link
The sections below summarize the key milestones we went through.
- Our baseline system was a default OpenNMT-py model with 2 layers of 500 LSTM hidden units for both the encoder and the decoder, using a 30k BPE vocabulary (see the BPE sketch below).
- As a result, we got 21.95 BLEU points on the shared dev set.
More details: report1
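For context, a 30k BPE vocabulary like ours can be learned and applied with the subword-nmt package roughly as follows. This is a minimal sketch, not our actual preprocessing script; the file names are placeholders.

```python
# Minimal sketch of 30k BPE preprocessing with the subword-nmt package.
# File names are placeholders, not our actual data paths.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn 30 000 merge operations on the tokenized training corpus.
with open("train.et-en.tok", encoding="utf-8") as corpus, \
     open("bpe.codes", "w", encoding="utf-8") as codes_out:
    learn_bpe(corpus, codes_out, num_symbols=30000)

# Apply the learned codes to every sentence before training and translation.
with open("bpe.codes", encoding="utf-8") as codes_in:
    bpe = BPE(codes_in)

print(bpe.process_line("The largest forest owners can ensure a continuous process ."))
```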
- We manually analyzed 60 baseline translations.
- Our main observation was that a lot of sentences lacked fluency. Often, in a long sentence, a part of it lacked fluency or was completely nonsensical.
- Take a look at motivating example 1, produced by the baseline system:
- Human: "The biggest forest owners ( state , local governments and some private forestry companies , owning thousands of hectares of forest areas ) can ensure a continuous process of production throughout the long forest management cycle ."
- Baseline: "The largest forest owners ( the country , local authorities and some of the private sector companies to whom thousands of hectares of forest land ) can be guaranteed throughout the long term management cycle ."
- Example 2:
- Human: "The European Union is set up with the aim of ending the frequent and bloody wars between neighbours , which culminated in the Second World War ."
- Baseline: "The European Union was created to end the frequent bloody wars of the neighbours , which became the Second World War ."
More details: report2
- In order to address the translation issues found in our manual evaluation, we used Amazon's Sockeye library to train a system with context gates and coverage-based attention instead of standard attention (illustrated in the sketch below), again with a 30k BPE vocabulary. For translation we used a beam size of 10.
- The trained system gave us 22.89 BLEU points on the shared dev set, which is a small increase over the baseline.
More details: report3 and report4
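For intuition, coverage attention keeps track of how much attention mass each source position has already received and feeds that back into the attention scores, which discourages the model from translating the same span twice or skipping spans. The NumPy sketch below is purely illustrative; the weight names and shapes are made up for the example and it is not Sockeye's actual implementation.

```python
import numpy as np

def coverage_attention_step(enc_states, dec_state, coverage, W_h, W_s, w_c, v):
    """One decoding step of additive attention with a coverage term.

    enc_states: (T, H) encoder states, dec_state: (H,) decoder state,
    coverage:   (T,)   attention mass accumulated over previous steps.
    """
    # The coverage term enters the score, discouraging the model from
    # re-attending to source positions that are already "covered".
    scores = np.tanh(enc_states @ W_h + dec_state @ W_s + np.outer(coverage, w_c)) @ v
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                      # softmax over source positions
    context = attn @ enc_states             # weighted source summary for this step
    coverage = coverage + attn              # remember where attention has been
    return context, attn, coverage
```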
- Generally speaking, the majority of sentences are fluent and meaning-preserving. Long sentences in particular are translated much better than by the baseline model.
- Let's look at example 1, where the fluency was greatly improved:
- Final system: "The largest forest owners ( country , local authorities and some private forestry companies with thousands of hectares of forest areas ) can ensure a continuous production process throughout the long forest management cycle ."
- As you can see, this sentence is completely fluent and adequate. It is a great improvement over the baseline model.
- In example 2 the fluency was also greatly improved:
- Final system: "The European Union was set up to put an end to the frequent bloody wars between neighbours , the culmination of which became the Second World War ."
- Here you can see that although the sentence structure is changed, it is completely fluent and adequate.
Do not forget to check out our poster: Poster
We also tried replacing all dots except the last one with a special symbol, and experimented with various beam sizes.
The dot replacement gave 22.29 BLEU points on the shared dev set and actually helped with the translations. Below is a translation produced with this approach, followed by a sketch of the preprocessing step.
- Baseline: This part of our website will find information on how Parliament will organise its work through the various committees .
- Dot-model: This section of our website will find information on how Parliament operates its work through a system of various committees , and the work of the European Parliament is therefore important because decisions on new European laws are jointly made by the Parliament and the Council of Ministers .
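The dot replacement itself is a simple preprocessing step. A minimal sketch of the idea is shown below; the special symbol and function name are placeholders, not our exact implementation.

```python
DOT_SYMBOL = "<dot>"  # placeholder for the special symbol we substituted

def replace_inner_dots(sentence: str) -> str:
    """Replace every full stop except the last one with a special token."""
    tokens = sentence.split()
    dot_positions = [i for i, t in enumerate(tokens) if t == "."]
    last = dot_positions[-1] if dot_positions else -1
    return " ".join(DOT_SYMBOL if (t == "." and i != last) else t
                    for i, t in enumerate(tokens))

print(replace_inner_dots("He came . He saw . He left ."))
# -> He came <dot> He saw <dot> He left .
```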
We also tried a 40k vocabulary for the coverage + context-gates model to tackle words that were not translated correctly (a bigger vocabulary should cover more words); however, the results got worse according to both manual evaluation and BLEU, which dropped to 21.31.
Finally, we tried different beam sizes for translation. A bigger beam size gave slightly better results according to manual evaluation, and BLEU also increased slightly.
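Varying the beam size only changes the decoding call. Assuming Sockeye's translate CLI (flag names as in Sockeye 1.x; the model directory and file paths are placeholders), the comparison could be scripted roughly like this:

```python
import subprocess

# Assumed Sockeye 1.x translate CLI; model and data paths are placeholders.
for beam in (5, 10, 15):
    subprocess.run(
        ["python", "-m", "sockeye.translate",
         "--models", "model",
         "--beam-size", str(beam),
         "--input", "dev.et.bpe",
         "--output", f"dev.en.beam{beam}"],
        check=True,
    )
```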
Lastly, we wanted to try hyperparameter tuning, but the model did not converge. There were too many hyperparameters to tune to really find out which values work well for which parameters. Furthermore, we wanted to try POS tags and ensembling multiple models.
- On the final test set we got a BLEU score of 25.66. The translations were mostly quite fluent and adequate; nevertheless, sometimes the meaning got lost, some words were repeated, or there were mistranslations. Example: "China has just refused the sale of human organs and restricting the possibility of obtaining sirens from foreigners." (ID: 250). Instead of "sirens" it should say "transplants"; otherwise it is a great translation.
- We had issues with models taking rather long to train, especially with OpenNMT; Sockeye was much faster. Queue times were sometimes really long, especially towards the end of the semester.
- We learnt that training a great model requires a lot of analysing, experimenting, and evaluating.
Project board: link