
TRAMOOC-MT

MT MODULE IN TRAMOOC PROJECT

Information

The TraMOOC MT Server provides the following functionality:

  • Docker files that facilitate the installation of Marian and all requirements, the download and configuration of the TraMOOC translation models, and the launch of a server that serves translations via HTTP.
  • an API that receives and serves requests over HTTP in the XML format.
  • language-specific pre- and postprocessing, including tokenization, truecasing, and subword segmentation.
  • support for segment-level override patterns for text that should not be translated, such as URLs or programming-language source code (see the sketch after this list).
  • support for a TMX translation memory.
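
Override patterns generally work by masking protected spans before translation and restoring them afterwards. The sketch below illustrates that general technique only; the pattern, placeholder format, and helper names are hypothetical and are not the server's actual implementation or configuration format.

    # Minimal sketch of segment-level overrides: mask protected spans (e.g. URLs)
    # before translation and restore them afterwards. Hypothetical example only;
    # not the TraMOOC server's actual implementation or configuration format.
    import re

    URL_PATTERN = re.compile(r"https?://\S+")  # hypothetical override pattern

    def mask(segment):
        """Replace protected spans with placeholders before translation."""
        spans = URL_PATTERN.findall(segment)
        for i, span in enumerate(spans):
            segment = segment.replace(span, "__KEEP{}__".format(i), 1)
        return segment, spans

    def unmask(translation, spans):
        """Restore the original spans in the translated output."""
        for i, span in enumerate(spans):
            translation = translation.replace("__KEEP{}__".format(i), span, 1)
        return translation

    masked, spans = mask("See https://tramooc.eu for details.")
    # masked == "See __KEEP0__ for details."
    # after translation, unmask(translated, spans) puts the URL back unchanged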

maintainer: Roman Grundkiewicz [email protected]

version: 3

Installation

On Ubuntu 16.04, the server can be installed natively.

  • install required Ubuntu packages (see Dockerfile for list)

    • if you don't use Docker, you may need to install CUDA and cuDNN manually; choose versions compatible with the Dockerfile, i.e. CUDA 8.0 and cuDNN-dev 5.1.10, which is downloadable from here
  • install required python packages with pip:

    pip install -r requirements.txt --user

  • install MarianNMT:

    make marian

On other Linux systems, the server can be deployed via a Docker container.

Models

If you want to download the models, run:

make models

Usage instructions

You can run the local server as follows (for English-German):

./docker-entrypoint.py en-de

You can run the server in a Docker container as follows:

nvidia-docker run --rm -p 8080:8080 -v model:/model tramooc/mt_server en-de

A single server can also support multiple language pairs:

nvidia-docker run --rm -p 8080:8080 -v model:/model tramooc/mt_server en-de en-ru

You can also specify which GPU devices the server should use for each language pair; for example, to use GPUs 0 and 1 for en-de, and only GPU 1 for en-ru, type:

nvidia-docker run --rm -p 8080:8080 -v model:/model tramooc/mt_server en-de:0,1 en-ru:1

If you want to run more than one instance of the server, specify separate ports for the translation subprocesses:

nvidia-docker run --rm -p 8080:8080 -v model:/model tramooc/mt_server en-de --subproc-port 60000

See ./docker-entrypoint.py --help for other options, which can also be passed to nvidia-docker (at the end of the command line).

A simple sample client is provided in sample-client.py; sample-client-2.py translates text passed via standard input.
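
For orientation, the sketch below shows how such a client might post an XML request over HTTP to a locally running server. The endpoint path and header value are assumptions, and the XML request schema itself is not reproduced here; consult sample-client.py for the exact format expected by the API.

    # Minimal sketch of sending a request to the server over HTTP. Assumes the
    # server runs locally on port 8080; the endpoint path and Content-Type are
    # assumptions, and the XML payload format is defined by the server (see
    # sample-client.py for the actual request schema).
    import urllib.request

    def send_request(xml_payload, url="http://localhost:8080/"):
        """POST an XML payload to the MT server and return the raw response."""
        req = urllib.request.Request(
            url,
            data=xml_payload,
            headers={"Content-Type": "text/xml"},
        )
        with urllib.request.urlopen(req, timeout=60) as response:
            return response.read()

    # Example usage: load a request prepared in the format used by
    # sample-client.py (hypothetical file name) and print the response.
    # with open("request.xml", "rb") as f:
    #     print(send_request(f.read()).decode("utf-8"))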

License

The code in this repository is released under the FreeBSD License.

By default, the tool downloads and uses pre-trained models for 11 language pairs (see below). These models are released for research purposes only.

Supported language pairs

- en-bg (English-Bulgarian)
- en-cs (English-Czech)
- en-de (English-German)
- en-el (English-Greek)
- en-hr (English-Croatian)
- en-it (English-Italian)
- en-nl (English-Dutch)
- en-pl (English-Polish)
- en-pt (English-Portuguese)
- en-ru (English-Russian)
- en-zh (English-Chinese)

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement 644333 (TraMOOC).