- Features
- Installation
- Usage
- Low Precision Inference
- Converting Models from Fairseq
- A Model Zoo
- Papers
- Team Members
NiuTrans.NMT is a lightweight and efficient Transformer-based neural machine translation system. Its main features are:
- Few dependencies. It is implemented with pure C++, and all dependencies are optional.
- Fast decoding. It supports various decoding acceleration strategies, such as batch pruning and dynamic batch size.
- Advanced NMT models, such as Deep Transformer.
- Flexible running modes. The system runs on various systems and devices (Linux vs. Windows, CPUs vs. GPUs, FP32 vs. FP16, etc.).
- Framework agnostic. It supports various models trained with other tools, e.g., fairseq models.
- The code is simple and friendly to beginners.
Requirements:

- OS: Linux or Windows
- GCC/G++ >= 4.8.4 (on Linux)
- VC++ >= 2015 (on Windows)
- CMake >= 2.8
- CUDA >= 9.2, <= 10.0 (optional)
- MKL, latest version (optional)
- OpenBLAS, latest version (optional)
By default, the project is compiled as a pure CPU version.
```bash
# Download the code
git clone https://github.com/NiuTrans/NiuTrans.NMT.git
git clone https://github.com/NiuTrans/NiuTensor.git
# Merge with NiuTensor
mv NiuTensor/source NiuTrans.NMT/source/niutensor
rm NiuTrans.NMT/source/niutensor/Main.cpp
rm -rf NiuTrans.NMT/source/niutensor/sample NiuTrans.NMT/source/niutensor/tensor/test
mkdir NiuTrans.NMT/build && cd NiuTrans.NMT/build
# Run CMake
cmake ..
```
You can add compilation options to the CMake command to enable acceleration with MKL, OpenBLAS, or CUDA.
Please note that you can select at most one of MKL and OpenBLAS.
- Use CUDA (required for training)

  Add `-DUSE_CUDA=ON` and `-DCUDA_TOOLKIT_ROOT_DIR=$CUDA_PATH` to the CMake command, where `$CUDA_PATH` is the path of the CUDA toolkit. You can also add `-DUSE_FP16=ON` to the CMake command to enable half-precision support.

- Use MKL (optional)

  Add `-DUSE_MKL=ON` and `-DINTEL_ROOT=$MKL_PATH` to the CMake command, where `$MKL_PATH` is the path of MKL.

- Use OpenBLAS (optional)

  Add `-DUSE_OPENBLAS=ON` and `-DOPENBLAS_ROOT=$OPENBLAS_PATH` to the CMake command, where `$OPENBLAS_PATH` is the path of OpenBLAS.
Note that half-precision requires GPUs with the Pascal architecture or newer.
We provide several examples to build the project with different options.
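For instance, assuming MKL is installed under `$MKL_PATH` or the CUDA toolkit under `$CUDA_PATH`, the CMake step could look like one of the following:

```bash
# CPU version accelerated with MKL
cmake -DUSE_MKL=ON -DINTEL_ROOT=$MKL_PATH ..

# GPU version with CUDA and half-precision support
cmake -DUSE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_PATH -DUSE_FP16=ON ..
```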
```bash
make -j && cd ..
```
On Windows, add `-A x64` to the CMake command to generate a Visual Studio project, i.e., `NiuTrans.NMT.sln`, which you can open and build with Visual Studio (>= Visual Studio 2015).
If the build succeeds, you will get an executable file `NiuTrans.NMT` in the `bin` directory.
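For example, a minimal sketch of the Windows CMake step (this assumes a 64-bit build with a Visual Studio generator):

```bash
cmake -A x64 ..
```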
Make sure to compile the program with CUDA, because training on CPUs is not supported yet.
Step 1: Prepare the training data.
```bash
# Convert the BPE vocabulary
python3 tools/GetVocab.py \
  -raw $bpeVocab \
  -new $niutransVocab
```
Description:

- `raw` - Path of the BPE vocabulary.
- `new` - Path of the NiuTrans.NMT vocabulary to be saved.
```bash
# Binarize the training data
python3 tools/PrepareParallelData.py \
  -src $srcFile \
  -tgt $tgtFile \
  -src_vocab $srcVocab \
  -tgt_vocab $tgtVocab \
  -output $trainingFile
```
Description:

- `src` - Path of the source language data. One sentence per line with tokens separated by spaces or tabs.
- `tgt` - Path of the target language data. The same format as the source language data.
- `src_vocab` - Path of the source language vocabulary. Its first line is the vocabulary size and the first index, followed by a word and its index on each subsequent line.
- `tgt_vocab` - Path of the target language vocabulary. The same format as the source language vocabulary.
- `output` - Path of the training data to be saved.
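For illustration, a vocabulary file in this format might begin as follows (the size and the word-index pairs here are hypothetical):

```
37000 4
the 4
of 5
and 6
```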
Step 2: Train the model.
```bash
bin/NiuTrans.NMT \
  -dev $deviceID \
  -model $modelFile \
  -train $trainingData \
  -valid $validData
```
Description:

- `dev` - Device ID (>= 0 for GPUs). Default: 0.
- `model` - Path of the model to be saved.
- `train` - Path of the training file. The same format as the output file in step 1.
- `valid` - Path of the validation file. The same format as the output file in step 1.
- `wbatch` - Word batch size. Default: 4096.
- `sbatch` - Sentence batch size. Default: 8.
- `mt` - Indicates whether the model runs for machine translation. Default: true.
- `dropout` - Dropout rate for the model. Default: 0.3.
- `fnndrop` - Dropout rate for FNN layers. Default: 0.1.
- `attdrop` - Dropout rate for attention layers. Default: 0.1.
- `lrate` - Learning rate. Default: 0.0015.
- `lrbias` - The parameter that controls the maximum learning rate in training. Default: 0.
- `nepoch` - Number of training epochs. Default: 50.
- `nstep` - Number of training steps. Default: 100000.
- `nwarmup` - Number of warm-up steps for training. Default: 8000.
- `adam` - Indicates whether Adam is used. Default: true.
- `adambeta1` - Hyperparameter of Adam. Default: 0.9.
- `adambeta2` - Hyperparameter of Adam. Default: 0.98.
- `adambeta` - Hyperparameter of Adam. Default: 1e-9.
- `shuffled` - Indicates whether the data file is shuffled for training. Default: true.
- `labelsmoothing` - Label smoothing factor. Default: 0.1.
- `nstepcheckpoint` - Number of steps after which we make a checkpoint. Default: -1.
- `epochcheckpoint` - Indicates whether we make a checkpoint after each training epoch. Default: true.
- `updatestep` - Number of batches collected for one model update. Default: 1 (set > 1 for gradient accumulation).
- `sorted` - Indicates whether the sequences are sorted by length. Default: false.
- `bufsize` - Buffer size for the batch loader. Default: 50000.
- `doubledend` - Indicates whether we double the end symbol (`</s>`) for the output of the LM. Default: false.
- `smallbatch` - Indicates whether we use batchsize = max * sc rather than batchsize = word-number, where max is the maximum sentence length and sc is the number of sentences. Default: true.
- `bigbatch` - Counterpart of `smallbatch`. Default: false.
- `randbatch` - Randomize batches. Default: false.
- `bucketsize` - Bucket size for the batch loader. Default: wbatch * 10.
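For example, a training run on GPU 0 that accumulates gradients over 4 batches might look like this (all file paths are placeholders):

```bash
bin/NiuTrans.NMT \
  -dev 0 \
  -model model.bin \
  -train train.data \
  -valid valid.data \
  -wbatch 4096 \
  -updatestep 4 \
  -nepoch 50
```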
Refer to this page for the training example.
Make sure to compile the program with CUDA and FP16 support if you want to translate with FP16 on GPUs.
```bash
bin/NiuTrans.NMT \
  -dev $deviceID \
  -test $inputFile \
  -model $modelPath \
  -sbatch $batchSize \
  -beamsize $beamSize \
  -srcvocab $srcVocab \
  -tgtvocab $tgtVocab \
  -output $outputFile
```
Description:

- `model` - Path of the model.
- `sbatch` - Sentence batch size. Default: 8.
- `dev` - Device ID (-1 for CPUs, >= 0 for GPUs). Default: 0.
- `beamsize` - Size of the beam; 1 for greedy search.
- `test` - Path of the input file. One sentence per line with tokens separated by spaces.
- `output` - Path of the output file to be saved. The same format as the input file.
- `srcvocab` - Path of the source language vocabulary. Its first line is the vocabulary size, followed by a word and its index on each subsequent line.
- `tgtvocab` - Path of the target language vocabulary. The same format as the source language vocabulary.
- `fp16` (optional) - Inference with FP16. This will not work if the model is stored in FP32. Default: false.
- `lenalpha` - The alpha parameter that controls the length preference. Default: 0.6.
- `maxlenalpha` - Scalar of the input sequence length (for the maximum number of search steps). Default: 1.2.
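For example, translating with a beam of 4 on GPU 0 (file names are placeholders):

```bash
bin/NiuTrans.NMT \
  -dev 0 \
  -test input.txt \
  -model model.bin \
  -sbatch 64 \
  -beamsize 4 \
  -srcvocab vocab.src \
  -tgtvocab vocab.tgt \
  -output output.txt
```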
Refer to this page for the translation example.
NiuTrans.NMT supports inference with FP16. You can convert a model to FP16 with our tools:
```bash
python3 tools/FormatConverter.py \
  -input $inputModel \
  -output $outputModel \
  -format $targetFormat
```
Description:

- `input` - Path of the raw model file.
- `output` - Path of the new model file.
- `format` - Target storage format: FP16 (default) or FP32.
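For example, converting an FP32 model to FP16 (file names are placeholders):

```bash
python3 tools/FormatConverter.py \
  -input model.fp32 \
  -output model.fp16 \
  -format FP16
```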
The core implementation is framework agnostic, so we can easily convert models trained with other frameworks to a binary format for efficient inference.
The following frameworks and models are currently supported:
| | fairseq (0.6.2) |
| --- | --- |
| Transformer (Vaswani et al. 2017) | ✓ |
| RPR attention (Shaw et al. 2018) | ✓ |
| Deep Transformer (Wang et al. 2019) | ✓ |
Refer to this page for the details about training models with fairseq.
After training, you can convert the fairseq models and vocabulary with the following steps.
Step 1: Convert parameters of a single fairseq model
```bash
python3 tools/ModelConverter.py -src $src -tgt $tgt
```
Description:

- `src` - Path of the fairseq checkpoint; refer to this for more details.
- `tgt` - Path to save the converted model parameters. All parameters are stored in a binary format.
- `fp16` (optional) - Save the parameters with a 16-bit data type. Default: disabled.
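For example (the checkpoint path is a placeholder; `checkpoint_best.pt` is fairseq's usual name for the best checkpoint):

```bash
python3 tools/ModelConverter.py -src checkpoints/checkpoint_best.pt -tgt model.bin
```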
Step 2: Convert the vocabulary:
```bash
python3 tools/VocabConverter.py -src $fairseqVocabPath -tgt $newVocabPath
```
Description:

- `src` - Path of the fairseq vocabulary; refer to this for more details.
- `tgt` - Path to save the converted vocabulary. Its first line is the vocabulary size, followed by a word and its index on each subsequent line.
You may need to convert both the source language vocabulary and the target language vocabulary if they are not shared.
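For example, converting unshared source and target vocabularies separately (the fairseq dictionary names here are placeholders):

```bash
python3 tools/VocabConverter.py -src dict.en.txt -tgt vocab.en
python3 tools/VocabConverter.py -src dict.de.txt -tgt vocab.de
```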
We provide several pre-trained models to test the system. All models and runnable systems are packaged into Docker files so that one can easily reproduce our results.
Refer to this page for more details.
Here are the papers related to this project:
Learning Deep Transformer Models for Machine Translation. Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao. 2019. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
The NiuTrans System for WNGT 2020 Efficiency Task. Chi Hu, Bei Li, Yinqiao Li, Ye Lin, Yanyang Li, Chenglong Wang, Tong Xiao, Jingbo Zhu. 2020. Proceedings of the Fourth Workshop on Neural Generation and Translation.
This project is maintained by a joint team from NiuTrans Research and NEU NLP Lab. Current team members are
Chi Hu, Bei Li, Yinqiao Li, Ye Lin, Quan Du, Tong Xiao, and Jingbo Zhu.
Please contact [email protected] if you have any questions.