This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding, as described in:
Xiaodong Liu, Pengcheng He, Weizhu Chen and Jianfeng Gao
Multi-Task Deep Neural Networks for Natural Language Understanding
arXiv version: https://arxiv.org/abs/1901.11504
### Quickstart
- python3.6
- install requirements
  > pip install -r requirements.txt
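If you want to isolate the dependencies first (optional; the environment name below is just an example):
> python3.6 -m venv mt-dnn-env
> source mt-dnn-env/bin/activate
> pip install -r requirements.txt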
- pull docker
  > docker pull allenlao/pytorch-mt-dnn:v0.1
- run docker
  > docker run -it --rm --runtime nvidia allenlao/pytorch-mt-dnn:v0.1 bash
  Please refer to the following link if this is your first time using docker: https://docs.docker.com/
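To make your checkout and the downloaded data visible inside the container, you can mount the working directory (the /workspace mount point is an assumption; adjust it to your setup):
> docker run -it --rm --runtime nvidia -v "$(pwd)":/workspace allenlao/pytorch-mt-dnn:v0.1 bash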
- download data
  > sh download.sh
  Please refer to https://gluebenchmark.com/ for instructions on downloading the GLUE dataset.
- preprocess data
  > python prepro.py
- training
  > python train.py
  Note that we ran experiments on 4 V100 GPUs for the base MT-DNN models. You may need to reduce the batch size for other GPUs.
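If reducing the batch size hurts your effective batch size, gradient accumulation is a standard workaround: it keeps the effective batch size while lowering per-step memory. Below is a generic PyTorch sketch with toy stand-ins; it is not the repo's train.py:

```python
import torch
from torch import nn

# Toy model and data; the point is the accumulation pattern, not the model.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

accumulation_steps = 4  # effective batch = per-step batch * accumulation_steps

optimizer.zero_grad()
for step in range(16):
    x = torch.randn(8, 10)                 # small per-step mini-batch
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y) / accumulation_steps  # scale so gradients average
    loss.backward()                        # gradients accumulate across steps
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                   # one update per accumulated "large" batch
        optimizer.zero_grad()
```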
### GLUE Result Reproduction

MTL refinement: refine the shared layers of MT-DNN, initialized with the pre-trained BERT model, via multi-task learning (MTL) on all GLUE tasks except WNLI, to learn a new shared representation (a conceptual sketch of this multi-task pattern follows below).
Note that we ran this experiment on 8 V100 GPUs (32G) with a batch size of 32.
- Preprocess GLUE data via the aforementioned script
- Training:
  > sh scripts/run_mt_dnn.sh
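As referenced above, here is a minimal, self-contained sketch of the multi-task pattern: a shared encoder with one head per task, updated from batches sampled per task. All names are illustrative stand-ins, not the repo's API:

```python
import random
import torch
from torch import nn

# Toy shared encoder plus per-task heads; stand-ins for BERT and the GLUE heads.
shared_encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
heads = nn.ModuleDict({
    "mnli": nn.Linear(64, 3),   # 3-way classification head
    "stsb": nn.Linear(64, 1),   # regression head
})
losses = {"mnli": nn.CrossEntropyLoss(), "stsb": nn.MSELoss()}
params = list(shared_encoder.parameters()) + list(heads.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

for step in range(8):
    task = random.choice(list(heads))        # sample a task for this step
    x = torch.randn(16, 32)                  # toy batch for that task
    target = (torch.randint(0, 3, (16,)) if task == "mnli"
              else torch.randn(16, 1))
    out = heads[task](shared_encoder(x))     # shared layers feed the task head
    loss = losses[task](out, target)
    optimizer.zero_grad()
    loss.backward()                          # updates shared layers and this head
    optimizer.step()
```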
Finetuning: finetune MT-DNN on each of the GLUE tasks to get task-specific models.
Here, we provide two examples, STS-B and RTE. You can use similar scripts to finetune all the GLUE tasks.
- Finetune on the STS-B task
  > sh scripts/run_stsb.sh
  You should get about 90.5/90.4 on STS-B dev in terms of Pearson/Spearman correlation (a snippet for computing these metrics follows this list).
- Finetune on the RTE task
  > sh scripts/run_rte.sh
  You should get about 83.8 on RTE dev in terms of accuracy.
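To compute the Pearson/Spearman metrics on your own STS-B predictions, scipy is sufficient; the arrays below are dummy values, so substitute your model's dev-set outputs and the gold scores:

```python
from scipy.stats import pearsonr, spearmanr

# Dummy predictions and gold similarity scores for illustration only.
preds = [4.2, 1.1, 3.7, 0.4, 2.9]
golds = [4.5, 0.8, 3.9, 0.2, 3.1]

pearson, _ = pearsonr(preds, golds)      # linear correlation
spearman, _ = spearmanr(preds, golds)    # rank correlation
print(f"Pearson: {pearson:.4f}, Spearman: {spearman:.4f}")
```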
### SciTail & SNLI Result Reproduction (Domain Adaptation)
- Domain Adaptation on SciTail
  > sh scripts/scitail_domain_adaptation_bash.sh
- Domain Adaptation on SNLI
  > sh scripts/snli_domain_adaptation_bash.sh
### Notes and Acknowledgments
BERT PyTorch is from: https://github.com/huggingface/pytorch-pretrained-BERT
BERT: https://github.com/google-research/bert
We also used some code from: https://github.com/kevinduh/san_mrc
### How do I cite MT-DNN?
For now, please cite the arXiv version:

@article{liu2019mt-dnn,
  title={Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1901.11504},
  year={2019}
}

A new version of the paper will be shared later.
Typo: there is no activation function in Equation 2.
For help or issues using MT-DNN, please submit a GitHub issue.
For personal communication related to MT-DNN, please contact Xiaodong Liu ([email protected]), Pengcheng He ([email protected]), Weizhu Chen ([email protected]) or Jianfeng Gao ([email protected]).