BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
https://arxiv.org/abs/1910.13461
BART is a sequence-to-sequence model trained with denoising as the pretraining objective. We show that this pretraining objective is more generic: BART matches RoBERTa results on SQuAD and GLUE, and achieves state-of-the-art results on summarization (XSum, CNN dataset), long-form generative question answering (ELI5) and dialogue response generation (ConvAI2). See the associated paper for more details.
Model | Description | # params | Download |
---|---|---|---|
bart.large.cnn | bart.large finetuned on CNN-DM | 400M | bart.large.cnn.tar.gz |
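For example, the checkpoint can be downloaded and unpacked as follows (the URL is fairseq's standard download location for bart.large.cnn; adjust if it has moved):

```bash
# Download and unpack the fine-tuned checkpoint. Extraction produces a
# bart.large.cnn/ directory containing model.pt, which summarize.py uses below.
wget https://dl.fbaipublicfiles.com/fairseq/models/bart.large.cnn.tar.gz
tar -xzvf bart.large.cnn.tar.gz
```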
Download the preprocessed CNN-DM data:
wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/cnn_dm_v2.tgz
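Then unpack the archive. The commands below assume it yields a cnn_cln/ directory containing the test split, one example per line:

```bash
# Unpack the CNN-DM archive. The cnn_cln/ directory name is an assumption based
# on the paths used by summarize.py below; check the contents after extraction.
tar -xzvf cnn_dm_v2.tgz
ls cnn_cln/
```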
Set up Stanford CoreNLP (used to tokenize model outputs for ROUGE evaluation):
- Download and unzip Stanford CoreNLP 4.2.0.
- Download the English models jar (4.2.0).
- Move the jar into the distribution directory:
mv /path/to/stanford-corenlp-4.2.0-models-english.jar /path/to/stanford-corenlp-4.2.0
- Include the distribution directory in your CLASSPATH:
export CLASSPATH=$CLASSPATH:/path/to/stanford-corenlp-4.2.0/*:
For calculating ROUGE, install files2rouge (https://github.com/pltrdy/files2rouge). Make sure to use pltrdy/pyrouge; otherwise some errors will occur. If you are using bheinzerling/pyrouge instead, comment out the second and third parameters in files2rouge.py before setting it up.
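A typical installation, following the files2rouge README (a sketch; the exact steps may have changed upstream):

```bash
# Install pltrdy/pyrouge first, then files2rouge itself.
pip install -U git+https://github.com/pltrdy/pyrouge
git clone https://github.com/pltrdy/files2rouge.git
cd files2rouge
python setup_rouge.py      # fetches and configures the ROUGE-1.5.5 perl scripts
python setup.py install
```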
pip install fairseq
In fairseq, summaries can be generated using:
python summarize.py \
    --model-dir bart.large.cnn \
    --model-file model.pt \
    --src cnn_cln/test.source \
    --out cnn_cln/test.hypo
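Alternatively, the checkpoint can be loaded through fairseq's Python API and used to summarize a few articles interactively. This is a minimal sketch assuming the extracted bart.large.cnn/ directory and the cnn_cln/test.source file from the steps above:

```python
from itertools import islice

import torch
from fairseq.models.bart import BARTModel

# Load the fine-tuned checkpoint from the extracted bart.large.cnn/ directory.
bart = BARTModel.from_pretrained('bart.large.cnn', checkpoint_file='model.pt')
bart.eval()  # disable dropout for generation
if torch.cuda.is_available():
    bart.cuda()

# Summarize the first few test articles (one article per line in test.source).
with open('cnn_cln/test.source') as f:
    docs = [line.strip() for line in islice(f, 4)]

# Beam-search settings commonly used with bart.large.cnn on CNN-DM.
hypotheses = bart.sample(docs, beam=4, lenpen=2.0,
                         max_len_b=140, min_len=55, no_repeat_ngram_size=3)
for hypo in hypotheses:
    print(hypo)
```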
Finally, compute ROUGE scores with:
sh evaluate.sh
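evaluate.sh is expected to follow the usual CNN-DM ROUGE recipe: tokenize hypotheses and references with CoreNLP's PTBTokenizer, then score with files2rouge. A rough sketch (an assumption about the script's contents; file names such as test.target are also assumed, so check the script for the exact commands):

```bash
# Tokenize hypotheses and references (one summary per line), then compute ROUGE.
java edu.stanford.nlp.process.PTBTokenizer -preserveLines cnn_cln/test.hypo > test.hypo.tokenized
java edu.stanford.nlp.process.PTBTokenizer -preserveLines cnn_cln/test.target > test.target.tokenized
files2rouge test.hypo.tokenized test.target.tokenized
```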