Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
metrics.py		metrics.py
mlsum_tr.yaml		mlsum_tr.yaml

README.md

MLSUM

Paper

Title: MLSUM: The Multilingual Summarization Corpus

Abstract: https://aclanthology.org/2020.emnlp-main.647/

We present MLSUM, the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages – namely, French, German, Spanish, Russian, Turkish. Together with English news articles from the popular CNN/Daily mail dataset, the collected data form a large scale multilingual dataset which can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on state-of-the-art systems. These highlight existing biases which motivate the use of a multi-lingual dataset.

Homepage: https://huggingface.co/datasets/mlsum

Citation

@inproceedings{scialom-etal-2020-mlsum,
    title = "{MLSUM}: The Multilingual Summarization Corpus",
    author = "Scialom, Thomas  and
      Dray, Paul-Alexis  and
      Lamprier, Sylvain  and
      Piwowarski, Benjamin  and
      Staiano, Jacopo",
    editor = "Webber, Bonnie  and
      Cohn, Trevor  and
      He, Yulan  and
      Liu, Yang",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.647",
    doi = "10.18653/v1/2020.emnlp-main.647",
    pages = "8051--8067",
    abstract = "We present MLSUM, the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages {--} namely, French, German, Spanish, Russian, Turkish. Together with English news articles from the popular CNN/Daily mail dataset, the collected data form a large scale multilingual dataset which can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on state-of-the-art systems. These highlight existing biases which motivate the use of a multi-lingual dataset.",
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mlsum

mlsum

README.md

MLSUM

Paper

Citation

Files

mlsum

Directory actions

More options

Directory actions

More options

Latest commit

History

mlsum

Folders and files

parent directory

README.md

MLSUM

Paper

Citation