Summary datasets used

News Summary (.csv): we use only the file cl_news_summary_more.csv

AMI corpus (text and summary) (meeting.zip has been uploaded)

WikiHow (.csv)
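As a quick sanity check after downloading, the .csv datasets can be inspected with the Python standard library. This is only a minimal sketch and is not taken from process.py; the helper name `read_pairs` and the column names `text` and `headlines` are assumptions and should be replaced with whatever header the downloaded file actually has.

```python
import csv
import io


def read_pairs(handle, text_col="text", summary_col="headlines"):
    """Yield (text, summary) pairs from an open CSV file.

    The default column names are assumptions, not taken from process.py.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        text = (row.get(text_col) or "").strip()
        summary = (row.get(summary_col) or "").strip()
        # Skip rows where either side is empty.
        if text and summary:
            yield text, summary


# Tiny in-memory example standing in for a downloaded file.
sample = io.StringIO(
    "headlines,text\n"
    "Short headline,A longer article body.\n"
    ",Row without a summary\n"
)
pairs = list(read_pairs(sample))
print(len(pairs))
```

For a real file, pass a handle opened with `open(path, newline="")`; the encoding argument may need adjusting depending on the download.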

Data Processing

  1. Download the News Summary and WikiHow datasets.

  2. Run process.py.

The Python script requires the tensorflow and stanza packages.
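Before running the script, it may help to confirm that both packages are importable. The snippet below is a hedged sketch of such a check; the helper name `missing_packages` is hypothetical and is not part of process.py.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported here."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Packages named in this README as requirements of process.py.
required = ["tensorflow", "stanza"]
missing = missing_packages(required)
if missing:
    print("Install before running process.py:", ", ".join(missing))
else:
    print("All required packages found.")
```

Any package reported as missing can be installed with pip before step 2.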

Reference

Summary datasets found

News

News Summary [Each article is summarized as a headline of a few words. Extended, cleaned version] -> This is what we picked

BBC News Summary [Summaries are about 1/3 of the article length, so they are very long]

Paper

scisumm-corpus

TalkSumm

Article

NewsRoom [large scale]

Guideline

WikiHow -> This is what we picked

Meeting

AMI corpus [The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings.] -> This is what we picked

Legal case report

Legal Case [ Texts are about 3000 words long with short summaries, which is ideal ]

Review

Opinosis [ 51 data points ]

Sentence

Sentence-compressed [ Large corpus of uncompressed and compressed sentences from news articles ]

Source

nlp resources

kaggle

Process the data