This repository contains the following materials associated with the StoryLine extraction task:
- annotated data in CAT-XML format (folder: annotated_data). To visualise the data, use CAT (Content Annotation Tool: http://dh.fbk.eu/resources/cat-content-annotation-tool). Ask for an account; it's free.
- annotated data in evaluation format, extending PLOT_LINK relations to include coreference relations (folder: evaluation_format)
- test data (folder: evaluation_format/test)
- Python 3 scripts for converting the data into the evaluation format, running the baseline systems, and evaluating the baselines' output
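The relation between the two data formats can be illustrated with a short sketch. It assumes, hypothetically, that a CAT-XML file stores event mentions as markables with an `m_id` and each PLOT_LINK as a relation element with `source` and `target` children and a `relType` attribute. It then copies each link to every coreferent mention, which is one plausible reading of "extending PLOT_LINK relations to include coreference relations". The element names, the `relType` attribute, the `SAMPLE` document, and the coreference clusters are all illustrative assumptions; the repository's own Python scripts are what actually produce the evaluation format.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical CAT-XML fragment. Element and attribute names (Markables,
# ACTION_OCCURRENCE, Relations, PLOT_LINK, source/target, m_id, relType)
# are assumptions about the layout, not a specification of the corpus files.
SAMPLE = """<Document doc_name="example.xml">
  <Markables>
    <ACTION_OCCURRENCE m_id="1"/>
    <ACTION_OCCURRENCE m_id="2"/>
    <ACTION_OCCURRENCE m_id="3"/>
  </Markables>
  <Relations>
    <PLOT_LINK r_id="10" relType="PRECONDITION">
      <source m_id="1"/>
      <target m_id="2"/>
    </PLOT_LINK>
  </Relations>
</Document>"""


def read_plot_links(xml_text):
    """Return a set of (source_id, target_id, relType) triples."""
    root = ET.fromstring(xml_text)
    links = set()
    for link in root.iter("PLOT_LINK"):
        src = link.find("source").get("m_id")
        tgt = link.find("target").get("m_id")
        links.add((src, tgt, link.get("relType")))
    return links


def propagate_over_coref(links, clusters):
    """Copy each link to every coreferent mention of its source and target.

    `clusters` is a list of sets of mention ids that corefer (assumed input).
    """
    # Map each clustered mention to its cluster; other mentions stay singletons.
    cluster_of = {}
    for cluster in clusters:
        for mention in cluster:
            cluster_of[mention] = cluster
    extended = set()
    for src, tgt, rel in links:
        sources = cluster_of.get(src, {src})
        targets = cluster_of.get(tgt, {tgt})
        for s, t in product(sources, targets):
            if s != t:  # skip self-links created inside one cluster
                extended.add((s, t, rel))
    return extended


links = read_plot_links(SAMPLE)
print(sorted(links))  # [('1', '2', 'PRECONDITION')]
# Hypothetical coreference: mentions 2 and 3 refer to the same event.
print(sorted(propagate_over_coref(links, [{"2", "3"}])))
# [('1', '2', 'PRECONDITION'), ('1', '3', 'PRECONDITION')]
```

Keeping the coreference clusters as a separate input keeps the propagation step independent of how coreference happens to be encoded in the annotation files.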
The corpus is still growing. New versions will be made available in this repository as soon as they are ready.
Reference papers:
- Caselli, T. and P. Vossen. 2016. The Storyline Annotation and Representation Scheme (StaR): A Proposal. In Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016), held in conjunction with EMNLP 2016.
- Caselli, T. and P. Vossen. 2017. The Event StoryLine Corpus: A New Benchmark for Causal and Temporal Relation Extraction. In Proceedings of the Events and Stories in the News (EventStory 2017), held in conjunction with ACL 2017.
Experiments reported in Caselli and Vossen 2017 use version 0.9 of the corpus.
Version 1.0 is available.