clic-gold-standard

This is a manually curated gold standard for quote extraction in literary texts.

It contains randomly selected paragraphs from 15 novels by Charles Dickens and 29 non-Dickensian 19th-century novels.

The files are XML. Quotes are marked with <qs/>, <qe/>, <alt-qs/>, and <alt-qe/> milestones, shorthand for "quote start", "quote end", "alternative quote start", and "alternative quote end", respectively.
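As a rough illustration of how the milestones can be read, the sketch below pairs each start milestone with the next end milestone of the same kind and returns the text in between. It is not part of the gold standard's tooling: the function name `extract_quotes`, the sample paragraph, and the assumption that quotes are not nested are all hypothetical, and any other markup between two milestones is returned as-is.

```python
import re

# Matches any of the four milestone tags: <qs/>, <qe/>, <alt-qs/>, <alt-qe/>.
MILESTONE = re.compile(r"<(alt-)?q([se])\s*/>")


def extract_quotes(xml_text):
    """Return (is_alternative, text) pairs, one per start/end milestone pair.

    Assumes quotes are not nested (an assumption, not verified against the files).
    """
    quotes = []
    open_pos = None   # offset just after the most recent start milestone
    open_alt = None   # whether that start milestone was an alternative one
    for match in MILESTONE.finditer(xml_text):
        is_alt = match.group(1) is not None
        kind = match.group(2)  # "s" for start, "e" for end
        if kind == "s":
            open_pos, open_alt = match.end(), is_alt
        elif open_pos is not None and is_alt == open_alt:
            quotes.append((is_alt, xml_text[open_pos:match.start()]))
            open_pos = None
    return quotes


# Hypothetical sample paragraph, not taken from the corpus.
sample = '<p><qs/>"Hello,"<qe/> said Tom. <alt-qs/>"Bye."<alt-qe/></p>'
quotes = extract_quotes(sample)
```

Working on the raw string rather than a parsed tree keeps the example independent of the surrounding element structure, which is not described here.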

Remarks

The distinction between <qs/> or <qe/> tags and <alt-qs/> or <alt-qe/> tags has not been manually verified, so a normal quote may be mistakenly tagged as an alternative quote. This does not affect precision and recall as long as the aim is to measure whether quotes are retrieved, regardless of whether they are alternative.
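A minimal sketch of such an evaluation follows. It assumes quotes are represented as (start, end) character offsets and that a predicted quote counts as correct only on an exact match; both are illustrative choices, and the function name `precision_recall` and the example offsets are hypothetical. Normal and alternative gold quotes are merged into one set before scoring, so a mislabelled tag has no effect on the result.

```python
def precision_recall(gold_spans, predicted_spans):
    """Compute exact-match precision and recall over (start, end) spans."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical offsets: one normal and one alternative gold quote.
gold_normal = {(0, 8)}
gold_alternative = {(20, 26)}
predicted = {(0, 8), (30, 35)}

# Merge both gold sets so the normal/alternative label is ignored.
precision, recall = precision_recall(gold_normal | gold_alternative, predicted)
```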

The gold standard also annotates suspensions between alternative quotes.

Known issues

Recently solved issues

To do

  • add definitions