To the best of our knowledge, this is the first sentence-level fact-checked fake news dataset.
Abstract: Fake news 📰 is often generated by manipulating only a small part of otherwise true information, e.g. entities, relations, individual sentences, or a paragraph. Some true information may also be present in the news piece to make it more appealing to the public, so it is crucial to distinguish the true parts of a piece of information from the fake ones. To this end, we annotate and release a sentence-level fact-checked dataset. We annotate the Politifact dataset with ground-truth evidence corresponding to different parts of the news text, by referring to the fact-checking websites Politifact and Gossipcop, and to other trustworthy online sources.
Annotation Process 📝: To evaluate the efficacy of Hyphen in producing suitable explanations, we fact-check and annotate the Politifact dataset at the sentence level. Each sentence carries one of the following labels: `true`, `false`, `quote`, `unverified`, `non_check_worthy`, or `noise`.
- The annotators were also asked to arrange the fact-checked sentences in order of their check-worthiness. We took the help of four expert annotators in the age group of 25-30 years. The final label for each sentence was decided by majority voting among the four annotators.
- To decide the final rank-list (since different annotators might have different opinions about the level of check-worthiness of the sentences), the fourth annotator compiled the final rank-list by referring to the rank-lists produced by the first three annotators, using Kendall's $\tau$ and Spearman's $\rho$ rank correlation coefficients and manually inspecting the similarities between the three rank-lists (a minimal sketch of computing these correlations is shown after this list).
- The compiled list was then cross-checked and re-evaluated by the first three annotators for consistency.
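As an illustration of this step, the following is a minimal sketch of computing pairwise Kendall's $\tau$ and Spearman's $\rho$ between annotator rank-lists with `scipy`. The rank-lists below are made-up placeholders and are not part of the released data.

```python
# Illustrative sketch: pairwise rank correlation between annotator rank-lists.
# The rank-lists below are placeholders, not the actual annotations.
from itertools import combinations

from scipy.stats import kendalltau, spearmanr

# Each list gives the rank assigned to sentences s0..s4 by one annotator.
rank_lists = {
    "annotator_1": [1, 2, 3, 4, 5],
    "annotator_2": [2, 1, 3, 5, 4],
    "annotator_3": [1, 3, 2, 4, 5],
}

for (name_a, ranks_a), (name_b, ranks_b) in combinations(rank_lists.items(), 2):
    tau, _ = kendalltau(ranks_a, ranks_b)
    rho, _ = spearmanr(ranks_a, ranks_b)
    print(f"{name_a} vs {name_b}: Kendall tau = {tau:.2f}, Spearman rho = {rho:.2f}")
```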
Every CSV file, `politifact-annotation/politifact{news_id}.csv`, represents the sentences from the news article `politifact{news_id}`. Every CSV file follows the schema `#sample`, `#sent_id`, `#sentences`, `#label`.
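A minimal sketch of loading one annotation file with `pandas` is shown below. The `news_id` is a hypothetical placeholder, and whether the `#` prefix literally appears in the CSV header is an assumption based on the schema listed above.

```python
# Minimal sketch: loading one sentence-level annotation file with pandas.
import pandas as pd

news_id = "12345"  # placeholder; substitute a real politifact{news_id}
df = pd.read_csv(f"politifact-annotation/politifact{news_id}.csv")

print(df.columns.tolist())  # expected: ['#sample', '#sent_id', '#sentences', '#label']
print(df.head())
```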
The following table lays down the meanings of each label in the annotated dataset:
| Label | Explanation |
|---|---|
| `true` | After verification from online sources, if it can be deduced that the claim introduced by a sentence is true, we label it as `true`. |
| `fake` | After verification from online sources, if it can be deduced that the claim introduced by a sentence is false, we label it as `fake`. |
| `non_checkworthy` | If a given sentence is not check-worthy for fake news detection, we label it as `non_checkworthy`. |
| `quote` | If a given sentence is a quote from someone's speech/tweet/etc., we label it as `quote`. |
| `unverified` | If we are unable to arrive at any conclusion regarding the veracity of a particular sentence (after consulting all available online resources), we label it as `unverified`. |
| `noise` | Owing to scraping errors, a sentence in the original scraped dataset may be noisy; in such cases, we label it as `noise`. |
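Building on the schema and label set above, the following sketch inspects the label distribution of one file and keeps only sentences that are useful for verification. The column and label spellings follow this README; adjust them if your copy of the CSVs uses slightly different spellings (e.g. `false` vs. `fake`).

```python
# Sketch: label distribution and filtering out noisy / non-check-worthy sentences.
import pandas as pd

df = pd.read_csv("politifact-annotation/politifact12345.csv")  # placeholder id

print(df["#label"].value_counts())

# Drop noisy and non-check-worthy sentences before downstream use.
verifiable = df[~df["#label"].isin(["noise", "non_checkworthy"])]
print(verifiable[["#sent_id", "#sentences", "#label"]].head())
```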
We consult the following sources during fact-checking:
- Fact-checking websites like Politifact, Gossipcop and Snopes.
- Trusted news providers like BBC News, The Indian Express, The Economic Times, Hindustan Times, The Hindu, CNN, The New York Times, Reuters, etc.
- We don’t count Wikipedia as a reliable source.
- We don’t count any social media platform, like Twitter, Reddit, Facebook, etc., as reliable. However, any tweet/post from a verified social media account (i.e. one with a blue tick) is counted as reliable.
- Comments on social media posts by verified accounts count as a reliable source of information.
If you have any questions or issues, please feel free to reach out to Karish Grover at [email protected].
If you think that this annotated dataset is helpful, please feel free to leave a star ⭐️ and cite our paper:
```bibtex
@article{grover2022public,
  title={Public Wisdom Matters! Discourse-Aware Hyperbolic Fourier Co-Attention for Social-Text Classification},
  author={Grover, Karish and Angara, SM and Akhtar, Md and Chakraborty, Tanmoy and others},
  journal={arXiv preprint arXiv:2209.13017},
  year={2022}
}
```