To the best of our knowledge, this is the first sentence-level fact-checked fake news dataset.
Abstract: Fake news 📰 is often generated by manipulating only a small part of otherwise true information, e.g. entities, relations, individual sentences, or a paragraph. Some true information may also be present in the news piece to make it more appealing to the public, so it is crucial to distinguish the true parts of a piece of information from the fake ones. To this end, we annotate and release a sentence-level fact-checked dataset. We annotate the Politifact dataset with ground-truth evidence corresponding to different parts of the news text, by referring to the fact-checking websites Politifact and Gossipcop, and to other trustworthy online sources.
Annotation Process 📝: To evaluate the efficacy of Hyphen in producing suitable explanations, we fact-check and annotate the Politifact dataset at the sentence level. Each sentence carries one of the following labels: `true`, `false`, `quote`, `unverified`, `non_check_worthy`, or `noise`.
- The annotators were also asked to arrange the fact-checked sentences in order of their check-worthiness. We took the help of four expert annotators in the age group of 25-30 years. The final label for each sentence was decided by majority voting among the four annotators.
- To decide the final rank-list (since different annotators might have different opinions about the level of check-worthiness of the sentences), the fourth annotator compiled the final rank-list by referring to the rank-lists produced by the first three annotators, using Kendall's $\tau$ and Spearman's $\rho$ rank correlation coefficients and manually inspecting the similarities between the three rank-lists (a minimal sketch of computing these correlations is shown after this list).
- The compiled list was then cross-checked and re-evaluated by the first three annotators for consistency.
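As an illustration of this step, the following is a minimal sketch of computing pairwise Kendall's $\tau$ and Spearman's $\rho$ between annotator rank-lists with `scipy`. The rank-lists below are made-up placeholders and are not part of the released data.

```python
# Illustrative sketch: pairwise rank correlation between annotator rank-lists.
# The rank-lists below are placeholders, not the actual annotations.
from itertools import combinations

from scipy.stats import kendalltau, spearmanr

# Each list gives the rank assigned to sentences s0..s4 by one annotator.
rank_lists = {
    "annotator_1": [1, 2, 3, 4, 5],
    "annotator_2": [2, 1, 3, 5, 4],
    "annotator_3": [1, 3, 2, 4, 5],
}

for (name_a, ranks_a), (name_b, ranks_b) in combinations(rank_lists.items(), 2):
    tau, _ = kendalltau(ranks_a, ranks_b)
    rho, _ = spearmanr(ranks_a, ranks_b)
    print(f"{name_a} vs {name_b}: Kendall tau = {tau:.2f}, Spearman rho = {rho:.2f}")
```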
Every CSV file, `politifact-annotation/politifact{news_id}.csv`, represents the sentences from the news article `politifact{news_id}`. Every CSV file follows the schema `#sample`, `#sent_id`, `#sentences`, `#label`.
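A minimal sketch of loading one annotation file with `pandas` is shown below. The `news_id` is a hypothetical placeholder, and whether the `#` prefix literally appears in the CSV header is an assumption based on the schema listed above.

```python
# Minimal sketch: loading one sentence-level annotation file with pandas.
import pandas as pd

news_id = "12345"  # placeholder; substitute a real politifact{news_id}
df = pd.read_csv(f"politifact-annotation/politifact{news_id}.csv")

print(df.columns.tolist())  # expected: ['#sample', '#sent_id', '#sentences', '#label']
print(df.head())
```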
The following table lays down the meanings of each label in the annotated dataset:
| Label | Explanation |
|---|---|
| `true` | After verification from online sources, if it can be deduced that the claim introduced by a sentence is true, we label it as `true`. |
| `fake` | After verification from online sources, if it can be deduced that the claim introduced by a sentence is false, we label it as `fake`. |
| `non_checkworthy` | If a given sentence is not check-worthy for fake news detection, we label it as `non_checkworthy`. |
| `quote` | If a given sentence is a quote from someone's speech/tweet/etc., we label it as `quote`. |
| `unverified` | If we are unable to arrive at any conclusion regarding the veracity of a particular sentence (after consulting all available online resources), we label it as `unverified`. |
| `noise` | Owing to scraping errors, a sentence in the original scraped dataset may be noisy; in such cases, we label it as `noise`. |
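Building on the schema and label set above, the following sketch inspects the label distribution of one file and keeps only sentences that are useful for verification. The column and label spellings follow this README; adjust them if your copy of the CSVs uses slightly different spellings (e.g. `false` vs. `fake`).

```python
# Sketch: label distribution and filtering out noisy / non-check-worthy sentences.
import pandas as pd

df = pd.read_csv("politifact-annotation/politifact12345.csv")  # placeholder id

print(df["#label"].value_counts())

# Drop noisy and non-check-worthy sentences before downstream use.
verifiable = df[~df["#label"].isin(["noise", "non_checkworthy"])]
print(verifiable[["#sent_id", "#sentences", "#label"]].head())
```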
We consult the following sources during fact-checking:
- Fact-checking websites like Politifact, Gossipcop and Snopes.
- Trusted news providers like BBC News, The Indian Express, The Economic Times, Hindustan Times, The Hindu, CNN, The New York Times, Reuters, etc.
- We don’t count Wikipedia as a reliable source.
- We don’t count any social media platform, like Twitter, Reddit, Facebook, etc., as reliable. However, any tweet/post from a verified social media account (i.e. one with a blue tick) is counted as reliable.
- Comments on social media posts by verified accounts count as a reliable source of information.
If you have any questions or issues, please feel free to reach out to Karish Grover at [email protected].
If you think that this annotated dataset is helpful, please feel free to leave a star ⭐️ and cite our paper:
```bibtex
@article{grover2022public,
  title={Public Wisdom Matters! Discourse-Aware Hyperbolic Fourier Co-Attention for Social-Text Classification},
  author={Grover, Karish and Angara, SM and Akhtar, Md and Chakraborty, Tanmoy and others},
  journal={arXiv preprint arXiv:2209.13017},
  year={2022}
}
```