Concerns About Vaccines with Explanations and Summaries (CAVES)

This repository contains the datasets corresponding to the paper "CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines", which was accepted at ACM SIGIR 2022 (Resource Track). A preprint version is available on arXiv.

NOTE: The dataset has been updated since the paper was published; the details of the update are reflected in the preprint version.

Data Description

The "gold_summaries" folder contains summaries of each class written by 3 different annotators. The "labelled_tweets" folder contains the tweet IDs and their labels in standard CSV format, and the label-explanation tuples in standard JSON format. The "start" and "end" indices of an explanation refer to token positions in the tweet text after splitting it on whitespace only.

For example, with a start index of 2 and an end index of 6, the explanation for the tweet "They are making huge $$$ profits ! won't take it!" is "making huge $$$ profits", i.e. tokens 2 to 5 counting from 0. In other words, the end index is exclusive.
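The span lookup described above can be sketched in a few lines of Python. This is a minimal sketch, not code shipped with the dataset: the function name `explanation_text` is hypothetical, and only the "start" and "end" keys come from this README. The shape of the sample JSON record is an assumption about the explanation files. The end index is treated as exclusive, matching the example.

```python
import json


def explanation_text(tweet_text: str, start: int, end: int) -> str:
    """Recover an explanation span from whitespace-token indices.

    Hypothetical helper: assumes `start` is inclusive and `end` is exclusive,
    as implied by the README example (2, 6 -> "making huge $$$ profits").
    """
    # str.split() with no argument splits on runs of any whitespace.
    tokens = tweet_text.split()
    # Rejoin with single spaces; original spacing inside the span is not preserved.
    return " ".join(tokens[start:end])


# Hypothetical record: only the "start" and "end" keys are documented in this README;
# the rest of the JSON layout of the explanation files is an assumption.
record = json.loads('{"start": 2, "end": 6}')
tweet = "They are making huge $$$ profits ! won't take it!"
print(explanation_text(tweet, record["start"], record["end"]))  # making huge $$$ profits
```

Because the indices count whitespace-separated tokens, punctuation attached to a word (such as "it!") stays part of that token. Symbols separated by spaces (such as "!") count as tokens of their own.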

For queries, please mail me at: Email ID

If you use our data, please cite the following paper:

@inproceedings{poddar2022caves,
  title={CAVES: A dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines},
  author={Poddar, Soham and Samad, Azlaan Mustafa and Mukherjee, Rajdeep and Ganguly, Niloy and Ghosh, Saptarshi},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}

Classification Models on the Dataset

MuLX-QA is a method that identifies multiple label-explanation tuples in social media posts. It was accepted in the ACM Transactions on the Web (TWEB) journal in 2024. Link to paper | Link to GitHub
Cov-Gen is a method that uses a Flan-T5 model for multi-label classification of vaccine concerns. It is part of the paper "How COVID-19 has Impacted the Anti-Vaccine Discourse: A Large-Scale Twitter Study Spanning Pre-COVID and Post-COVID Era", accepted at the 18th International AAAI Conference on Web and Social Media (ICWSM). Link to paper | Link to GitHub
