This project implements a traditional, feature-based classifier of argumentative components (claims and premises). It is trained on features and metadata extracted from manually annotated argumentative sentences in the citizen proposals published on the Decide Madrid platform.
The complete solution is a pipeline of 6 modules. Together they cover data extraction from the source database (Decide Madrid), manual annotation of the data with the ARGAEL tool (annotations from Prodigy are also supported), feature extraction, and the construction and validation of the feature-based classification models.
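To make the idea of feature-based classification concrete, here is a minimal sketch, not the project's actual code. The marker lists, the three-value feature vector, the toy training sentences, and the use of a perceptron are all assumptions chosen for illustration. The real feature set and models are the ones produced by the feature extractor and argument classifier modules.

```python
import re

# Hypothetical marker lists, chosen only for illustration.
CLAIM_MARKERS = {"should", "must", "therefore"}
PREMISE_MARKERS = {"because", "since"}


def extract_features(sentence):
    """Return [bias, has_claim_marker, has_premise_marker] for one sentence."""
    tokens = set(re.findall(r"\w+", sentence.lower()))
    return [
        1.0,
        float(bool(tokens & CLAIM_MARKERS)),
        float(bool(tokens & PREMISE_MARKERS)),
    ]


def score(weights, features):
    """Dot product of the weight vector and the feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_perceptron(examples, epochs=10):
    """Fit a perceptron; internally +1 means claim and -1 means premise."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for sentence, label in examples:
            x = extract_features(sentence)
            target = 1 if label == "claim" else -1
            predicted = 1 if score(weights, x) > 0 else -1
            if predicted != target:
                # Standard perceptron update on a misclassified example.
                weights = [w + target * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights, sentence):
    """Label a sentence as 'claim' or 'premise' from its features."""
    return "claim" if score(weights, extract_features(sentence)) > 0 else "premise"


# Toy, hand-written training data (not taken from the annotated dataset).
TRAINING = [
    ("Cars should be banned downtown", "claim"),
    ("The city must add bike lanes", "claim"),
    ("Because pollution harms health", "premise"),
    ("Since traffic is growing", "premise"),
]

weights = train_perceptron(TRAINING)
```

Calling `predict(weights, "Parks should open earlier")` labels the sentence from its handcrafted features alone. That is the general pattern a feature-based classifier follows, whatever model is trained on top.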
This work (v1.4) has been accepted as a paper at the 10th Workshop on Argument Mining co-located with the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). A draft of the paper can be found here.
Below are links to all datasets (both intermediate and final) created and used by the solution:
- Decide Madrid platform
- Proposals JSONL files
- Annotations CSV files
- Annotated propositions CSV file
- Features JSON file
- Labeled dataset CSV file
- Models results CSV file
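The proposals are distributed as JSONL, that is, one JSON object per line. A minimal sketch of loading such a file with the Python standard library might look as follows. The function name `read_jsonl` and the file name in the usage note are hypothetical, and no assumption is made about which fields each record contains.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `records = list(read_jsonl("proposals.jsonl"))` would load a whole file into memory, while iterating over the generator directly keeps memory use low for large files.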
The implemented solutions depend on or make use of the following libraries:
- Data processor (Python module):
  - Python v3.9.13
  - spaCy v3.3.1
- Feature extractor (Java module):
  - JDK 17
  - Stanford CoreNLP v4.5.3
  - MongoDB Java Driver v3.12.10
  - Snake YAML v1.9
  - JSON Java v20210307
- Argument classifier (Python module):
Created on Aug 18, 2021
Created by:
This project is licensed under the terms of the Apache License 2.0.
This work was supported by the Spanish Ministry of Science and Innovation (PID2019-108965GB-I00).