This repository lets you run the download, training, and evaluation process for the paper "KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents", published in the proceedings of the IEEE International Conference on Machine Learning and Applications 2022. A preprint of the paper is available on arXiv.
First, install all required packages by running pip install /path/to/repo, where /path/to/repo is the directory that contains setup.py, preferably in a virtual environment. NOTE: We used the most up-to-date branch of fluidml for this project, so you will likely have to install the branch "run-info-access" from source.
Then, download the data from the EDGAR database with the script download_from_edgar.py, located in /scripts.
Thereafter, kick off the training pipeline by adding the relevant folder to the config under /config and executing run_train_pipeline.py, located in /scripts.
If you simply want to download the KPI-EDGAR dataset, access the annotations and data from /data. The most up-to-date dataset will always be called kpi_edgar.xlsx.
In /data, there is also a "pre-parsed" json file titled kpi_edgar.json, which includes IOBES tags and might be easier to use for some.
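For readers unfamiliar with the IOBES scheme (B = begin, I = inside, E = end, S = single-token entity, O = outside), the following is a minimal sketch of how a tag sequence can be turned into entity spans. It does not read kpi_edgar.json, because the file's layout is not described here; the function name `iobes_to_spans` and the labels `kpi` and `cy` in the example are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (start token index, end token index inclusive, label)
Span = Tuple[int, int, str]


def iobes_to_spans(tags: List[str]) -> List[Span]:
    """Convert a list of IOBES tags into entity spans.

    Malformed sequences (e.g. an I- or E- tag without a preceding B- tag)
    are silently dropped rather than repaired.
    """
    spans: List[Span] = []
    start = None  # index of the currently open B- tag, if any
    label = None  # label of the currently open entity, if any
    for i, tag in enumerate(tags):
        if tag == "O":
            start, label = None, None
            continue
        prefix, _, tag_label = tag.partition("-")
        if prefix == "S":
            # Single-token entity.
            spans.append((i, i, tag_label))
            start, label = None, None
        elif prefix == "B":
            # Open a new multi-token entity.
            start, label = i, tag_label
        elif prefix == "I":
            # Continue the open entity; discard it if the labels disagree.
            if start is None or tag_label != label:
                start, label = None, None
        elif prefix == "E":
            # Close the open entity if it is consistent.
            if start is not None and tag_label == label:
                spans.append((start, i, tag_label))
            start, label = None, None
    return spans


# Hypothetical example sentence and tags (not taken from the dataset).
tokens = ["Revenue", "increased", "to", "$", "5.2", "million", "."]
tags = ["S-kpi", "O", "O", "B-cy", "I-cy", "E-cy", "O"]
print(iobes_to_spans(tags))
```

Each span holds token indices, so `tokens[start:end + 1]` recovers the entity text.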
We are maintaining a table with results on our test set. If you want your model listed here, simply send us your results and how you want to be cited.
Model | Relation F1 Score in % | Adjusted Relation F1 Score in % |
---|---|---|
KPI-BERT<sup>1</sup> | 22.68 | 43.76 |
SpERT<sup>1</sup> | 20.95 | 40.04 |
EDGAR–W2V<sup>1</sup> | 6.13 | 19.71 |
GloVe<sup>1</sup> | 5.11 | 17.18 |

<sup>1</sup> Baseline introduced in "KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents".
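As a rough orientation for the first score column, a relation F1 score is commonly computed by exact matching of predicted against gold relations. The sketch below shows that common strict definition only. It is an assumption that it matches the evaluation code in this repository, and it does not reproduce the adjusted metric, which is defined in the paper. The function name `relation_f1` and the tuple representation of a relation are hypothetical.

```python
from typing import Set, Tuple

# A relation is represented here as a pair of entity spans,
# each span being (start index, end index, label). Hypothetical format.
Entity = Tuple[int, int, str]
Relation = Tuple[Entity, Entity]


def relation_f1(gold: Set[Relation], pred: Set[Relation]) -> float:
    """Exact-match F1: a prediction counts only if it equals a gold relation."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three gold relations, three predictions, two correct.
a = ((0, 0, "kpi"), (3, 5, "cy"))
b = ((0, 0, "kpi"), (8, 9, "py"))
c = ((12, 13, "kpi"), (15, 15, "cy"))
d = ((12, 13, "kpi"), (17, 17, "cy"))
score = relation_f1({a, b, c}, {a, b, d})
print(f"{100 * score:.2f}")
```

In this example precision and recall are both 2/3, so the F1 score is also 2/3.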
You can also check PapersWithCode for this dataset.
If you use KPI-EDGAR in your academic work, please cite it directly:
@inproceedings{deusser2022kpiedgar,
author={Deu{\ss}er, Tobias and Ali, Syed Musharraf and Hillebrand, Lars and Nurchalifah, Desiana and Jacob, Basil and Bauckhage, Christian and Sifa, Rafet},
booktitle={Proc. ICMLA},
title={{KPI-EDGAR}: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents},
year={2022},
pages={1654-1659},
doi={10.1109/ICMLA55696.2022.00254}
}