SPIDer is a framework that decomposes the mutual information contained in a multi-document summary into redundant, synergistic, union, and unique information. We base our implementation on the Partial Information Decomposition (PID) approach defined in "A Novel Approach to the Partial Information Decomposition".
Create a Python environment and run the following command to install the required packages:
pip install -r requirements.txt
The class MDSPID works as follows:
- define the dataset and underlying language model you want to use
- compute the needed sentence probabilities
- run the computation of PIDs (partial information decomposition)
Important: Create the output folders outputs/precomputed, outputs/preprocessed_data, and outputs/results.
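The three folders can also be created from Python. The following is a minimal sketch, not part of the repository: the helper name create_output_dirs and its root parameter are hypothetical, and it assumes the folders belong in the directory you pass as root (the current directory by default).

```python
from pathlib import Path

# Folder names taken from the instructions above.
OUTPUT_DIRS = (
    "outputs/precomputed",
    "outputs/preprocessed_data",
    "outputs/results",
)


def create_output_dirs(root="."):
    """Create the output folders under `root` (hypothetical helper).

    Folders that already exist are left untouched.
    """
    created = []
    for name in OUTPUT_DIRS:
        path = Path(root) / name
        # parents=True also creates the intermediate `outputs` folder.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_output_dirs() once before the first run is enough. Calling it again is harmless because existing folders are kept.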
Run:
python run_pid.py --mode run_all --config ../configs/config.yaml
After running run_spider.py, the MDSPID instance with the computed PID is stored as a pickle file under outputs/results. To analyze the results, you can use the Jupyter Notebook notebooks/pid_results.ipynb or the script spider_results.py.
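If you prefer to inspect the stored results directly, the pickle files can be loaded with the standard library. This is a minimal sketch under stated assumptions: the helper load_results is hypothetical, the .pkl extension is a guess at how the result files are named, and unpickling an MDSPID instance requires the repository code that defines MDSPID to be importable.

```python
import pickle
from pathlib import Path


def load_results(results_dir="outputs/results", pattern="*.pkl"):
    """Load every pickle in `results_dir` matching `pattern` (hypothetical helper).

    Returns a dict mapping file name to the unpickled object.
    The `*.pkl` pattern is an assumption; adjust it to the actual file names.
    """
    results = {}
    for path in sorted(Path(results_dir).glob(pattern)):
        with path.open("rb") as handle:
            # Only unpickle files you created yourself: pickle can execute code.
            results[path.name] = pickle.load(handle)
    return results
```

The returned objects are whatever was stored, so their attributes depend on the MDSPID class itself.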
The code to convert MultiRC into an MDS dataset and compute the synergy scores is under notebooks/multiRC.
@inproceedings{mascarell-2024-which,
title = "Which Information Matters? Dissecting Human-written Multi-document Summaries with Partial Information Decomposition",
author = "Mascarell, Laura and L'Homme, Yan and El Helou, Majed",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
}