Extractive Multi-document Summarization using K-means, Centroid-based Method, MMR, and Sentence Position
Python implementation of extractive multi-document summarization as described in "Extractive Multi-document Summarization using K-means, Centroid-based Method, MMR, and Sentence Position" by Hai Cao Manh, Huong Le Thanh and Tuan Luu Minh.
- Clone this repository.
- Ensure packages are installed using "pip install -r requirements.txt".
We use DUC 2007, a widely used benchmark dataset for text summarization.
We propose an approach to multi-document summarization based on the k-means clustering algorithm, combined with the centroid-based method, maximal marginal relevance (MMR), and sentence position.
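To make the MMR step concrete, here is a minimal sketch and not the repository's implementation. It assumes bag-of-words cosine similarity, scores relevance as similarity to the centroid of all sentences plus a small bonus for early sentence positions, and omits the k-means clustering stage. The names `bow`, `cosine`, `mmr_select`, `lam`, and `position_weight` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words term counts for one sentence."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k, lam=0.7, position_weight=0.1):
    """Greedily pick up to k sentences with maximal marginal relevance.

    Assumption: relevance = cosine similarity to the centroid of all
    sentences, plus a bonus that decreases with sentence position.
    """
    vectors = [bow(s) for s in sentences]

    # Centroid as summed term counts (cosine ignores the scale factor).
    centroid = Counter()
    for v in vectors:
        centroid.update(v)

    n = len(sentences)
    relevance = [
        cosine(v, centroid) + position_weight * (1 - i / n)
        for i, v in enumerate(vectors)
    ]

    selected = []
    candidates = list(range(n))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy: highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in selected]
```

With `lam` close to 1 the selection favors relevance; lower values penalize sentences that repeat what has already been selected, so a verbatim duplicate of a chosen sentence is pushed behind less similar candidates.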
# Running directly from the repository (output is saved to Data/DUC_2007/folder):
python methods/main_method/Kmeans_CentroidBase_MMR_SentencePosition.py --folder_to_save="folder"
Note: if you get a path error, setting PYTHONPATH to the repository root may help:
# Running directly from the repository:
export PYTHONPATH=.
Replace "test" in the "system_folder" variable with the name of your system folder (e.g. "folder").
# Running directly from the repository:
python rouge/pyrouge_DUC_2007.py
Note: if you get an error, you can try running the source code directly in the PyCharm IDE.
# Running directly from the repository:
python test.py --cluster="cluster" --number_sentence_with_centroid="number_sentence_with_centroid" --number_sentence_with_mmr="number_sentence_with_mmr" --path_to_data="path_to_folder"