athenarc/bip-citation-classifier

BIP! Citation Classifier

The BIP! Citation Classifier is a Python library that classifies citations by intent using several state-of-the-art algorithms. It analyses the citation context, that is, the text surrounding a reference in a scientific publication, to determine the intent (or purpose) behind the citation. Following a well-established citation classification ontology, the library assigns each citation to a class, for example whether it supports, uses, or extends the cited work. These outputs are particularly useful for tasks such as citation network analysis, where knowing the nature of each citation can improve the accuracy of the analysis.

Citation Classifiers in RPI Calculation for Scientometrics

This project implements various text mining techniques based on neural networks, focusing on citation intent classification at different semantic levels. It also includes the modification and calculation of Relative Performance Indicators (RPIs) to observe how they are influenced by citation intent. All code is run on Google Colab.

Project Structure

Part I: Zero-Shot Classification Models

Six folders need to be created to store datasets, notebooks, and results of the Zero-Shot Classification Models.
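As a minimal sketch, the six folders could be created in one go as shown here. The folder names are taken from the list that follows; the helper name `create_result_folders` and the root directory passed to it are assumptions for illustration, not part of the repository.

```python
from pathlib import Path

# Folder names from the Part I list; where they live is an assumption.
ZERO_SHOT_FOLDERS = [
    "ACT",
    "ACT_INFLUENCE",
    "SciCite_Model1",
    "SciCite_Model2",
    "SciCite_Model3",
    "SciCite_Model4",
]


def create_result_folders(root):
    """Create one folder per zero-shot experiment under `root` and return the paths."""
    root = Path(root)
    created = []
    for name in ZERO_SHOT_FOLDERS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat across Colab sessions.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

In Colab, `create_result_folders("/content/drive/MyDrive/zero_shot")` would be one possible call, assuming Google Drive is mounted at that (hypothetical) location.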

  1. Folder 1 – ACT:

Dataset: datasets/ACL_ATC/ATC/train.csv

Notebook: Inference_ZeroShotClassification/ZeroShotClassification_ACL_ATC_Classes6.ipynb

Contents: Inference results from four ZeroShotClassification models will be stored as .csv files. Model performance will be documented in the notebook.

  2. Folder 2 – ACT_INFLUENCE:

Dataset: datasets/ACL_ATC/ATC_INFLUENCE/train.csv

Notebook: Inference_ZeroShotClassification/ZeroShotClassification_ACL_ATC_Classes2.ipynb

Contents: Inference results from four ZeroShotClassification models will be stored as .csv files. Model performance will be documented in the notebook.

  3. Folder 3 – SciCite_Model1:

Datasets: datasets/SciCite/ATC/train.csv, datasets/SciCite/ATC/dev.csv, datasets/SciCite/ATC/test.csv

Notebook: Inference_ZeroShotClassification/ZeroShotClassification_SciCite_model1.ipynb

Contents: Model1 inference results will be stored as .csv files for the train, dev, and test datasets. Model performance will be presented in the notebook.

  4. Folder 4 – SciCite_Model2:

Datasets: datasets/SciCite/ATC/train.csv, datasets/SciCite/ATC/dev.csv, datasets/SciCite/ATC/test.csv

Notebook: Inference_ZeroShotClassification/ZeroShotClassification_SciCite_model2.ipynb

Contents: Model2 inference results will be stored as .csv files for the train, dev, and test datasets. Model performance will be presented in the notebook.

  5. Folder 5 – SciCite_Model3:

Datasets: datasets/SciCite/ATC/train.csv, datasets/SciCite/ATC/dev.csv, datasets/SciCite/ATC/test.csv

Notebook: Inference_ZeroShotClassification/ZeroShotClassification_SciCite_model3.ipynb

Contents: Model3 inference results will be stored as .csv files for the train, dev, and test datasets. Model performance will be presented in the notebook.

  6. Folder 6 – SciCite_Model4:

Datasets: datasets/SciCite/ATC/train.csv, datasets/SciCite/ATC/dev.csv, datasets/SciCite/ATC/test.csv

Notebook: Inference_ZeroShotClassification/ZeroShotClassification_SciCite_model4.ipynb

Contents: Model4 inference results will be stored as .csv files for the train, dev, and test datasets. Model performance will be presented in the notebook.
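The zero-shot workflow described in this part (classify each citation context, store the predictions as .csv, report performance) can be sketched roughly as follows. This is not the notebooks' code: it assumes the Hugging Face `transformers` zero-shot pipeline, and the function names and the .csv column names are hypothetical. The README does not name the four models, so the model is left as a parameter.

```python
import csv


def build_classifier(model_name):
    """Load a Hugging Face zero-shot pipeline (needs the `transformers` package)."""
    # Imported lazily so the helpers below can be used without the package.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_name)


def classify_contexts(classifier, contexts, labels):
    """Return the top-scoring label for each citation context."""
    predictions = []
    for text in contexts:
        result = classifier(text, candidate_labels=labels)
        # The pipeline returns `labels` sorted by descending score.
        predictions.append(result["labels"][0])
    return predictions


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def write_predictions(path, contexts, predictions):
    """Store one (context, predicted_label) row per citation as a .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["context", "predicted_label"])  # assumed column names
        writer.writerows(zip(contexts, predictions))
```

For the SciCite folders, `labels` could be `["Background", "Method", "Result"]`. A publicly available NLI model such as `facebook/bart-large-mnli` could be passed to `build_classifier`, although whether it is one of the four models used here is an assumption.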


Part II: SciBERT Model Reproduction

The SciBERT model is reproduced using PyTorch, following the study’s guidelines. This includes citation intent classification with three and four labeled classes to evaluate the model's ability to capture more granular semantic intent.

SciBERT Model - 3 Classes (Background/Method/Result):

Folder Name: SciBERT_classes3

Datasets: datasets/SciCite/ATC/train.csv, datasets/SciCite/ATC/dev.csv, datasets/SciCite/ATC/test.csv

Notebook: SciBERT_Reproduction/SciBERT_Reproduction_3Classes.ipynb

Contents: Model checkpoints will be stored to allow the best model to be used for inference after validation.
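One way to keep the best model for inference is to save a checkpoint only when the validation score improves. The sketch below illustrates that pattern; the class name `BestCheckpointKeeper`, the choice of metric, and the checkpoint path are assumptions, and the notebook may store checkpoints differently.

```python
def _torch_save(state, path):
    """Default saver: write the state with torch.save (needs PyTorch)."""
    import torch  # imported lazily so the class itself has no hard dependency
    torch.save(state, path)


class BestCheckpointKeeper:
    """Save a checkpoint only when the validation score beats the best so far."""

    def __init__(self, path, save_fn=None):
        self.path = path
        self.best_score = float("-inf")
        self.best_epoch = None
        self._save_fn = save_fn if save_fn is not None else _torch_save

    def update(self, epoch, val_score, state):
        """Return True if `state` was saved as the new best checkpoint."""
        if val_score <= self.best_score:
            return False
        self.best_score = val_score
        self.best_epoch = epoch
        self._save_fn(state, self.path)
        return True
```

In a PyTorch training loop this would be called once per epoch, for example `keeper.update(epoch, val_f1, model.state_dict())`, and the best weights restored afterwards with `model.load_state_dict(torch.load(keeper.path))`.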

SciBERT Model - 4 Classes (Background/Method/Result_Supportive/Result_Not_Supportive):

Folder Name: SciBERT_classes4

Datasets: datasets/SciCite/train.csv, datasets/SciCite/dev.csv, datasets/SciCite/test.csv

Notebook: SciBERT_Reproduction/SciBERT_4Classes.ipynb

Contents: Model checkpoints will be stored for inference after validation.
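The two label sets named above relate to each other in an obvious way if the four-class scheme is read as splitting Result into a supportive and a non-supportive variant. The sketch below encodes that reading; it is an assumption drawn from the class names, and the integer ids and helper names are hypothetical rather than taken from the notebooks.

```python
THREE_CLASS_LABELS = ["Background", "Method", "Result"]
FOUR_CLASS_LABELS = [
    "Background",
    "Method",
    "Result_Supportive",
    "Result_Not_Supportive",
]

# Assumed correspondence: both Result variants collapse to "Result".
FOUR_TO_THREE = {
    "Background": "Background",
    "Method": "Method",
    "Result_Supportive": "Result",
    "Result_Not_Supportive": "Result",
}


def label_to_id(labels):
    """Map each label to the integer id a classification head would predict."""
    return {label: index for index, label in enumerate(labels)}


def to_three_class(label):
    """Collapse a four-class label to its three-class counterpart."""
    try:
        return FOUR_TO_THREE[label]
    except KeyError:
        raise ValueError(f"unknown four-class label: {label!r}") from None
```

A mapping like this makes it possible to compare the two models on the same footing, by collapsing four-class predictions before scoring them against three-class gold labels.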


Part III: RPI Calculation Based on Citation Intent Semantics

The RPIs (Relative Performance Indicators) will be calculated based on the citation intent semantics.

Folder Name: RPIs

Datasets: datasets/SciCite/train.csv, datasets/SciCite/dev.csv, datasets/SciCite/test.csv

Notebook: Citation_Intent_in_RPIs/RPIs.ipynb

Contents: This notebook will calculate RPIs based on the semantics of citation intent.
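This README does not give the RPI formulas, so the following is only an illustration of how citation intent can change an indicator. It assumes a simple stand-in indicator: a paper's citation count divided by the mean count over a comparison set, computed either over all citations or only over citations with selected intents. The indicator, the function names, and the input format are all assumptions and not the notebook's definitions.

```python
from collections import Counter


def intent_filtered_counts(citations, intents=None):
    """Count citations per cited paper, optionally keeping only some intents.

    `citations` is an iterable of (cited_paper_id, intent_label) pairs.
    """
    counts = Counter()
    for cited_id, intent in citations:
        if intents is None or intent in intents:
            counts[cited_id] += 1
    return counts


def relative_indicator(counts, paper_ids):
    """Divide each paper's count by the mean count over `paper_ids`."""
    if not paper_ids:
        return {}
    mean = sum(counts[p] for p in paper_ids) / len(paper_ids)
    if mean == 0:
        return {p: 0.0 for p in paper_ids}
    return {p: counts[p] / mean for p in paper_ids}
```

With two papers that each receive three citations, both score 1.0 when every citation counts. If only Method citations are kept and one paper has two of them while the other has none, the scores become 2.0 and 0.0, which is the kind of shift this part of the project sets out to observe.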


How to Run

To replicate the results:

Clone this repository.

Download the datasets from the paths listed above.

Run the notebooks in Google Colab on a V100 GPU, or in your local environment.

Make sure to install all required dependencies listed in each notebook before running them.
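Before running a notebook, it can help to confirm that every dataset file is where the notebooks expect it. The sketch below checks the dataset paths listed in this README relative to a repository root; the helper name `missing_datasets` and the idea of running it from the clone directory are assumptions.

```python
from pathlib import Path

# Dataset paths as listed in Parts I to III of this README.
EXPECTED_DATASETS = [
    "datasets/ACL_ATC/ATC/train.csv",
    "datasets/ACL_ATC/ATC_INFLUENCE/train.csv",
    "datasets/SciCite/ATC/train.csv",
    "datasets/SciCite/ATC/dev.csv",
    "datasets/SciCite/ATC/test.csv",
    "datasets/SciCite/train.csv",
    "datasets/SciCite/dev.csv",
    "datasets/SciCite/test.csv",
]


def missing_datasets(repo_root="."):
    """Return the expected dataset paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [path for path in EXPECTED_DATASETS if not (root / path).is_file()]
```

An empty list from `missing_datasets()` means all eight files were found; otherwise the returned paths show which files still need to be fetched.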
