Repository for the end-term submission for the Information Retrieval course (CS60092), offered in the Spring semester of 2023 by the Department of CSE, IIT Kharagpur.
Research for research papers
This project attempts to implement and improve on the work of Sheshera Mysore, Tim O'Gorman, Andrew McCallum, and Hamed Zamani, titled "CSFCube - A Test Collection of Computer Science Papers for Faceted Query by Example".
The dataset can be found here
The paper describing the dataset can be accessed here
Demo video:
Team members:
- Ashwani Kumar Kamal - 20CS10011
- Hardik Pravin Soni - 20CS30023
- Shiladitya De - 20CS30061
- Sourabh Soumyakanta Das - 20CS30051
A quick introduction to the minimal setup you need to get the application up and running:
conda env create -f environment.yaml
conda activate sciatica-env
streamlit run deploy.py
- Any .ipynb files that need to be run must be placed in this root directory, which will contain the /data directory and the /Results directory.
- The data directory contains the CSFCube dataset
.
├── abstracts-csfcube-preds.json
├── abstracts-csfcube-preds.jsonl
├── abstracts-csfcube-preds-no-unicode.jsonl
├── evaluation_splits.json
├── test-pid2anns-csfcube-background.json
├── test-pid2anns-csfcube-method.json
├── test-pid2anns-csfcube-result.json
└── test-pid2pool-csfcube.json
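As a rough illustration of how these files might be read from a notebook placed in the root directory, the sketch below loads a JSON Lines file and a plain JSON file using only the standard library. The helper names `load_jsonl` and `load_json` are hypothetical and are not taken from the project's notebooks, and the sketch makes no assumption about the fields inside each record.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def load_json(path):
    """Read a regular JSON file into Python objects."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_jsonl("data/abstracts-csfcube-preds.jsonl")` should return a list with one entry per line of that file, and `load_json("data/test-pid2pool-csfcube.json")` should return whatever top-level object the file holds.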
- The Results directory contains the embeddings generated from the models used
.
├── alberta
│ ├── all.json
│ ├── background.json
│ ├── method.json
│ ├── result.json
│ ├── test-pid2pool-csfcube-alberta-background-ranked.json
│ ├── test-pid2pool-csfcube-alberta-method-ranked.json
│ └── test-pid2pool-csfcube-alberta-result-ranked.json
├── allenai_specter
│ ├── all.json
│ ├── background.json
│ ├── method.json
│ ├── result.json
│ ├── test-pid2pool-csfcube-allenai_specter-background-ranked.json
│ ├── test-pid2pool-csfcube-allenai_specter-method-ranked.json
│ └── test-pid2pool-csfcube-allenai_specter-result-ranked.json
├── all_mpnet_base_v2
│ ├── all.json
│ ├── background.json
│ ├── method.json
│ ├── result.json
│ ├── test-pid2pool-csfcube-all_mpnet_base_v2-background-ranked.json
│ ├── test-pid2pool-csfcube-all_mpnet_base_v2-method-ranked.json
│ └── test-pid2pool-csfcube-all_mpnet_base_v2-result-ranked.json
├── bert_nli
│ ├── all.json
│ ├── background.json
│ ├── method.json
│ ├── result.json
│ ├── test-pid2pool-csfcube-bert_nli-background-ranked.json
│ ├── test-pid2pool-csfcube-bert_nli-method-ranked.json
│ └── test-pid2pool-csfcube-bert_nli-result-ranked.json
├── bert_pp
│ ├── all.json
│ ├── background.json
│ ├── method.json
│ ├── result.json
│ ├── test-pid2pool-csfcube-bert_pp-background-ranked.json
│ ├── test-pid2pool-csfcube-bert_pp-method-ranked.json
│ └── test-pid2pool-csfcube-bert_pp-result-ranked.json
├── distilbert_nli
│ ├── all.json
│ ├── background.json
│ ├── method.json
│ ├── result.json
│ ├── test-pid2pool-csfcube-distilbert_nli-background-ranked.json
│ ├── test-pid2pool-csfcube-distilbert_nli-method-ranked.json
│ └── test-pid2pool-csfcube-distilbert_nli-result-ranked.json
└── ensemble
  ├── test-pid2pool-csfcube-ensemble-background-ranked.json
  ├── test-pid2pool-csfcube-ensemble-method-ranked.json
  └── test-pid2pool-csfcube-ensemble-result-ranked.json
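The `-ranked.json` file names suggest that, for each facet, the candidate pool of a query paper is ordered by similarity to that paper. A minimal sketch of that idea follows, assuming cosine similarity between embedding vectors (the similarity measure is not stated here) and assuming the embeddings are available as a mapping from paper ID to a list of floats (the actual layout of `all.json` and the facet files may differ). The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(query_id, candidate_ids, embeddings):
    """Return (candidate_id, score) pairs sorted by descending similarity.

    `embeddings` is assumed to map paper IDs to vectors (hypothetical layout).
    """
    query_vec = embeddings[query_id]
    scored = [
        (cand, cosine_similarity(query_vec, embeddings[cand]))
        for cand in candidate_ids
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example with made-up 2-dimensional vectors.
toy = {"q": [1.0, 0.0], "a": [0.0, 1.0], "b": [1.0, 1.0], "c": [2.0, 0.0]}
ranking = rank_candidates("q", ["a", "b", "c"], toy)
print([pid for pid, _ in ranking])  # ['c', 'b', 'a']
```

Cosine similarity ignores vector length, which is why `c` (same direction as the query, twice the magnitude) ranks first in the toy example.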
This notebook contains the code for generating embeddings from the base models. Avoid running it, as it takes a long time. The embeddings are already provided in the Google Drive of IR Submission Files.
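To give a sense of what embedding generation involves, here is a minimal sketch; it is not the notebook's actual code. It assumes each record has `paper_id` and `abstract` keys, with the abstract stored as a list of sentences; both are assumptions about the data layout. The `encode` argument is any callable that turns a list of strings into one vector per string, so the sketch does not depend on a particular model library. The function names are hypothetical.

```python
import json


def embed_abstracts(records, encode):
    """Map each paper ID to the embedding of its full abstract text.

    `records`: iterable of dicts with assumed keys "paper_id" and "abstract"
               (a list of sentence strings).
    `encode`:  callable taking a list of strings and returning one vector
               per string, in the same order.
    """
    records = list(records)  # allow generators; we iterate twice below
    paper_ids = [rec["paper_id"] for rec in records]
    texts = [" ".join(rec["abstract"]) for rec in records]
    vectors = encode(texts)
    # Convert each vector to a plain list so the result is JSON-serializable.
    return {pid: [float(x) for x in vec] for pid, vec in zip(paper_ids, vectors)}


def save_embeddings(embeddings, path):
    """Write the {paper_id: vector} mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(embeddings, handle)
```

With the sentence-transformers package installed, `encode` could be the `encode` method of a `SentenceTransformer` model. Which checkpoints correspond to the directory names under Results is not stated here.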
This notebook is for fine-tuning the DistilBERT model. The results are already present in it. Avoid running it, as it takes a long time.
Run each cell of this Jupyter notebook. In the second-to-last cell, change the queries as desired, then run that cell and the one after it to get the results.
Apart from all this, we are also submitting a zip of the local copies and reports of the .ipynb files, which can be run locally. [Note] Please change the file directory strings in the notebooks appropriately to avoid any errors.