
Sciatica is a powerful semantic search engine designed for academic literature exploration. This tool leverages cutting-edge transformer models to deliver precise and contextually relevant search results.


hs094/Sciatica


SCIATICA

Repository for the end-term submission for the Information Retrieval course (CS60092), offered in the Spring 2023 semester by the Department of CSE, IIT Kharagpur.


Research for research papers



Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Colab Notebooks

About The Project

This project attempts to implement and improve on the work of Sheshera Mysore, Tim O'Gorman, Andrew McCallum, and Hamed Zamani, titled "CSFCube - A Test Collection of Computer Science Papers for Faceted Query by Example".

The dataset can be found here

The paper describing the dataset can be accessed here

Demo video:

Team members:

  • Ashwani Kumar Kamal - 20CS10011
  • Hardik Pravin Soni - 20CS30023
  • Shiladitya De - 20CS30061
  • Sourabh Soumyakanta Das - 20CS30051

(back to top)

Getting Started

A quick introduction to the minimal setup needed to get the application running:

conda env create -f environment.yaml
conda activate sciatica-env
streamlit run deploy.py

Directory Structure

  • Any .ipynb files that need to be run must be placed in the root directory, which contains the /data and /Results directories.

  • The data directory contains the CSFCube dataset

.
├── abstracts-csfcube-preds.json
├── abstracts-csfcube-preds.jsonl
├── abstracts-csfcube-preds-no-unicode.jsonl
├── evaluation_splits.json
├── test-pid2anns-csfcube-background.json
├── test-pid2anns-csfcube-method.json
├── test-pid2anns-csfcube-result.json
└── test-pid2pool-csfcube.json
  • The Results directory contains the embeddings generated by each model, alongside the per-facet ranked candidate pools (the files ending in -ranked.json)
.
├── alberta
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-alberta-background-ranked.json
│   ├── test-pid2pool-csfcube-alberta-method-ranked.json
│   └── test-pid2pool-csfcube-alberta-result-ranked.json
├── allenai_specter
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-allenai_specter-background-ranked.json
│   ├── test-pid2pool-csfcube-allenai_specter-method-ranked.json
│   └── test-pid2pool-csfcube-allenai_specter-result-ranked.json
├── all_mpnet_base_v2
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-all_mpnet_base_v2-background-ranked.json
│   ├── test-pid2pool-csfcube-all_mpnet_base_v2-method-ranked.json
│   └── test-pid2pool-csfcube-all_mpnet_base_v2-result-ranked.json
├── bert_nli
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-bert_nli-background-ranked.json
│   ├── test-pid2pool-csfcube-bert_nli-method-ranked.json
│   └── test-pid2pool-csfcube-bert_nli-result-ranked.json
├── bert_pp
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-bert_pp-background-ranked.json
│   ├── test-pid2pool-csfcube-bert_pp-method-ranked.json
│   └── test-pid2pool-csfcube-bert_pp-result-ranked.json
├── distilbert_nli
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-distilbert_nli-background-ranked.json
│   ├── test-pid2pool-csfcube-distilbert_nli-method-ranked.json
│   └── test-pid2pool-csfcube-distilbert_nli-result-ranked.json
└── ensemble
    ├── test-pid2pool-csfcube-ensemble-background-ranked.json
    ├── test-pid2pool-csfcube-ensemble-method-ranked.json
    └── test-pid2pool-csfcube-ensemble-result-ranked.json
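
To illustrate how a ranked file could be derived from stored embeddings and a candidate pool, here is a minimal sketch. It makes several assumptions that this README does not confirm: that embeddings can be read as a mapping from paper ID to vector, that the pool maps each query paper ID to a list of candidate IDs, and that candidates are scored with cosine similarity. The toy dictionaries below are hypothetical stand-ins, not the contents of the actual JSON files.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_pool(query_vec, cand_vecs):
    """Return (paper_id, score) pairs sorted from most to least similar to the query."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in cand_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data; the real embeddings and pool would be loaded from JSON.
embeddings = {
    "q1": [1.0, 0.0],
    "p1": [0.9, 0.1],
    "p2": [0.0, 1.0],
    "p3": [-1.0, 0.0],
}
pool = {"q1": ["p1", "p2", "p3"]}

# One ranked candidate list per query paper.
ranked = {
    qid: rank_pool(embeddings[qid], {pid: embeddings[pid] for pid in cands})
    for qid, cands in pool.items()
}
```

The similarity function is the only model-dependent choice in this sketch, so swapping in another scoring rule would leave the ranking loop unchanged.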

(back to top)

Colab Notebooks

This notebook contains the code for generating embeddings from the base models. Avoid running it, as it takes a long time; the embeddings are already provided in the Google Drive folder of IR submission files.

This notebook fine-tunes the DistilBERT model. The results are already saved in it. Avoid running it, as it takes a long time.

Run each cell of this Jupyter notebook. In the second-to-last cell, change the queries as desired, then run that cell and the one after it to get the results.

Apart from all this, we are also submitting a zip of the local copies and reports of the .ipynb files, which can be run locally. [Note] Please change the file directory strings in the notebooks appropriately to avoid errors.
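
One way to make the directory strings easier to adjust is to define the root folder once and check it before running anything else. The sketch below is only a suggestion: `ROOT` and `missing_data_files` are hypothetical names that do not appear in the notebooks, and the file list is a subset copied from the data directory listing in this README.

```python
from pathlib import Path

# File names copied from the data directory listing in this README.
DATA_FILES = [
    "abstracts-csfcube-preds.jsonl",
    "test-pid2anns-csfcube-background.json",
    "test-pid2anns-csfcube-method.json",
    "test-pid2anns-csfcube-result.json",
    "test-pid2pool-csfcube.json",
]


def missing_data_files(root):
    """Return the expected data files that are absent under <root>/data."""
    data_dir = Path(root) / "data"
    return [name for name in DATA_FILES if not (data_dir / name).is_file()]


# Hypothetical setting: point ROOT at the folder that holds data/ and Results/.
ROOT = Path(".")
missing = missing_data_files(ROOT)
if missing:
    print("Missing from", ROOT / "data", ":", ", ".join(missing))
```

With a check like this at the top of a notebook, a wrong path shows up as a short list of missing files instead of a failure partway through a long run.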

(back to top)
