On Finding Megadiversity Among the Corpus of Scientific Literature

sa-aguilarv/thesis_exp

Introduction

The objective of this thesis is to find the most diverse set of scientific papers in a given corpus. The diversity of a set of papers is measured by the number of different topics the set covers. The set with the highest diversity is called the megadiverse set.

Hence, we address the following problem: given a corpus of scientific papers, find the megadiverse set of papers.
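As a rough illustration of the definition above, the sketch below counts the distinct topics covered by a set of papers and searches a toy corpus for the most diverse subset. It is not the method implemented in this repository: the paper-to-topic mapping, the function names `diversity` and `megadiverse_set`, and the fixed subset size `k` are all assumptions made for the example (without some size limit, the whole corpus would trivially be the most diverse set). The exhaustive search is only practical for very small corpora.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def diversity(papers: Iterable[str], topics_by_paper: Dict[str, Set[str]]) -> int:
    """Number of distinct topics covered by the given papers."""
    covered: Set[str] = set()
    for paper in papers:
        covered |= topics_by_paper[paper]
    return len(covered)


def megadiverse_set(topics_by_paper: Dict[str, Set[str]], k: int) -> Tuple[str, ...]:
    """Return a k-paper subset with maximal diversity (exhaustive search).

    The size limit k is an assumption of this sketch, not part of the
    problem statement. Ties go to the first subset in sorted order.
    """
    candidates = combinations(sorted(topics_by_paper), k)
    return max(candidates, key=lambda subset: diversity(subset, topics_by_paper))


# Hypothetical toy corpus: paper id -> topics assigned to that paper.
toy_corpus = {
    "p1": {"ecology", "genomics"},
    "p2": {"genomics"},
    "p3": {"taxonomy", "ecology"},
    "p4": {"climate"},
}

best = megadiverse_set(toy_corpus, k=2)
print(best, diversity(best, toy_corpus))
```

On the toy corpus several pairs cover three topics, so the result depends on the tie-breaking rule; a real corpus would also need a scalable search strategy rather than enumeration.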

Requirements

  1. OS: Ubuntu 20.04 LTS

  2. Download Python (the latest release is fine)

  3. Install miniconda

Set up work environment

  1. Create conda environment with Python 3.8.19

    conda create -n "myenv" python=3.8.19
  2. Activate conda environment

    conda activate myenv
  3. Install requirements

    pip install -r requirements.txt
  4. Download the dataset folder from here and add it to the thesis_exp project directory.

  5. Run the application once per flag, in this order:

    python main.py --eda
    python main.py --metadata
    python main.py --corpus
    python main.py --eval
    python main.py --lda
    python main.py --umap
    python main.py --entropy
    python main.py --biblio
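Because the stages must run in a fixed order, it can be convenient to drive them from a small script that stops at the first failure. The sketch below is one way to do that, assuming it is run from the thesis_exp project directory inside the activated conda environment. The helper names `build_commands` and `run_pipeline` are hypothetical and not part of this repository; only the `main.py` flags come from the list above.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Flags taken from the README, in the order they must be run.
STAGES = ["eda", "metadata", "corpus", "eval", "lda", "umap", "entropy", "biblio"]


def build_commands(stages: Sequence[str] = STAGES,
                   python: str = sys.executable) -> List[List[str]]:
    """Build one `python main.py --<stage>` command per stage."""
    return [[python, "main.py", f"--{stage}"] for stage in stages]


def run_pipeline(run: Callable = subprocess.run) -> None:
    """Run every stage in order; check=True aborts on the first failure."""
    for command in build_commands():
        print("Running:", " ".join(command))
        run(command, check=True)
```

Calling `run_pipeline()` executes all eight stages in sequence. With `check=True`, `subprocess.run` raises `CalledProcessError` as soon as a stage exits with a non-zero status, so later stages never run on incomplete outputs.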
