On Finding Megadiversity Among the Corpus of Scientific Literature

Introduction

The objective of this thesis is to find the most diverse set of scientific papers in a given corpus. The diversity of a set of papers is measured by the number of distinct topics the set covers. The set of papers with the highest diversity is called the megadiverse set.
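To make the measure concrete, here is a minimal Python sketch. It assumes each paper has already been assigned a set of topic labels (for instance, from a topic model such as LDA). The names `diversity` and `topics_of` are hypothetical, and the labels in the usage example are toy values, not data from this project.

```python
from typing import Dict, Iterable, Set


def diversity(papers: Iterable[str], topics_of: Dict[str, Set[str]]) -> int:
    """Return the number of distinct topics covered by the given papers.

    Assumption: topics_of maps each paper ID to its set of topic labels;
    papers missing from the mapping contribute no topics.
    """
    covered: Set[str] = set()
    for paper in papers:
        covered |= topics_of.get(paper, set())  # union of all topic labels
    return len(covered)


if __name__ == "__main__":
    # Toy labels for illustration only.
    topics_of = {
        "p1": {"ecology", "genomics"},
        "p2": {"genomics"},
        "p3": {"climate"},
    }
    print(diversity(["p1", "p2"], topics_of))  # 2: p2 adds no new topic
    print(diversity(["p1", "p3"], topics_of))  # 3
```

Because topics are counted once per set, adding a paper whose topics are already covered leaves the diversity unchanged.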

Hence, we address the following problem: given a corpus of scientific papers, find the megadiverse set of papers.
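Taken literally, the whole corpus always covers every topic, so the sketch below assumes a budget of at most `k` papers. Under that assumption the problem is an instance of maximum coverage, and a standard greedy heuristic repeatedly picks the paper that adds the most not-yet-covered topics. This is an illustrative technique, not necessarily the method used in this thesis; `greedy_megadiverse` and `k` are hypothetical names.

```python
from typing import Dict, List, Optional, Set


def greedy_megadiverse(topics_of: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedy maximum coverage: choose up to k papers covering many distinct topics.

    Assumption: topics_of maps each paper ID to its set of topic labels.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(topics_of)
    for _ in range(k):
        best: Optional[str] = None
        best_gain = 0
        # Scan in sorted order so ties are broken deterministically by paper ID.
        for paper in sorted(remaining):
            gain = len(remaining[paper] - covered)  # topics this paper would add
            if gain > best_gain:
                best, best_gain = paper, gain
        if best is None:  # no remaining paper adds a new topic
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen
```

For maximum coverage, the greedy choice is known to reach at least a (1 - 1/e) fraction of the optimal number of covered topics, whereas finding the exact optimum is NP-hard in general.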

Requirements

OS: Ubuntu 20.04 LTS
Python: the latest release is fine
Install Miniconda

Set up work environment

Create a conda environment with Python 3.8.19
```
conda create -n "myenv" python=3.8.19
```
Activate the conda environment
```
conda activate myenv
```
Install the requirements
```
pip install -r requirements.txt
```
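Once the environment is active, a quick way to confirm that the interpreter matches the version pinned in the `conda create` command is a short Python check like the one below. The function name `matches_expected` is hypothetical and not part of this repository.

```python
import sys
from typing import Tuple

# Version pinned by the conda create command in this README.
EXPECTED = (3, 8, 19)


def matches_expected(version: Tuple, expected: Tuple[int, int, int] = EXPECTED) -> bool:
    """True if the first three fields (major, minor, micro) equal the pinned version."""
    return tuple(version[:3]) == expected


if __name__ == "__main__":
    ok = matches_expected(tuple(sys.version_info))
    print("Python", sys.version.split()[0], "OK" if ok else "differs from 3.8.19")
```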
Download the dataset folder from here and place it in the thesis_exp project directory.

Run the application with these parameters, one at a time and in this order:
```
python main.py --eda
python main.py --metadata
python main.py --corpus
python main.py --eval
python main.py --lda
python main.py --umap
python main.py --entropy
python main.py --biblio
```
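If you prefer to run all stages in one go, the sequence above can be scripted. The sketch below is a hypothetical helper, not part of the repository: it assumes it is run from the project root (where `main.py` lives) inside the activated environment, and that each stage is a separate `main.py` invocation with a single flag, exactly as listed.

```python
import subprocess
import sys
from typing import List

# Stage flags, in the order this README lists them.
STAGES = ["--eda", "--metadata", "--corpus", "--eval",
          "--lda", "--umap", "--entropy", "--biblio"]


def build_commands(python: str = sys.executable, script: str = "main.py") -> List[List[str]]:
    """Return one argv list per stage, e.g. [python, "main.py", "--eda"]."""
    return [[python, script, flag] for flag in STAGES]


def run_pipeline() -> None:
    """Run every stage in order, stopping at the first one that fails."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a stage exits non-zero,
        # so later stages never run on incomplete outputs.
        subprocess.run(cmd, check=True)
```

Call `run_pipeline()` from the project root. Using `sys.executable` ensures the stages run with the same interpreter as the helper, that is, the one from the activated conda environment.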

Repository: sa-aguilarv/thesis_exp