bert-text-classification-arxiv

AI or not AI? Classifying ArXiv articles with BERT

Installation

Prerequisites

Python ≥ 3.6

Provision a Virtual Environment

Create and activate a virtual environment (conda):

conda create --name py36_bert-arxiv python=3.6
source activate py36_bert-arxiv

If pip is configured in your conda environment, install the dependencies from the project root directory:

pip install -r requirements.txt

Get ArXiv dataset

Download the dataset used in this repository from Kaggle.

Create a folder named data in the project root directory and place the downloaded file arxivData.json in it.
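Before running anything else, it can help to confirm that the file sits where the code expects it. The snippet below is a minimal sketch, assuming the file holds a single JSON array of records. The helper name load_arxiv_records is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def load_arxiv_records(path="data/arxivData.json"):
    """Load the dataset file and return its records as a list.

    Assumption: the file holds a single JSON array of objects.
    """
    data_file = Path(path)
    if not data_file.is_file():
        raise FileNotFoundError(
            f"{data_file} not found; place arxivData.json in the data folder"
        )
    with data_file.open(encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records
```

Calling load_arxiv_records() from the project root and checking the length of the result is a quick sanity check that the download and placement worked.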

Feature Extraction code

Now that the environment is set up and the dataset is available, run the feature extraction with the following command:

python feature_extraction.py

By default, this uses the arxivData.json file as input and writes the X, y training and test files to the same data folder.

Model training

Use the Jupyter notebook run_model_keras.ipynb to train the model. Working in a notebook makes it easier to visualise the results.
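The internals of feature_extraction.py are not described here. As a rough illustration of what building X, y training and test sets for an "AI or not AI" task can look like, here is a standard-library sketch. The record layout (a summary string and a terms list), the choice of cs.AI as the positive category, the 80/20 split and both function names are assumptions made for illustration only. The BERT encoding step is left out.

```python
import random


def label_record(record, positive_term="cs.AI"):
    """Return 1 if the record carries the positive category, else 0.

    Assumption: each record has a "terms" list of category strings.
    """
    return 1 if positive_term in record.get("terms", []) else 0


def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle records reproducibly and split them into train and test parts."""
    shuffled = list(records)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # X holds the raw text, y the binary label derived from the categories.
    X_train = [r["summary"] for r in train]
    y_train = [label_record(r) for r in train]
    X_test = [r["summary"] for r in test]
    y_test = [label_record(r) for r in test]
    return X_train, y_train, X_test, y_test
```

Fixing the seed keeps the split identical across runs, which makes later model comparisons easier to interpret.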
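When inspecting the results of a binary classifier such as this one, a few summary numbers are often a useful complement to plots. The sketch below is not taken from the notebook. The function binary_metrics is a hypothetical helper that assumes labels and predictions are lists of 0 and 1.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Precision and recall matter here because a category split such as AI versus non-AI can be imbalanced, in which case accuracy alone can look better than the model really is.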

pyvandenbussche/bert-text-classification-arxiv
