Malichot/SuperStyl

How to use

You will need Python 3.6, virtualenv, and pip.

Install

git clone https://github.com/Jean-Baptiste-Camps/willhelmus.git
cd willhelmus
virtualenv -p python3.6 env
source env/bin/activate
pip install -r requirements.txt
# Then download the fastText model used for language identification
mkdir -p jagen_will/preproc/models
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -P ./jagen_will/preproc/models/

Workflow

FIXME: this workflow is not fully documented yet. For now, look inside the scripts, or run

python main.py --help

for full documentation on the CLI.

Get feats

Features can be extracted with or without a preexisting feature list:

python main.py -t chars -n 3 -c debug_authors.csv [-p 1] -k 5000 -s path/to/docs/*
# with a preexisting feature list (-f)
python main.py -f feature_list.json -t chars -n 3 -c debug_authors.csv -k 5000 -s meertens-song-collection-DH2019/train/*
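
As a rough illustration of what a run with `-t chars -n 3 -k 5000` might compute, the sketch below counts character 3-grams across documents and keeps the k most frequent ones. This reading of the flags is an assumption, and `char_ngrams`, `top_features` and `relative_frequencies` are hypothetical helpers, not functions from this repository.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return every overlapping character n-gram in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def top_features(docs, n=3, k=5000):
    """Return the k most frequent character n-grams across all docs.

    Hypothetical helper: assumes -n is the n-gram length and -k the
    number of features kept.
    """
    counts = Counter()
    for text in docs:
        counts.update(char_ngrams(text, n))
    return [gram for gram, _ in counts.most_common(k)]


def relative_frequencies(text, features, n=3):
    """Relative frequency of each selected feature in one document."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values()) or 1  # avoid division by zero on short texts
    return [counts[f] / total for f in features]
```

Reusing the list returned by `top_features` for new documents plays the same role as passing a saved feature list with `-f`: every document is then described by the same columns.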

Do the split

To do an initial random split:

python split.py feats_tests.csv -m langcert_revised.csv -e wilhelmus_train.csv

To split according to an existing JSON file:

python split.py feats_tests.csv -s split.json
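
To make the two modes concrete, here is a minimal sketch of a random split that is saved to JSON so it can be replayed later. The JSON layout (`train` and `valid` lists of row identifiers), the 10% validation share, and the function names are assumptions for illustration; the format that `split.py` actually reads and writes may differ.

```python
import json
import random


def random_split(ids, valid_share=0.1, seed=42):
    """Shuffle ids and cut them into train and validation lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_share))
    return {"train": shuffled[n_valid:], "valid": shuffled[:n_valid]}


def save_split(split, path):
    """Write the split to a JSON file (hypothetical layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(split, handle, indent=2)


def load_split(path):
    """Read back a split written by save_split."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Storing the split this way keeps later experiments comparable, since every run can be evaluated on exactly the same validation rows.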

Train SVM

It's quite simple, really:

python train_svm.py path-to-train-data.csv path-to-test-data.csv [--norms] [--dim_reduc None, 'pca', 'som'] [--kernel 'LinearSVC', 'linear', 'polynomial', 'rbf', 'sigmoid'] [--final]
# e.g.
python train_svm.py data/feats_tests_train.csv data/feats_tests_valid.csv --norms --dim_reduc som
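
The options above roughly correspond to a normalise, reduce, classify pipeline. The sketch below shows one way to express that with scikit-learn, assuming that `--norms` means feature standardisation and that `--dim_reduc pca` means PCA applied before the classifier. It is an illustration rather than the code in `train_svm.py`; `build_pipeline` and the toy data are hypothetical, and the `som` option is not covered.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC


def build_pipeline(norms=True, dim_reduc=None, kernel="LinearSVC"):
    """Assemble an SVM pipeline from CLI-like options (hypothetical helper)."""
    steps = []
    if norms:
        # Assumption: --norms standardises each feature column
        steps.append(("scaler", StandardScaler()))
    if dim_reduc == "pca":
        steps.append(("pca", PCA(n_components=2)))
    if kernel == "LinearSVC":
        steps.append(("svm", LinearSVC()))
    else:
        # scikit-learn names the polynomial kernel "poly"
        sk_kernel = "poly" if kernel == "polynomial" else kernel
        steps.append(("svm", SVC(kernel=sk_kernel)))
    return Pipeline(steps)


# Toy data standing in for the train and validation CSV files
train_x = [[0.0, 0.1, 1.0], [0.2, 0.0, 0.9], [0.1, 0.2, 1.1],
           [5.0, 5.1, 0.0], [5.2, 4.9, 0.1], [4.9, 5.0, 0.2]]
train_y = ["A", "A", "A", "B", "B", "B"]

model = build_pipeline(norms=True, dim_reduc="pca")
model.fit(train_x, train_y)
predictions = model.predict([[0.1, 0.1, 1.0], [5.1, 5.0, 0.1]])
```

Wrapping the steps in a single `Pipeline` means the scaler and PCA are fitted on the training data only and then reused unchanged on the test data.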
