Classification

We propose two classification tasks: domain subject classification and alloy phase classification.

Domain

The labelled dataset is generated by randomly sampling domain journals from the CORE dataset.
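The sampling step could look roughly like the sketch below. It is an illustration only, not the repository's code: the function names `sample_journals` and `label_articles`, the journal-to-domain mapping, and the article record layout (`text`, `journal`) are all hypothetical, and it assumes each article inherits the domain of the journal it appeared in.

```python
import random
from collections import defaultdict


def sample_journals(journal_to_domain, per_domain, seed=0):
    """Randomly pick up to `per_domain` journals for each domain label."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_domain = defaultdict(list)
    # Sort first so the result does not depend on dict insertion order.
    for journal, domain in sorted(journal_to_domain.items()):
        by_domain[domain].append(journal)
    sampled = {}
    for domain, journals in by_domain.items():
        k = min(per_domain, len(journals))
        sampled[domain] = sorted(rng.sample(journals, k))
    return sampled


def label_articles(articles, sampled):
    """Keep articles from sampled journals; label each with its journal's domain (assumption)."""
    journal_label = {j: d for d, js in sampled.items() for j in js}
    return [
        (a["text"], journal_label[a["journal"]])
        for a in articles
        if a["journal"] in journal_label
    ]
```

Fixing the seed keeps the labelled set stable across runs, which makes classifier results comparable between models.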

To run the domain classifier with ScholarBERT:

python llm-classifier.py --model globuslabs/ScholarBERT --emb-size 1024
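The `--emb-size` value presumably has to match the width of the embedding the encoder produces, so that the classification layer on top is sized correctly. The sketch below shows that relationship in plain Python. It is not the code in `llm-classifier.py`: the `LinearHead` class, the number of classes, and the randomly initialised weights are hypothetical, and a real run would use a deep-learning framework and trained weights.

```python
import math
import random

EMB_SIZE = 1024  # same value as --emb-size in the command above


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Hypothetical linear layer mapping one pooled embedding to class probabilities."""

    def __init__(self, emb_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.emb_size = emb_size
        # One weight row per class; small random values stand in for trained weights.
        self.weights = [
            [rng.gauss(0.0, 0.02) for _ in range(emb_size)]
            for _ in range(num_classes)
        ]
        self.bias = [0.0] * num_classes

    def predict_proba(self, embedding):
        if len(embedding) != self.emb_size:
            raise ValueError(
                f"expected embedding of size {self.emb_size}, got {len(embedding)}"
            )
        logits = [
            sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)
```

For example, `LinearHead(EMB_SIZE, num_classes=3).predict_proba([0.1] * EMB_SIZE)` returns three probabilities that sum to 1, while an embedding of any other length raises `ValueError`, which is the kind of mismatch a wrong `--emb-size` would cause.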

DeepSpeed-parallelized versions of the fine-tuning code are also provided for both tasks. For example, to run the phase classifier on Summit:

bsub phase/launch_classifier_phase.lsf