Classification

We propose domain subject classification and alloy phase classification tasks.

Domain

The labelled dataset is generated by randomly sampling domain journals from the CORE data.
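The sampling step can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes each CORE record exposes a journal name and a text field, and that a journal-to-domain mapping is available. The field names `journal` and `text`, the `JOURNAL_TO_DOMAIN` mapping, and the helper `sample_labelled` are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical journal-to-domain mapping; the real label set is not given here.
JOURNAL_TO_DOMAIN = {
    "Journal of Applied Physics": "physics",
    "Acta Materialia": "materials",
    "Cell": "biology",
}


def sample_labelled(records, per_domain, seed=0):
    """Randomly draw up to `per_domain` (text, domain) pairs for each domain."""
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for rec in records:
        domain = JOURNAL_TO_DOMAIN.get(rec["journal"])
        if domain is not None:  # skip journals with no assigned domain
            by_domain[domain].append(rec["text"])

    sample = []
    for domain in sorted(by_domain):
        texts = by_domain[domain]
        k = min(per_domain, len(texts))
        sample.extend((text, domain) for text in rng.sample(texts, k))
    rng.shuffle(sample)  # mix domains so later splits are not ordered by label
    return sample
```

Fixing the seed keeps the sample reproducible, which helps when comparing fine-tuning runs on the same labelled set.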

  • Fine-tune and validate on domain text, e.g.
python llm-classifier.py --model globuslabs/ScholarBERT --emb-size 1024
  • Test accuracy on CORE samples: classification accuracy

  • Clustering on embeddings: clustering
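For the test-accuracy step, a minimal sketch of the metric is shown below. It assumes the predicted and true domain labels are already available as two equal-length lists. The function names `accuracy` and `per_class_accuracy` are illustrative and do not come from `llm-classifier.py`.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true label."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in total.items()}
```

Reporting the per-class figures alongside the overall number can show whether a randomly sampled test set is dominated by a few large domains.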

Phase

The labelled dataset is obtained from https://www.nature.com/articles/s41524-020-0308-7

  • Fine-tune and cluster alloy phases based on material names: clustering
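The clustering step can be illustrated with a small, dependency-free sketch. It assumes the fine-tuned model has already turned each material name into an embedding vector, and it uses plain k-means with a deterministic farthest-point initialisation, scored by cluster purity. The choice of k-means, the purity metric, and the function names are assumptions made for illustration; the repository's clustering code may differ.

```python
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Lloyd's k-means; returns one cluster index per point."""
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point that lies farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def purity(assignment, labels):
    """Share of points that carry the majority label of their cluster."""
    clusters = {}
    for a, label in zip(assignment, labels):
        clusters.setdefault(a, Counter())[label] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels)
```

A purity near 1 means that the clusters found from the embeddings line up closely with the labelled phases, while a value near the share of the largest class suggests the embeddings carry little phase information.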

DeepSpeed parallelization of the fine-tuning code is also provided for each of the two tasks above. For example, to run it on Summit:

bsub phase/launch_classifier_phase.lsf