LLM-Prop: Predicting Physical And Electronic Properties Of Crystalline Solids From Their Text Descriptions

This repository contains the implementation of the LLM-Prop model. LLM-Prop is a large language model (the T5 encoder) efficiently fine-tuned on text descriptions of crystals to predict their properties. Given a text sequence that describes a crystal structure, LLM-Prop encodes the underlying crystal representation from that description and outputs properties such as band gap and volume.

LLM-Prop architecture

Installation

You can install LLM-Prop by following these steps:

git clone https://github.com/vertaix/LLM-Prop.git
cd LLM-Prop
conda create -n <environment_name> --file requirements.txt
conda activate <environment_name>

Usage

Training LLM-Prop from scratch

Add the following script to scripts/llmprop_train.sh:

#!/usr/bin/env bash

TRAIN_PATH="data/samples/textedge_prop_mp22_train.csv"
VALID_PATH="data/samples/textedge_prop_mp22_valid.csv"
TEST_PATH="data/samples/textedge_prop_mp22_test.csv"
EPOCHS=5 # the default is 200 epochs, used to get the best performance
TASK_NAME="regression" # the task name can also be set to "classification"
# PROPERTY must match TASK_NAME: "band_gap" or "volume" for regression, "is_gap_direct" for classification.
PROPERTY="band_gap"

python llmprop_train.py \
--train_data_path "$TRAIN_PATH" \
--valid_data_path "$VALID_PATH" \
--test_data_path "$TEST_PATH" \
--epochs "$EPOCHS" \
--task_name "$TASK_NAME" \
--property "$PROPERTY"

Then run bash scripts/llmprop_train.sh
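
The training script only accepts certain combinations of task name and property, so it can help to check the pair before launching a long run. The following is a minimal sketch of such a check, not code from this repository: the names `ALLOWED_PROPERTIES` and `check_task_property` are hypothetical, and the allowed values are taken only from the comments in the script above.

```python
# Hypothetical helper, not part of the LLM-Prop code base.
# The allowed pairs mirror the comments in llmprop_train.sh.
ALLOWED_PROPERTIES = {
    "regression": {"band_gap", "volume"},
    "classification": {"is_gap_direct"},
}


def check_task_property(task_name: str, prop: str) -> None:
    """Raise ValueError if the task/property pair is not a documented one."""
    if task_name not in ALLOWED_PROPERTIES:
        raise ValueError(
            f"unknown task name {task_name!r}; "
            f"expected one of {sorted(ALLOWED_PROPERTIES)}"
        )
    allowed = ALLOWED_PROPERTIES[task_name]
    if prop not in allowed:
        raise ValueError(
            f"property {prop!r} is not valid for task {task_name!r}; "
            f"expected one of {sorted(allowed)}"
        )


if __name__ == "__main__":
    # Same values as TASK_NAME and PROPERTY in the script above.
    check_task_property("regression", "band_gap")
    print("task/property pair is valid")
```

Calling `check_task_property("classification", "band_gap")` raises a `ValueError`, which surfaces a mismatched configuration before any data is loaded.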

Evaluating the pretrained LLM-Prop

Add the following script to scripts/llmprop_evaluate.sh:

#!/usr/bin/env bash

TRAIN_PATH="data/samples/textedge_prop_mp22_train.csv"
TEST_PATH="data/samples/textedge_prop_mp22_test.csv"
TASK_NAME="regression" # the task name can also be set to "classification"
# PROPERTY must match TASK_NAME: "band_gap" or "volume" for regression, "is_gap_direct" for classification.
PROPERTY="band_gap"
CKPT_PATH="checkpoints/samples/$TASK_NAME/best_checkpoint_for_$PROPERTY.tar.gz" # path to the best checkpoint for the property to be predicted

python llmprop_evaluate.py \
--train_data_path "$TRAIN_PATH" \
--test_data_path "$TEST_PATH" \
--task_name "$TASK_NAME" \
--property "$PROPERTY" \
--checkpoint "$CKPT_PATH"

Then run bash scripts/llmprop_evaluate.sh
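
The checkpoint path in the script is assembled from the task name and the property, so a mismatch between the two points the evaluation at a file that does not exist. As a rough illustration, the same path can be built and checked in Python before running the evaluation. The function name `checkpoint_path` and its `root` parameter are assumptions made for this sketch; only the path pattern comes from the `CKPT_PATH` line above.

```python
from pathlib import Path


def checkpoint_path(task_name: str, prop: str, root: str = "checkpoints/samples") -> Path:
    """Build the checkpoint path using the CKPT_PATH pattern from the script.

    Hypothetical helper: `root` defaults to the directory used in the README.
    """
    return Path(root) / task_name / f"best_checkpoint_for_{prop}.tar.gz"


if __name__ == "__main__":
    path = checkpoint_path("regression", "band_gap")
    # Report whether the file is present before starting the evaluation.
    status = "found" if path.is_file() else "missing"
    print(f"{path.as_posix()}: {status}")
```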

Data availability

This work is still under review and the data will be released after the review process.
