Polymer-degradability-ranking

Revealing Factors Influencing Polymer Degradation with Rank-based Machine Learning

Data

Training Data

The experimental data of polymer degradability can be found in two files:

'Data/exp1.xlsx'
'Data/exp2.xlsx'

And the literature data can be found in:

'Data/literature.xlsx'

Predicted Degradability of PolyInfo Data

The predicted degradability of PolyInfo data within the model's applicability domain can be found in the CSV file degradabilty_result_of_polyinfo.csv in this repository. Detailed information about each polymer can be looked up by its PID in the Polymer Database.

Model

The model for degradability ranking can be found in /Model.

Requirements

Python
Mol2vec
Gensim (pip install gensim==3.8.3)
Scikit-learn
RDKit

Usage

Getting the Code

Clone the repository using the following command:

git clone https://github.com/tsudalab/Polymer-degradability-ranking.git
cd Polymer-degradability-ranking

How to Run

Execute the script by running:

python degradability_ranking.py

Loading Datasets

The following Excel files containing molecular information will be loaded:

'Data/literature.xlsx'
'Data/exp1.xlsx'
'Data/exp2.xlsx'
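A minimal sketch of how one of these files might be read with pandas. The column names SMILES and Degradability follow the format required for new data files in "Update the model"; whether the bundled files use exactly these headers is an assumption, and check_columns and load_degradability_table are hypothetical helper names, not functions from the repository.

```python
import pandas as pd

# Assumed header names, taken from the format described for new data files.
REQUIRED_COLUMNS = ("SMILES", "Degradability")


def check_columns(df):
    """Keep only the required columns and drop rows with missing values."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return df[list(REQUIRED_COLUMNS)].dropna().reset_index(drop=True)


def load_degradability_table(path):
    """Read one Excel file (hypothetical helper) and validate its columns."""
    # pandas needs an Excel engine such as openpyxl to read .xlsx files.
    return check_columns(pd.read_excel(path))
```

For example, load_degradability_table('Data/exp1.xlsx') would return a two-column table if the file uses the assumed headers.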

Pairwise Calculation

First, mol2vec is applied to the molecules to obtain embedding vectors. Then the pairwise transformation is computed for each dataset separately, and the results are combined. Example:

# Build pairwise samples within each dataset separately
x_labeled_lit, y_labeled_lit = transform_pairwise(mol2vec(mols_lit), deg_lit)
x_labeled_exp1, y_labeled_exp1 = transform_pairwise(mol2vec(mols_exp1), deg_exp1)
x_labeled_exp2, y_labeled_exp2 = transform_pairwise(mol2vec(mols_exp2), deg_exp2)

# Combine the pairwise samples from all datasets (np is numpy)
x_labeled = np.concatenate([x_labeled_lit, x_labeled_exp1, x_labeled_exp2])
y_labeled = np.concatenate([y_labeled_lit, y_labeled_exp1, y_labeled_exp2])
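To illustrate what a pairwise transformation does, here is a minimal sketch of the standard construction used in rank learning: every pair of items with different target values becomes one sample whose features are the difference of the two vectors and whose label is the sign of the target difference. The function name pairwise_transform_sketch is hypothetical, and the repository's own transform_pairwise may differ in details such as pair ordering or tie handling.

```python
import itertools

import numpy as np


def pairwise_transform_sketch(x, y):
    """Turn (features, targets) into difference vectors with +1/-1 labels."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs, labels = [], []
    for i, j in itertools.combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # ties carry no ordering information
        diffs.append(x[i] - x[j])
        labels.append(1 if y[i] > y[j] else -1)
    if not diffs:
        return np.empty((0, x.shape[1])), np.empty((0,), dtype=int)
    return np.vstack(diffs), np.array(labels)
```

Because pairs are only formed inside one dataset, only the ordering within each dataset matters, so values measured in different experiments never need to be compared directly.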

Model Training and Creation of unified ranking

A support vector machine (SVM) is used to train the degradability model, and its hyperparameter is optimized using grid search. When training completes, the script automatically prints a unified ranking together with the degradation scores.
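The following sketch shows one way such a model could be trained with scikit-learn and how a unified ranking can be read off it. It assumes a linear kernel, a grid over C only, and 3-fold cross-validation; none of these choices is confirmed by this README, and the function names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC


def fit_ranking_svm(x_pairs, y_pairs, c_grid=(0.01, 0.1, 1.0, 10.0)):
    """Grid-search C for a linear SVM trained on pairwise differences."""
    search = GridSearchCV(
        LinearSVC(fit_intercept=False, max_iter=10000),
        param_grid={"C": list(c_grid)},
        cv=3,  # assumed number of folds
    )
    search.fit(x_pairs, y_pairs)
    return search.best_estimator_


def degradation_scores(model, x):
    """Score each item as w . x using the learned weight vector w."""
    return np.asarray(x, dtype=float) @ model.coef_.ravel()


def unified_ranking(model, x):
    """Indices of items sorted from highest to lowest score."""
    return np.argsort(-degradation_scores(model, x))
```

With a linear model, the weight vector learned on difference vectors assigns a single score to every molecule, which is what allows items from different datasets to be placed on one ranking.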

Decision Tree Analysis

Decision tree analysis of the ranking result using molecular descriptors is provided at the end of the script.
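As a rough illustration of this kind of analysis, the sketch below fits a shallow regression tree that predicts the degradation score from a descriptor matrix and prints its rules as text. The choice of a regressor, the depth limit, and the helper name explain_scores are assumptions; the script's actual descriptors and tree settings are not specified here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text


def explain_scores(descriptors, scores, names, max_depth=3):
    """Fit a shallow tree on descriptors and return it with its text rules."""
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(np.asarray(descriptors, dtype=float), np.asarray(scores, dtype=float))
    return tree, export_text(tree, feature_names=list(names))
```

The descriptor that appears at the root split is the one that best separates high-scoring from low-scoring polymers in the fitted tree.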

Update the model

The train command lets you train an updated ranking model using new degradability data files containing polymer SMILES.

Place the Excel files (.xlsx) in the 'Data' directory. Each file must contain two required columns: SMILES and Degradability values.
Then run the following command in your terminal to start the training process:

python main.py train newdata

A trained model will be saved as Model/update_model.pickle, and a notification of the training completion will be printed in the terminal.
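If you want to inspect the saved file from Python, a minimal sketch is shown below. It assumes the file is an ordinary pickle; what object it contains is not documented here, and load_model is a hypothetical helper name.

```python
import pickle
from pathlib import Path


def load_model(path="Model/update_model.pickle"):
    """Load a pickled model. Only unpickle files from a source you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Unpickling a scikit-learn model generally works best with the same library versions that were used to save it.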

Predict degradability of given polymer

main.py can also predict the degradability of given polymer SMILES. It provides two main commands for different tasks: train (described above) and predict.
The "predict" command makes predictions using the model. It has additional options to specify the type of prediction and the model to use.

Usage for Comparing Given Polymers:

python main.py predict 'SMILES' 'SMILES'... -sp

'SMILES'... is a list of SMILES strings.
-sp: Specifies the default prediction for comparing the given SMILES.

Comparing a Specific Polymer's Rank Within Training Data:

python main.py predict 'SMILES' -c

'SMILES' is the specific SMILES string to be ranked.
-c: Compares the given SMILES with default data.
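Conceptually, ranking one polymer within a reference set comes down to counting how many reference scores exceed its score. The sketch below shows that calculation under the assumption that a higher score ranks first; how -c actually reports its result is not specified here, and rank_within is a hypothetical name.

```python
import numpy as np


def rank_within(reference_scores, new_score):
    """Return (rank, total), where rank 1 means the highest score."""
    reference_scores = np.asarray(reference_scores, dtype=float)
    rank = 1 + int(np.sum(reference_scores > new_score))
    return rank, len(reference_scores) + 1
```

For instance, a score of 0.6 compared against reference scores 0.1, 0.9 and 0.5 is exceeded by one of them, so it takes rank 2 out of 4.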

Additional Options:

--model: Specifies the model file. The default value is "deg_model.pickle".

Note: At least two SMILES strings are required when using the -sp option.
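To make the command-line layout above concrete, here is a sketch of an argparse parser that accepts the same commands and flags. It is an illustration only, not the parser in main.py: the argument name data_name, the help texts, and the use of ValueError for the two-SMILES check are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented commands and flags."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train an updated ranking model")
    train.add_argument("data_name", help="new data to train on (assumed name)")

    predict = sub.add_parser("predict", help="predict degradability")
    predict.add_argument("smiles", nargs="+", help="one or more SMILES strings")
    predict.add_argument("-sp", action="store_true", help="compare the given SMILES")
    predict.add_argument("-c", action="store_true", help="rank against default data")
    predict.add_argument("--model", default="deg_model.pickle", help="model file")
    return parser


def parse_cli(argv):
    """Parse arguments and enforce the two-SMILES rule for -sp."""
    args = build_parser().parse_args(argv)
    if args.command == "predict" and args.sp and len(args.smiles) < 2:
        raise ValueError("-sp requires at least two SMILES strings")
    return args
```

Quoting each SMILES string on the command line, as in the usage lines above, keeps the shell from interpreting characters such as parentheses or asterisks.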

Applicability Domain determination of PolyInfo data

The code for Applicability Domain determination can be found at https://github.com/onecoinbuybus/KNN-Applicability-Domain/tree/main
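For readers unfamiliar with kNN-based applicability domains, the sketch below shows one common formulation: a query is considered in-domain when its mean distance to the k nearest training points does not exceed a percentile of the same quantity computed over the training set. Euclidean distance, k=5, and the 95th percentile are assumptions made for illustration; the linked repository may use different choices, and the function names are hypothetical.

```python
import numpy as np


def knn_mean_distance(train, query, k, exclude_self=False):
    """Mean Euclidean distance from each query point to its k nearest training points."""
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    d = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    d.sort(axis=1)
    start = 1 if exclude_self else 0  # skip the zero distance to itself
    return d[:, start:start + k].mean(axis=1)


def in_domain(train, query, k=5, percentile=95.0):
    """Boolean mask: True where a query point lies inside the domain."""
    train_dist = knn_mean_distance(train, train, k, exclude_self=True)
    threshold = np.percentile(train_dist, percentile)
    return knn_mean_distance(train, query, k) <= threshold
```

In this setting the training points would be the embedding vectors of the training polymers, and the query points those of the PolyInfo polymers.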
