The experimental data of polymer degradability can be found in two files:
- 'Data/exp1.xlsx'
- 'Data/exp2.xlsx'
And the literature data can be found in:
- 'Data/literature.xlsx'
The predicted degradability of PolyInfo polymers that fall within the model's applicability domain can be found in this CSV file on GitHub. Detailed information about each polymer can be looked up by its PID in the Polymer Database.
The model for degradability ranking can be found in /Model.
Clone the repository using the following command:
git clone https://github.com/tsudalab/Polymer-degradability-ranking.git
cd Polymer-degradability-ranking
Execute the script by running:
python degradability_ranking.py
The following Excel files containing molecular information will be loaded:
'Data/literature.xlsx'
'Data/exp1.xlsx'
'Data/exp2.xlsx'
First, the mol2vec calculation is applied to the molecules to obtain embedding vectors. Then, the pairwise transformation is performed on each dataset separately, and the results are combined. Example:
x_labeled_lit, y_labeled_lit = transform_pairwise(mol2vec(mols_lit), deg_lit)
x_labeled_exp1, y_labeled_exp1 = transform_pairwise(mol2vec(mols_exp1), deg_exp1)
x_labeled_exp2, y_labeled_exp2 = transform_pairwise(mol2vec(mols_exp2), deg_exp2)
x_labeled = np.concatenate([x_labeled_lit, x_labeled_exp1, x_labeled_exp2])
y_labeled = np.concatenate([y_labeled_lit, y_labeled_exp1, y_labeled_exp2])
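To make the pairwise step concrete, here is a minimal sketch of what a pairwise transformation for ranking typically looks like. It is not the repository's `transform_pairwise`; the function name `transform_pairwise_sketch`, the tie handling, and the sign alternation are assumptions made for illustration. Each pair of polymers with different degradability values becomes one training sample: the difference of their feature vectors, labeled by which of the two degrades more.

```python
import itertools

import numpy as np


def transform_pairwise_sketch(x, y):
    """Hypothetical stand-in for transform_pairwise (not the repository code).

    For every pair (i, j) with different target values, emit the feature
    difference x[i] - x[j] with label +1 if y[i] > y[j], else -1.
    Tied pairs carry no ranking information and are skipped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs, labels = [], []
    for i, j in itertools.combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue
        # Alternate the orientation of kept pairs so both classes appear
        # even when the input happens to be sorted by y.
        flip = 1.0 if len(labels) % 2 == 0 else -1.0
        diffs.append(flip * (x[i] - x[j]))
        labels.append(flip * np.sign(y[i] - y[j]))
    return np.array(diffs).reshape(-1, x.shape[1]), np.array(labels)


# Toy data: three "polymers" with 2-dimensional embeddings.
x_toy = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
y_toy = [0.1, 0.5, 0.5]
x_pairs, y_pairs = transform_pairwise_sketch(x_toy, y_toy)
```

Because pairs are only formed inside one call, polymers from different files are never compared directly with each other. That matches the separate-then-concatenate pattern in the example above.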
An SVM is used to train the degradability model, with its hyperparameter optimized by grid search. When training completes, the script automatically prints a unified ranking and the degradation scores.
Decision tree analysis of the ranking result using molecular descriptors is provided at the end of the script.
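The SVM training with grid search described above can be sketched as follows. This is an illustration under stated assumptions, not the code in degradability_ranking.py: it assumes scikit-learn, a linear-kernel `SVC`, a small grid over `C`, and 3-fold cross-validation, and it uses synthetic features in place of mol2vec embeddings. A classifier is fitted on pairwise differences, and its decision function applied to the individual items then gives one score per item, which yields a single ranking.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 30 items with 3 features each and a
# hidden linear score that plays the role of the true degradability.
x_items = rng.normal(size=(30, 3))
hidden_score = x_items @ np.array([1.0, -2.0, 0.5])

# Pairwise training data: feature differences labeled by which item wins.
i, j = np.triu_indices(len(x_items), k=1)
x_labeled = x_items[i] - x_items[j]
y_labeled = np.sign(hidden_score[i] - hidden_score[j])

# Grid search over the regularization strength C (grid values are assumed).
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(x_labeled, y_labeled)

# Scoring single items with the pairwise model gives a unified ranking.
scores = search.best_estimator_.decision_function(x_items)
ranking = np.argsort(-scores)  # item indices, highest score first

print("best C:", search.best_params_["C"])
print("top 5 items:", ranking[:5])
```

The intercept of the fitted model adds the same constant to every item score, so it does not affect the ranking.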
The train method allows you to train an updated ranking model using new degradability data files containing polymer SMILES.
First, place Excel files (.xlsx) in the 'Data' directory, each containing two required columns: SMILES and Degradability values.
Then run the following command in your terminal to start the training process:
python main.py train newdata
A trained model will be saved as Model/update_model.pickle, and a notification of the training completion will be printed in the terminal.
The main.py script can predict the degradability of given polymer SMILES. There are two main commands to achieve different tasks:
The "predict" command allows users to make predictions using the model. This command has additional options to specify the type of prediction and the model to use.
python main.py predict 'SMILES' 'SMILES'... -sp
- 'SMILES'... is a list of SMILES strings.
- -sp: Specifies the default prediction for comparing the given SMILES.
python main.py predict 'SMILES' -c
- 'SMILES' is the specific SMILES string to be ranked.
- -c: Compares the given SMILES with default data.
- --model: Specifies the model file. The default value is "deg_model.pickle".
Note: At least two SMILES strings are required when using the -sp option.
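The command-line interface described above can be summarized in a short `argparse` sketch. This is not the actual main.py: the function names `build_parser` and `parse_cli`, the argument name `data_name`, and the help texts are hypothetical. Only the command names, the -sp, -c and --model options, the default model file, and the two-SMILES rule come from the text above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented commands."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train an updated ranking model")
    # Assumed name for the positional argument shown as 'newdata' above.
    train.add_argument("data_name")

    predict = sub.add_parser("predict", help="rank polymer SMILES")
    predict.add_argument("smiles", nargs="+", help="one or more SMILES strings")
    predict.add_argument("-sp", action="store_true",
                         help="compare the given SMILES with each other")
    predict.add_argument("-c", action="store_true",
                         help="compare the given SMILES with default data")
    predict.add_argument("--model", default="deg_model.pickle")
    return parser


def parse_cli(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Enforce the documented rule: -sp needs at least two SMILES strings.
    if args.command == "predict" and args.sp and len(args.smiles) < 2:
        parser.error("-sp requires at least two SMILES strings")
    return args


args_sp = parse_cli(["predict", "*CCO*", "*CC(C)O*", "-sp"])
args_c = parse_cli(["predict", "*CCO*", "-c", "--model", "update_model.pickle"])
print(args_sp.smiles, args_sp.sp, args_sp.model)
print(args_c.smiles, args_c.c, args_c.model)
```

Calling `parse_cli(["predict", "*CCO*", "-sp"])` exits with a usage error, because a single SMILES string cannot be ranked against itself.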
The code for Applicability Domain determination can be found at https://github.com/onecoinbuybus/KNN-Applicability-Domain/tree/main.