Notebooks and method as implemented in our manuscript #20

akalikadien · 2024-03-15T15:03:55Z

🚀 It is time to merge this beast of a branch back to main. 🚀

Basically this branch has been used for all of our development up until now. This is the core of our ML pipeline.

The ML pipeline was used in four ways: fully out-of-domain with respect to substrates, partially out-of-domain, in-domain, and Monte Carlo in-domain. Each approach has its own functions, and the notebooks give examples of their usage. These functions were also run in a loop to test many different cases in our research, but that looping code will be kept in a separate notebook.
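For readers new to the terminology, the four evaluation modes can be sketched roughly as below. This is a minimal illustration and not the pipeline's own code: the function names, the `rows` structure and the `seen_fraction`, `test_fraction` and `n_repeats` parameters are all hypothetical, and treating the Monte Carlo mode as repeated random in-domain splits is an assumption. The notebooks remain the reference for how each approach is actually run.

```python
import random


def split_out_of_domain(rows, test_substrate):
    """Fully out-of-domain: train on all other substrates, test on the held-out one."""
    train = [r for r in rows if r["substrate"] != test_substrate]
    test = [r for r in rows if r["substrate"] == test_substrate]
    return train, test


def split_partially_out_of_domain(rows, test_substrate, seen_fraction, seed=0):
    """Partially out-of-domain: move a random fraction of the held-out substrate into training."""
    train, held_out = split_out_of_domain(rows, test_substrate)
    shuffled = held_out[:]
    random.Random(seed).shuffle(shuffled)
    n_seen = int(len(shuffled) * seen_fraction)
    return train + shuffled[:n_seen], shuffled[n_seen:]


def split_in_domain(rows, substrate, test_fraction, seed=0):
    """In-domain: random train/test split using rows of a single substrate only."""
    subset = [r for r in rows if r["substrate"] == substrate]
    random.Random(seed).shuffle(subset)
    n_test = int(len(subset) * test_fraction)
    return subset[n_test:], subset[:n_test]


def monte_carlo_in_domain(rows, substrate, test_fraction, n_repeats, seed=0):
    """Monte Carlo in-domain (assumed meaning): repeat the in-domain split with different seeds."""
    return [
        split_in_domain(rows, substrate, test_fraction, seed + i)
        for i in range(n_repeats)
    ]


# Toy data: 3 substrates x 10 ligands (labels are placeholders, not real measurements).
rows = [
    {"substrate": s, "ligand": f"L{i}"}
    for s in ("SM1", "SM2", "SM3")
    for i in range(10)
]

ood_train, ood_test = split_out_of_domain(rows, "SM1")
part_train, part_test = split_partially_out_of_domain(rows, "SM1", seen_fraction=0.5)
in_train, in_test = split_in_domain(rows, "SM1", test_fraction=0.2)
mc_splits = monte_carlo_in_domain(rows, "SM1", test_fraction=0.2, n_repeats=3)
```

Keeping each mode as a function that returns a `(train, test)` pair makes it straightforward to call them in a loop over substrates, seeds or representations.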

This closes #12 #13 #14 #15 #16 #17 #18 and #19. These have either been solved in the code or will be added as separate notebooks in the SI of our manuscript. The other issues are unfortunately out of the scope of our project for now.

…w saving of pyplot figures, added new ecfps for ligands from .sdf files, added dft_with_solvation representation for ligands, generalized df preparing for prediction_on_unseen_substrate and -partially_seen_substrate, corrected ohe ligand ligand representation filename, added raw data processing for dft_with_solvation model

… descriptors for ML based on this, updated clean sets for dft_nbd_model with new NBO values and free_ligand descriptors, added correlation matrices to representation_variables.py, added script that summarizes obelix code used to calculate dft_nbd_model_descriptors

akalikadien and others added 30 commits April 19, 2023 12:17

updated gitignore for figure folder, added intel extension for SKlearn to requirements, updated predictions_on_unseen_substrate notebook to add dimensionality reduction, cleaned output of notebook

a9c9405

Added pca to partially_seen_substrate and within_substrate_class

3fa35b3

Added pca to partially_seen_substrate and within_substrate_class

4c7ae73

Merge branch 'visualization_notebooks' of https://github.com/EPiCs-group/obelix-ml-pipeline into visualization_notebooks

fa0ad46

Added ligand ohe data file

be10402

Added ohe for ligand, closes #12, updated notebooks

91009ba

added new approach to randomly sample subsets within same substrate class which become smaller if a defined performance is not reached, added option for dynamic target_threshold if target_threshold=None, removed redundant figure

259d317

modified dataclass to save train_data, test_data, random_seed and randomly_chosen_fraction for objective 4. Modified .py files for each objective to also save train/test data in dataclass (not useful if function is used in loop), created notebook for 4th objective

57d47be

Modified notebook 4, added example of model

eb8979a

added code for saving res_df to objective 2 notebook, fixed typos in predictions_on_unseen_substrate.py

e687e4c

added standard deviation for feature importance plots, added dataframe of feature importances returned variables for machine_learning and data class

a139a80

Modified experimental data file, signs added to DDG

6e08fdd

Merge branch 'main' of https://github.com/EPiCs-group/obelix-ml-pipeline into visualization_notebooks

fdffb1e

added new selection of descriptors for dft_nbd_model to representation_variables.py. Removed ligand and substrate representations for rdkit, dl_chylon, sigmangroup, sterimol. updated load_representations to drop NaN containing columns and print warning. Updated gitignore.

48ea6c4

fixed NBD position and added dihedral/pi-bond distances for L2, L74, L115, L151 and L170

6ad0416

added experimental response with sm123 after 1h

0bed524

added predictions to train_data and test_data after ML, modified each task such that predictions are returned, changed ligand loading such that dropna is only done if a NaN is detected in the df, tested y_scramble for predictions_on_unseen_substrate.py but WIP

cf59fc4

Updated ecfps for ligands

f8424a1

updated load_experimental_response to get 1H data for SM1,2,3 and 16H data for SM7-8. Added new test for objective one (test removal of ligands that are active/inactive for all substrates).

00973a3

Merge branch 'visualization_notebooks' of https://github.com/EPiCs-group/obelix-ml-pipeline into visualization_notebooks

3bd9e15

updated NBO charges based on custom extraction function from obelix instead of cclib's

77f4656

fixed indexing of the NBD in L31 such that dihedral and pi-bond descriptors are correct

8b77216

updated clean_tud_set to take free_ligand descriptors into account, updated ligand NBD descriptors based on fixes made in obelix

013c58f

Update ligands_ecfp.csv

cbe9850

updated descriptor selection for DFT NBD model, sorted clean ligand descriptors for DFT NBD model

02d1bd4

calculated quadrant/octant at 7A radius, updated preprocessing to calculate difference between NBO/mulliken/lone_pair_occupancy on complex and free ligand, updated selection of descriptors for ML

8fdbf24

added processing and descriptors for structures that we have in common with Sigman's publication and don't have swapped descriptors

09612de

incorporated latest critical bug fix EPiCs-group/obelix#16 of obelix in new versions of datasets where descriptors were calculated using obelix

e2cfe03

akalikadien added 3 commits March 11, 2024 16:12

updated readme

fbd1691

removed redundant ligand representations, notebooks and folders, updated functions to latest versions, cleaned example notebooks

a328dd9

tested building conda environment from scratch, updated environment.yml, reran notebooks with this environment

ba18751

akalikadien requested a review from CValse March 15, 2024 15:03

CValse reviewed Mar 15, 2024

obelix_ml_pipeline/data_classes.py

modified data_classes and obj 1-4 functions to include target_threshold variable, removed old versions of ligand representations generated with older versions of obelix

0044b25

CValse approved these changes May 8, 2024

akalikadien merged commit e645ca1 into main May 8, 2024

akalikadien deleted the visualization_notebooks branch May 8, 2024 13:30