Notebooks and method as implemented in our manuscript #20

Merged
akalikadien merged 34 commits into main from visualization_notebooks
May 8, 2024

Conversation

akalikadien
Member

🚀 It is time to merge this beast of a branch back to main. 🚀

Basically this branch has been used for all of our development up until now. This is the core of our ML pipeline.

The ML pipeline was used in four ways: fully out-of-domain with respect to substrates, partially out-of-domain, in-domain, and Monte Carlo in-domain. Each approach has its own functions, and the notebooks give examples of their usage. In our research these functions were called in a loop to test many different cases; those loop-based experiments will be kept in a separate notebook.
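As a rough illustration of how the four approaches differ, here is a minimal sketch of the train/test splits they imply. It is not the code in this branch: the function names, the `substrate` key, the toy records, and the reading of "Monte Carlo in-domain" as repeated random in-domain splits are all assumptions made for the example.

```python
import random


def out_of_domain_split(records, held_out_substrate):
    """Fully out-of-domain: all rows of one substrate form the test set."""
    train = [r for r in records if r["substrate"] != held_out_substrate]
    test = [r for r in records if r["substrate"] == held_out_substrate]
    return train, test


def partially_out_of_domain_split(records, held_out_substrate, seen_fraction, seed=0):
    """Partially out-of-domain: a fraction of the held-out substrate is seen in training."""
    rng = random.Random(seed)
    target = [r for r in records if r["substrate"] == held_out_substrate]
    others = [r for r in records if r["substrate"] != held_out_substrate]
    rng.shuffle(target)
    n_seen = int(round(seen_fraction * len(target)))
    return others + target[:n_seen], target[n_seen:]


def in_domain_split(records, test_fraction, seed=0):
    """In-domain: one random split over all rows, ignoring substrate identity."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def monte_carlo_in_domain_splits(records, test_fraction, n_repeats, seed=0):
    """Monte Carlo in-domain (assumed meaning): repeat the random split with new seeds."""
    return [in_domain_split(records, test_fraction, seed=seed + i) for i in range(n_repeats)]


# Hypothetical toy data: 3 substrates x 4 ligands = 12 rows.
records = [
    {"ligand": f"L{i}", "substrate": s}
    for s in ("SM1", "SM2", "SM3")
    for i in range(4)
]

ood_train, ood_test = out_of_domain_split(records, "SM3")
part_train, part_test = partially_out_of_domain_split(records, "SM3", seen_fraction=0.25)
in_train, in_test = in_domain_split(records, test_fraction=0.25)
mc_splits = monte_carlo_in_domain_splits(records, test_fraction=0.25, n_repeats=5)
```

In each case a model would be fitted on the training rows and evaluated on the test rows; only the way the rows are divided changes between approaches.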

This closes #12 #13 #14 #15 #16 #17 #18 and #19. These have either been solved in the code or will be added as separate notebooks in the SI of our manuscript. The other issues are unfortunately out of the scope of our project for now.

akalikadien and others added 30 commits April 19, 2023 12:17
…n to requirements, updated predictions_on_unseen_substrate notebook to add dimensionality reduction, cleaned output of notebook
…w saving of pyplot figures, added new ecfps for ligands from .sdf files, added dft_with_solvation representation for ligands, generalized df preparing for prediction_on_unseen_substrate and -partially_seen_substrate, corrected ohe ligand ligand representation filename, added raw data processing for dft_with_solvation model
…lass which become smaller if a defined performance is not reached, added option for dynamic target_threshold if target_threshold=None, removed redundant figure
…domly_chosen_fraction for objective 4. Modified .py files for each objective to also save train/test data in dataclass (not useful if function is used in loop), created notebook for 4th objective
…e of feature importances returned variables for machine_learning and data class
…n_variables.py. Removed ligand and substrate representations for rdkit, dl_chylon, sigmangroup, sterimol. updated load_representations to drop NaN containing columns and print warning. Updated gitignore.
… task such that predictions are returned, changed ligand loading such that dropna is only done if a NaN is detected in the df, tested y_scramble for predictions_on_unseen_substrate.py but WIP
… data for SM7-8. Added new test for objective one (test removal of ligands that are active/inactive for all substrates).
… descriptors for ML based on this, updated clean sets for dft_nbd_model with new NBO values and free_ligand descriptors, added correlation matrices to representation_variables.py, added script that summarizes obelix code used to calculate dft_nbd_model_descriptors
…pdated ligand NBD descriptors based on fixes made in obelix
…culate difference between NBO/mulliken/lone_pair_occupancy on complex and free ligand, updated selection of descriptors for ML
…n with Sigman's publication and dont have swapped descriptors
…n new versions of datasets where descriptors were calculated using obelix
@akalikadien akalikadien requested a review from CValse March 15, 2024 15:03
…ld variable, removed old versions of ligand representations generated with older versions of obelix
@akalikadien akalikadien merged commit e645ca1 into main May 8, 2024
@akalikadien akalikadien deleted the visualization_notebooks branch May 8, 2024 13:30
Development

Successfully merging this pull request may close these issues.

Add OHE for ligands
2 participants