Notebooks and method as implemented in our manuscript #20
Merged
…n to requirements, updated predictions_on_unseen_substrate notebook to add dimensionality reduction, cleaned output of notebook
…oup/obelix-ml-pipeline into visualization_notebooks
…w saving of pyplot figures, added new ecfps for ligands from .sdf files, added dft_with_solvation representation for ligands, generalized df preparing for prediction_on_unseen_substrate and -partially_seen_substrate, corrected ohe ligand ligand representation filename, added raw data processing for dft_with_solvation model
…lass which become smaller if a defined performance is not reached, added option for dynamic target_threshold if target_threshold=None, removed redundant figure
…domly_chosen_fraction for objective 4. Modified .py files for each objective to also save train/test data in dataclass (not useful if function is used in loop), created notebook for 4th objective
…predictions_on_unseen_substrate.py
…e of feature importances returned variables for machine_learning and data class
…n_variables.py. Removed ligand and substrate representations for rdkit, dl_chylon, sigmangroup, sterimol. updated load_representations to drop NaN containing columns and print warning. Updated gitignore.
…L115, L151 and L170
… task such that predictions are returned, changed ligand loading such that dropna is only done if a NaN is detected in the df, tested y_scramble for predictions_on_unseen_substrate.py but WIP
… data for SM7-8. Added new test for objective one (test removal of ligands that are active/inactive for all substrates).
…oup/obelix-ml-pipeline into visualization_notebooks
…nstead of cclib's
…iptors are correct
… descriptors for ML based on this, updated clean sets for dft_nbd_model with new NBO values and free_ligand descriptors, added correlation matrices to representation_variables.py, added script that summarizes obelix code used to calculate dft_nbd_model_descriptors
…pdated ligand NBD descriptors based on fixes made in obelix
…escriptors for DFT NBD model
…culate difference between NBO/mulliken/lone_pair_occupancy on complex and free ligand, updated selection of descriptors for ML
…n with Sigman's publication and don't have swapped descriptors
…n new versions of datasets where descriptors were calculated using obelix
…ted functions to latest versions, cleaned example notebooks
…ml, reran notebooks with this environment
CValse reviewed on Mar 15, 2024
…ld variable, removed old versions of ligand representations generated with older versions of obelix
CValse approved these changes on May 8, 2024
🚀 It is time to merge this beast of a branch back to main. 🚀
Basically this branch has been used for all of our development up until now. This is the core of our ML pipeline.
The ML pipeline was used in four ways: fully out-of-domain with respect to substrates, partially out-of-domain, in-domain, and Monte Carlo in-domain. Each approach has its own function, and the notebooks give examples of their usage. We also ran these functions in loops to test many different cases in our research; those loops will be kept in a separate notebook.
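To make the difference between the four approaches concrete, here is a rough sketch of how the train/test splits could look. This is not the code from this branch: the function names, the `substrate` key, the toy rows, and the exact meaning given to "partially out-of-domain" (a fraction of the test substrate is moved into training) and "Monte Carlo" (repeated random in-domain splits with different seeds) are all assumptions made for illustration.

```python
import random

# Toy data (hypothetical): 3 substrates x 10 ligands, one row per experiment.
rows = [
    {"ligand": f"L{i}", "substrate": s}
    for s in ("S1", "S2", "S3")
    for i in range(10)
]

def split_unseen_substrate(rows, test_substrate):
    """Fully out-of-domain: the test substrate never appears in training."""
    train = [r for r in rows if r["substrate"] != test_substrate]
    test = [r for r in rows if r["substrate"] == test_substrate]
    return train, test

def split_partially_seen_substrate(rows, test_substrate, seen_fraction, seed=0):
    """Partially out-of-domain: a random fraction of the test substrate's
    rows is added to the training set; the rest is used for testing."""
    train, held_out = split_unseen_substrate(rows, test_substrate)
    shuffled = held_out[:]
    random.Random(seed).shuffle(shuffled)
    n_seen = int(seen_fraction * len(shuffled))
    return train + shuffled[:n_seen], shuffled[n_seen:]

def split_in_domain(rows, substrate, test_fraction, seed=0):
    """In-domain: train and test rows both come from the same substrate."""
    subset = [r for r in rows if r["substrate"] == substrate]
    random.Random(seed).shuffle(subset)
    n_test = int(test_fraction * len(subset))
    return subset[n_test:], subset[:n_test]

def monte_carlo_in_domain(rows, substrate, test_fraction, n_repeats, seed=0):
    """Monte Carlo in-domain: repeat the in-domain split with different seeds."""
    return [
        split_in_domain(rows, substrate, test_fraction, seed=seed + i)
        for i in range(n_repeats)
    ]
```

In this sketch each function returns plain `(train, test)` lists, so a model can be fit on the first and scored on the second; the Monte Carlo variant returns one such pair per repeat so that scores can be averaged.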
This closes #12, #13, #14, #15, #16, #17, #18 and #19. These have either been solved in the code or will be added as separate notebooks in the SI of our manuscript. The other issues are unfortunately out of scope for our project for now.