This notebook covers the procedure I used to dock the OSM series 4 compounds to PfATP4. The aim was to ascertain whether the G358S mutation in PfATP4 might affect the binding or efficacy of the series 4 compounds, in particular the new drug candidates generated by Ersilia's generative model. This is a work in progress!
Two datasets of compounds were provided: osm_allcompounds.csv, which contains the IDs, SMILES and experimental IC50 values for the compounds, and osm_newcandidates.csv, which has the same fields except that the IC50s are predicted. The largest molecule was used to set the size of the search space around the G358 locus. The exhaustiveness was chosen by checking the reproducibility of results for the molecule with the largest number of rotatable bonds. All compounds were docked to both the WT and G358S structures using AutoDock Vina. The PfATP4 structure was generated by Ersilia using ColabFold, and the G358S mutation was introduced using CHARMM-GUI.
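As a rough illustration of sizing the search space from the largest molecule, the sketch below takes the longest atom-to-atom distance in a ligand, adds padding on each side, and uses that as the edge of a cubic box. This is an assumption about one reasonable way to do it, not necessarily the procedure used here: the function names `max_extent` and `cubic_box`, the 4 Å padding, and the example coordinates are all hypothetical. The dictionary keys mirror the `center_*` and `size_*` options of a Vina config file.

```python
import itertools
import math


def max_extent(coords):
    """Return the largest distance (in Å) between any two atoms.

    coords is a list of (x, y, z) tuples. A single atom has extent 0.
    """
    return max(
        (math.dist(a, b) for a, b in itertools.combinations(coords, 2)),
        default=0.0,
    )


def cubic_box(ligand_coords, center, padding=4.0):
    """Build a cubic search box around `center`.

    The edge length is the ligand's longest extent plus `padding` on each
    side. Both the padding value and the choice of a cube are assumptions
    made for this sketch.
    """
    edge = max_extent(ligand_coords) + 2 * padding
    return {
        "center_x": center[0],
        "center_y": center[1],
        "center_z": center[2],
        "size_x": edge,
        "size_y": edge,
        "size_z": edge,
    }


# Hypothetical two-atom "ligand" and box center, for illustration only.
box = cubic_box([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)], center=(1.0, 2.0, 3.0))
```

In practice `center` would be a coordinate near residue 358 taken from the structure file, and `ligand_coords` would come from a 3D conformer of the largest compound.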
To analyse the results, I first looked at the correlation between the docking scores and the experimental IC50s and found none. There may be several reasons for this, but it does not speak well of Vina's ability to accurately estimate interaction strengths.
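For readers who want to repeat this kind of check on the CSV files, here is a minimal, dependency-free sketch of a rank correlation between two lists of numbers. Spearman's coefficient is my choice for the illustration, since it only assumes a monotonic relationship; the text above does not say which correlation measure was used. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Calling `spearman(docking_scores, ic50_values)` on two equal-length lists gives a value between -1 and 1, with values near 0 indicating no monotonic relationship.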
Section 2.2 details the effect that the presence of the serine has on the docked poses. CSV files of the analysis dataframes are available on the CorryLab GitHub so that others may investigate the results themselves. Of interest is that the leading new candidate, OSM-LO-72, may bind near G358S, provided the results can be trusted, which is currently a big if. Interpret with caution.
In future work I will binarize the IC50s, as IC50 cutoffs in the experimental data may be altering the correlation analysis. I will also take a more detailed look at the pose coordinates and interactions with PfATP4, to see whether there are similarities to how cipargamin binds PfATP4, a binding mode in which we have more confidence.
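Binarizing could look something like the sketch below, which labels each compound active or inactive against a threshold and then splits the docking scores by label so the two groups can be compared. The threshold is a placeholder that would need to be chosen from the assay, and the function names are hypothetical.

```python
def binarize_ic50(ic50_values, threshold):
    """Label each compound 1 (active) if its IC50 <= threshold, else 0.

    `threshold` must be in the same units as the IC50 values. The cutoff
    itself is an assumption to be set from the experimental data.
    """
    return [1 if value <= threshold else 0 for value in ic50_values]


def split_by_label(scores, labels):
    """Return (scores of actives, scores of inactives)."""
    actives = [s for s, label in zip(scores, labels) if label == 1]
    inactives = [s for s, label in zip(scores, labels) if label == 0]
    return actives, inactives
```

Because values recorded only as above or below an assay limit still fall cleanly on one side of the threshold, a comparison of the two groups is less sensitive to those cutoffs than a correlation on the raw numbers.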
The docking outputs are in the .tar files on the CorryLab GitHub, and the analysis dataframes are stored in the .csv files. A note on terminology: all_compounds refers only to the compounds with experimental IC50s; it does not contain the new candidates.
For the GitHub thread of the project to which this notebook relates, see here
~JT