New Series 4 candidates based on generative model - EOSI #34
Hi @miquelduranfrigola, thanks for all these great suggestions!
Hi @edwintse, thanks for the feedback. To answer your points:
Suggested way forward
We will:
Now, the most important thing is to define the constraints for the generative model. These 3 are clear:
Anything else? For example:
It will be very useful to know precisely your ideal profile of properties. Please let us know and we will try to implement it.
Hi @miquelduranfrigola. Very interesting, and a great way to visualise the suggestions. I'd definitely agree with you re the top three filters. For logP we tend to want to focus on compounds with logP < 3.5, very roughly. To consider the shortlist, though, it would seem that, with those constraints applied, we (me, @edwintse, other potential contributors) simply need to browse through and isolate structures that, well, take our fancy. I mean, there are a lot of possibilities here. Pinging @drc007 purely because I think you'll get a kick out of this.
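A minimal Python sketch of how property constraints such as the logP < 3.5 preference above could be applied as a filter. It assumes descriptors (e.g. calculated logP) have already been computed for each candidate by some other tool; the `clogp` field name, the bounds structure and the candidate IDs are hypothetical illustrations, not Ersilia's code.

```python
from typing import Dict, List, Optional, Tuple

# Property bounds as (lower, upper); None means unbounded on that side.
# Only the logP upper bound (~3.5) comes from the thread; other bounds are hypothetical.
Bounds = Dict[str, Tuple[Optional[float], Optional[float]]]

BOUNDS: Bounds = {"clogp": (None, 3.5)}


def within_bounds(props: Dict[str, float], bounds: Bounds) -> bool:
    """True if every bounded property is present and inside its window
    (lower bound inclusive, upper bound exclusive)."""
    for name, (lo, hi) in bounds.items():
        value = props.get(name)
        if value is None:
            return False  # missing descriptor: reject rather than guess
        if lo is not None and value < lo:
            return False
        if hi is not None and value >= hi:
            return False
    return True


def apply_filters(candidates: Dict[str, Dict[str, float]], bounds: Bounds) -> List[str]:
    """IDs of candidates that satisfy all bounds, in input order."""
    return [mol_id for mol_id, props in candidates.items() if within_bounds(props, bounds)]


if __name__ == "__main__":
    pool = {
        "cand-1": {"clogp": 2.7},
        "cand-2": {"clogp": 3.9},
        "cand-3": {"clogp": 3.4},
        "cand-4": {},  # descriptor not computed
    }
    print(apply_filters(pool, BOUNDS))  # ['cand-1', 'cand-3']
```

Further constraints agreed in the thread can be added as extra entries in the bounds dictionary without changing the filtering code.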
@miquelduranfrigola, this is really interesting. It would also be useful to have a measure of the "confidence" in the predicted activity model. Can you also identify which new molecules would add the greatest amount of new information to your predicted activity model?
Hi @miquelduranfrigola, if you're able to incorporate all the filters @mattodd mentioned, that'd be great. I suppose we'd be looking at getting the list down to 100 compounds; then we can quickly look through and pick a couple out to make. On the tree map visualisation tool, we were curious about how the compounds were placed throughout the tree and whether there was anything particularly significant about the different red clusters. I guess it's a bit hard to pick out certain compounds within the clusters or to know which are the "best".
Thanks, @mattodd, @drc007 and @edwintse, for your comments! This will help us move forward. In the following days, @GemmaTuron and I will give it a push. We will try to:
In addition, I will provide a deeper explanation of the TMAP. I guess we will select the top 100 candidates based on "red regions", so hopefully this will address @edwintse's good point about how to navigate this map. As for @drc007's suggestion to identify which molecules would add the most information to future models... very interesting, I hadn't thought of this! I don't have an immediate answer, but we will try to address the point. Perhaps, to start with, we could see which molecules would most efficiently expand the applicability domain?
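One simple way to approach the question of which molecules would add the most information is to treat the applicability domain as nearest-neighbour similarity to the training set and to pick candidates that sit furthest outside it. The sketch below assumes fingerprints are given as sets of on-bit indices (e.g. Morgan bits computed elsewhere); the function names and the greedy max-min batch selection are illustrative choices, not what was done in this project.

```python
from typing import Dict, FrozenSet, List, Tuple

# Fingerprints as sets of "on" bit indices (e.g. Morgan bits computed elsewhere).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nn_similarity(fp: Fingerprint, reference: List[Fingerprint]) -> float:
    """Similarity to the closest reference compound (0.0 if there is none)."""
    return max((tanimoto(fp, r) for r in reference), default=0.0)


def rank_by_novelty(candidates: Dict[str, Fingerprint],
                    training: List[Fingerprint]) -> List[Tuple[str, float]]:
    """Least similar to the training set (furthest outside its domain) first."""
    scored = [(mol_id, nn_similarity(fp, training)) for mol_id, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


def greedy_novel_batch(candidates: Dict[str, Fingerprint],
                       training: List[Fingerprint], k: int) -> List[str]:
    """Max-min selection: repeatedly take the candidate least similar to the
    training set AND to everything already picked, so the batch is not
    k near-duplicates from the same unexplored region."""
    reference = list(training)
    remaining = dict(candidates)
    picked: List[str] = []
    while remaining and len(picked) < k:
        best = min(remaining, key=lambda m: nn_similarity(remaining[m], reference))
        reference.append(remaining.pop(best))
        picked.append(best)
    return picked


if __name__ == "__main__":
    training = [frozenset({1, 2, 3, 4})]
    candidates = {
        "close": frozenset({1, 2, 5, 6}),
        "far": frozenset({10, 11, 12}),
        "far_twin": frozenset({10, 11, 13}),
    }
    print(rank_by_novelty(candidates, training))
    print(greedy_novel_batch(candidates, training, k=2))  # ['far', 'close']
```

Ranking alone would put `far` and `far_twin` at the top; the greedy version swaps `far_twin` for `close` because, once `far` is picked, its near-twin adds little new information.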
Hi @miquelduranfrigola, we were just wondering whether it's possible to add a substructure search function to the dynamic visualisation tool? We'd want to focus on structures that have meta or para substituents on the RHS phenyl ring, as these are the more interesting ones to pursue.
Hello @edwintse, I was actually just having a closer look at the most desirable substituents according to the information on the wiki and the Series 4 paper. We are trying to refine the molecule generator these days, so it would be great if you could give us some hints about the most desirable substituents, also taking into account what you have observed in terms of HLM and RLM.
Hello @edwintse, sorry for the delay! I have updated the app visualization to provide some substructure search capabilities. As you mentioned you are interested in the RHS substituent, I have added the following select-boxes:
@GemmaTuron Wow, that's amazing and super useful for narrowing things down!
Hi @GemmaTuron, I've been trying to make some compounds suggested by Evariste recently (#29) and was wondering if you ever generated any structures similar to those in this comment, with indole/benzimidazole-type groups on the RHS (or any of the other structures that they predicted)? It would be interesting to see if there was any overlap between your suggestions and those from Evariste.
We are preparing a new batch of generated molecules and will get back to you shortly. Good idea, Edwin; we will check for overlap with the molecules from Evariste. Thanks! Meanwhile, @GemmaTuron and I have prepared a small app where you can input your molecules of interest and get activity predictions from a few simple ML models. This may be useful if you have candidates from our lists or elsewhere, or want to try small modifications to those molecules. Feedback most welcome!
Many thanks!
The app is amazing! We've just had some new suggestions come through from Evariste (#29), so it's already been very useful for cross-checking between the predictions.
As mentioned earlier, @miquelduranfrigola and I have done a second round of molecule generation. A detailed description of the process can be found in this repo: https://github.com/ersilia-os/osm-series4-candidates-2. In summary, we created a list of >400k candidate molecules that underwent successive rounds of selection based on activity prediction, desirable physicochemical properties and synthetic accessibility scores. Finally, we selected the best 90 compounds according to their predicted activity against P. falciparum. The molecules can also be visualized in this app.
Exploration vs exploitation
You will probably see that these candidates are considerably different from your known Series 4 dataset. This is because we worked in “exploration” mode, i.e. we explored regions of chemical space that are distant from the existing compounds. We hope that this collection nicely complements the compounds discovered in issue #29.
Metrics
IC50Pred: the lower the better. It is probably biased towards high values, so hopefully it is a conservative estimate.
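A rough sketch of a selection funnel of the kind described above: successive filters, then ranking by predicted IC50 with the lowest (most potent) values first. The `Scored` fields, the stage order and every threshold are hypothetical placeholders; the actual pipeline and cutoffs are documented in the linked repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Scored:
    mol_id: str
    ic50_pred_um: float  # predicted IC50 in µM; lower is better
    clogp: float
    sa_score: float      # Ertl-Schuffenhauer SA score: 1 (easy) to 10 (hard)


Stage = Tuple[str, Callable[[Scored], bool]]

# Hypothetical stages and cutoffs, for illustration only.
STAGES: List[Stage] = [
    ("activity", lambda m: m.ic50_pred_um < 1.0),
    ("physchem", lambda m: m.clogp < 3.5),
    ("synthetic accessibility", lambda m: m.sa_score <= 4.0),
]


def funnel(pool: Sequence[Scored], stages: Sequence[Stage],
           top_n: int) -> Tuple[List[Scored], List[Tuple[str, int]]]:
    """Apply each stage in order, record how many survive, then keep the
    top_n survivors ranked by predicted IC50 (most potent first)."""
    survivors = list(pool)
    counts = [("input", len(survivors))]
    for name, keep in stages:
        survivors = [m for m in survivors if keep(m)]
        counts.append((name, len(survivors)))
    ranked = sorted(survivors, key=lambda m: m.ic50_pred_um)
    return ranked[:top_n], counts


if __name__ == "__main__":
    pool = [
        Scored("a", 0.2, 2.5, 3.0),
        Scored("b", 0.05, 4.2, 2.5),  # potent but too lipophilic
        Scored("c", 0.5, 3.0, 3.5),
        Scored("d", 2.0, 2.0, 2.0),   # predicted too weak
        Scored("e", 0.1, 3.1, 6.0),   # hard to make
    ]
    best, counts = funnel(pool, STAGES, top_n=1)
    print([m.mol_id for m in best])  # ['a']
    print(counts)  # [('input', 5), ('activity', 4), ('physchem', 3), ('synthetic accessibility', 2)]
```

Logging the survivor count after each stage makes it easy to see which constraint removes most of the pool, which helps when deciding whether a threshold is too strict.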
Hi @edwintse, as you can see in the comment above by @GemmaTuron, we have done a second round of generative models. To (sort of) answer your question, here are two quick-and-dirty PCA plots (done with Morgan fingerprints) comparing:
As you can see, a couple of our compounds cluster together with Evariste's compounds.
OK @miquelduranfrigola @GemmaTuron, this is most interesting. To make sure I understand: the "exploitation" compounds are compounds you're predicting to be active that are derived fairly directly from other actives. The "exploration" compounds are those where you're intentionally trying to stay within the clusters of actives, and away from the inactives, yet which are sampling different areas of chemical space. So, in the left-hand plot above we see no red Exploration compounds in regions where there are green inactives. In the right-hand plot we're seeing exploration compounds peppering the space of known actives, but in a much more diverse cloud than the purple Exploitation compounds. Is the right-hand plot meant to look like a zoom-in of an area of the left-hand plot? I couldn't quite map the two. I'm guessing the axis units are arbitrary, or relative? I was trying to use them as a guide. If this is all correct (?) then we're going to need to take a closer look at the Exploration structures. That you've factored in synthetic accessibility is a major plus there.
Hi @mattodd, the exploitation compounds plotted are the ones predicted by Evariste. We have used an "exploratory" generative model and, as you mention, we are trying to stay close to the actives while querying different areas of chemical space. Your interpretation of the right and left graphs is correct; this is a PCA representation, so the axis units are indeed arbitrary. The PCA was computed once with the four datasets (left) and again with the three datasets (right), so the right plot is not exactly a zoom of the left one. What is interesting is that some of our compounds (red) overlap with the chemical space of the Evariste compounds (purple), a good signal that these may have strong activity. The rest of our predicted compounds (red) are interesting because they differ somewhat from known actives and have been optimized not only for activity but also for solubility, synthetic accessibility, etc. Hope this clarifies things a bit more!
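To illustrate why the right-hand plot is not a zoom of the left-hand one: PCA axes are fitted to whatever data is supplied, so refitting on a different set of compounds changes the axes themselves. The toy pure-Python PCA below (power iteration with deflation on the covariance matrix) is a sketch for low-dimensional data only; for real 2048-bit Morgan fingerprints you would use a library implementation such as scikit-learn's `PCA`.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_pca(rows: Sequence[Sequence[float]], n_components: int = 2,
            iters: int = 500, seed: int = 0) -> Tuple[Vector, List[Vector]]:
    """Return (column means, principal axes) via power iteration with deflation.
    Each axis is a unit vector; its sign is arbitrary, as in any PCA."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    axes: List[Vector] = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]  # strictly positive random start
        norm = math.sqrt(_dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [_dot(row, v) for row in cov]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        eigenvalue = _dot(v, [_dot(row, v) for row in cov])  # Rayleigh quotient
        axes.append(v)
        # Deflate: remove the variance explained by this axis before finding the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, axes


def project(rows: Sequence[Sequence[float]], mean: Vector,
            axes: List[Vector]) -> List[Vector]:
    """Coordinates of each row on the fitted axes."""
    return [[_dot([x - m for x, m in zip(r, mean)], ax) for ax in axes] for r in rows]


if __name__ == "__main__":
    line_x = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    line_y = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
    print(fit_pca(line_x, n_components=1)[1])           # axis ~ (1, 0)
    print(fit_pca(line_x + line_y, n_components=1)[1])  # axis ~ ±(0.71, -0.71)
```

Fitting on `line_x` alone gives a first axis along x; adding `line_y` rotates it by 45 degrees, so the same compounds receive different coordinates and the two plots cannot simply be overlaid.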
@miquelduranfrigola @GemmaTuron We're a bit curious about the compounds that cluster with the Evariste ones. It seems like there are only a few red dots within the purple cluster. Would you be able to zoom in on that region and show the exploration structures? I guess those would be the ones we'd prioritise if we were to make any.
Hi @edwintse, these are the two molecules that cluster together with the Evariste compounds in the PCA plot:
A few disclaimers and thoughts:
I hope this helps!
Hi @miquelduranfrigola @GemmaTuron, just checking in to see how the compound generation is going? I've finished making and purifying the compounds from Evariste (#29) and will have them tested soonish, but we were hoping to start planning which starting materials we might need to make or purchase for your compounds.
Hi @edwintse, we have started working on it and hope to be able to share some news by the end of next week!
Hello @edwintse and @mattodd! We have a final list of candidates (35 molecules) plus an extended list of alternatives (1200 molecules). They all have high predicted potency, so perhaps now we can choose the ones with easier synthetic routes and other interesting characteristics such as solubility. All data and code are available in this repository. In short, we have:
We provide the 35 molecules with the highest predicted activity from the list of 90 as putative candidates for synthesis, but we can also refine the search and enrich the list with candidates from the list of 1295 molecules that are also predicted to be highly active. Let us know your thoughts on these molecules and whether there is any extra filter you would like to add before choosing the ones to be synthesized.
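When every shortlisted molecule is already predicted to be potent, the choice becomes a trade-off between potency, ease of synthesis and solubility. One way to make that trade-off explicit is to keep only the Pareto non-dominated candidates. This sketch assumes each molecule carries a predicted IC50, a synthetic accessibility score and a predicted log solubility; the field names are hypothetical, and this is not the procedure used in the repository.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Option:
    mol_id: str
    ic50_pred_um: float    # lower is better
    sa_score: float        # lower means easier to make
    log_solubility: float  # higher is better (e.g. predicted logS)


def _objectives(o: Option) -> Tuple[float, float, float]:
    # Express every objective as "smaller is better" so dominance is a single comparison.
    return (o.ic50_pred_um, o.sa_score, -o.log_solubility)


def dominates(a: Option, b: Option) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    fa, fb = _objectives(a), _objectives(b)
    return all(x <= y for x, y in zip(fa, fb)) and any(x < y for x, y in zip(fa, fb))


def pareto_front(options: Sequence[Option]) -> List[Option]:
    """Non-dominated options in input order (O(n^2), fine for ~1k molecules)."""
    return [o for o in options if not any(dominates(p, o) for p in options if p is not o)]


if __name__ == "__main__":
    options = [
        Option("A", 0.1, 4.5, -5.0),  # most potent, hardest to make
        Option("B", 0.4, 2.5, -4.0),
        Option("C", 0.5, 3.0, -4.5),  # worse than B on every objective
        Option("D", 2.0, 2.0, -3.0),  # easiest and most soluble, least potent
    ]
    print([o.mol_id for o in pareto_front(options)])  # ['A', 'B', 'D']
```

The front does not pick a single winner; it removes candidates that are strictly worse than another on every criterion, leaving the chemists to weigh the remaining trade-offs.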
OK, great @GemmaTuron. So, @edwintse (or Gemma), can you parse these into a picture so that we can see, in the most general sense, what starting materials we might be looking at? E.g. do we need a gram of the core Series 4 scaffold?
Thanks for all the new compounds @GemmaTuron!
@drc007 Yes, it is racemic. Unfortunately I don't have enough of it to do any chiral HPLC testing.
@mattodd I can make more. The alcohol is fine; it was just the SNAr that was a bit low-yielding after purification. DataWarrior gives a cLogP of 2.7 for this.
@edwintse would it be worth resolving the alcohol first?
Possibly, although sometimes I don't completely purify the alcohols. I can see what I have left from when I made it.
Hi @edwintse and all! This is great news! Very excited about these results, thanks!
@GemmaTuron Yes, let's. This coming Thursday pm would work, at e.g. 3 pm UK time? Or 4 pm UK time Friday? Happy to make it an open meeting so others can join/suggest if they want?
Hi @mattodd! This week is complicated on our side; can we do NEXT Thursday (10th) at 15:00 UK time?
No good. Friday 11th at 1, 3 or 4 pm UK time? Otherwise I fear we may have a looming Doodle Poll 🤕
With regard to EGT614, would it be possible to make the analog with an extra carbon between the benzocyclobutane and the core? It seems that truncating the alkyl ether chain to anything other than ethyl can drop potency. @edwintse
Let's go with Friday 11th at 13h UK time! What platform do you prefer?
Hi @mattodd! Just confirming the meeting on Friday at 13h UK time?
Yes @GemmaTuron, thanks for the reminder. I've just sent the invite, but please forward it to others if you like; we can meet at https://ucl.zoom.us/j/4808072370 then. Talk soon!
Hi all, a short update on next steps:
Thanks everyone, we will post updates here as soon as we can.
And another short update as we start the work described above!
@GemmaTuron sounds great! We do have crystal structure files for a handful of compounds. I'll need to find them and share them with you. As for protonation states, I guess predictive software like MarvinSketch would do; otherwise I'm not entirely sure.
@GemmaTuron if you want something a little more low-tech but high-throughput, try protonator.
Hi everyone! Using AutoDock Vina, we docked the OSM4 compounds with known experimental IC50s, together with the new candidates, to both the wild-type and G358S mutant forms of PfATP4. The search was constrained to the region surrounding G358. Unfortunately, we found no correlation between experimental IC50 and docking score. There are a number of possible reasons for this, which we will continue to look into, but for the time being please interpret the following results with caution.
A number of OSM4 compounds were found to bind in proximity to the G358S locus (the box is large enough to allow non-proximal binding). Of interest is OSM-LO-72, the new candidate with the lowest predicted IC50, which bound in proximity to G358S, though no change in affinity was predicted upon mutation (again, this is very preliminary and I have little confidence in the affinity prediction).
Moving forward, I will binarise the IC50 values and rethink the correlation analysis, as discussed with @GemmaTuron and @miquelduranfrigola. To this end, is there affinity data available for any of the Series 4 compounds, rather than whole-cell IC50s? I will also compare the protein interactions of the predicted poses with those of cipargamin, in which we have more confidence. For a more detailed description of the procedure and results, please see our GitHub repository, which contains the notebook and output files.
And thank you all for the opportunity to work on this project. I'm excited to see where this goes!
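A minimal sketch of the two analyses mentioned above: a rank (Spearman) correlation between experimental IC50 and docking score, and a binarised alternative that asks how often actives score better than inactives (an AUROC computed by pairwise comparison). It assumes the AutoDock Vina convention that more negative scores mean stronger predicted binding; the 1 µM activity threshold is a hypothetical choice, and the example numbers are made up.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def binarised_auc(ic50_um: Sequence[float], dock_scores: Sequence[float],
                  threshold_um: float = 1.0) -> float:
    """AUROC of the docking score for separating actives (IC50 < threshold)
    from inactives: the fraction of active/inactive pairs in which the active
    has the more negative (better) score, counting ties as half.
    0.5 means no separation; 1.0 means perfect separation."""
    actives = [s for ic, s in zip(ic50_um, dock_scores) if ic < threshold_um]
    inactives = [s for ic, s in zip(ic50_um, dock_scores) if ic >= threshold_um]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for b in inactives:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions.
    ic50 = [0.1, 0.5, 5.0, 10.0]
    vina = [-9.0, -8.0, -7.0, -8.0]
    print(spearman(ic50, vina))
    print(binarised_auc(ic50, vina))  # 0.875
```

The binarised score is less sensitive to the noisy ordering of whole-cell IC50 values than a full rank correlation, which is one reason to try it when the continuous correlation is flat.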
Thanks @John-D-Tanner!! From the Ersilia side, we have been working on a refined generative tool that combines different techniques. It is almost ready, and we will apply it using the latest available experimental data points as starting points.
Hi @John-D-Tanner, thanks for this, and sorry for the delay in getting back to you. Too many GitHub alerts! Also pinging @edwintse. Interesting results, adding a little more to the mystery of how these compounds are acting. We don't have affinity data, no. To the best of our knowledge, nobody has ever made PfATP4, so it's hard to do these kinds of experiments. Thanks for posting raw data, but I think your link above is broken. Do you have a fresh one? I guess a key experiment would be to try OSM Series 4 compounds in the resistant cell line, right?
My apologies, the repository was set to private; it is now public and the link should work.
The last compound from the most recent set (EGT 611-1) was tested for activity and came back inactive. There were also 2 early compounds from Evariste (@abrennan5) that we never tested, which were included in this batch; both are inactive as well. The positive control (369) behaved as expected.
Hi @edwintse! Thanks for the latest update, and sorry about the silence; we've been working in the background on a quick-and-easy-to-implement generative package, ChemSampler (still under development, but the basic functionalities are complete).
We have done a first iteration based on the 4 compounds from the previous round with activities < 1 µM.
With these constraints, we end up with the following 19 molecules:
What do you think of these molecules? Thanks!
We've shipped the following 5 compounds to Adele at ANU to have them tested in their PfATP4-resistant line. Results will be posted when received.
Hi @mattodd @edwintse
But we are also uploading other related models, such as the ones developed with data contributed by MMV; we'll be announcing models through our social media channels during this month of September. Let me know if this is useful or if you have any questions!
Though this issue is getting rather long, I wanted to add the current set of compounds being evaluated in this collaboration between OSM and Ersilia. Using the latest version of the model and the latest experimental data, we are experimentally pursuing the structures below. We're using a combination of CRO (Piramal) and in-house synthesis, and we should be done by early April, when we'll ship the compounds for evaluation. Very exciting!
Cross-referencing the results for the above structures, which are at OpenSourceMalaria/Series4#79
Hi all, as an update on the Piramal synthesis: we successfully obtained Targets 1, 5 and 9. For Targets 11 and 12 we faced many challenges and have stopped attempting to synthesise them. We are attaching here all the routes Piramal has tried for these two targets:
Hello @mattodd @edwintse,
At @ersilia-os we have tried to generate new Series 4 candidates. In short, we provide two tables:
For a first assessment of the results, you can check this dynamic visualization of the selected 1k candidates. If a cluster is of particular interest, please refer to the full results to discover other similar molecules. You can also check a tree map of all molecules.
Our generative model approach is based on Reinvent 2.0. We have implemented several reinforcement-learning agents, aimed at optimizing activity and other desirable properties. This GitHub Repository contains more detailed information and source code.
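For readers unfamiliar with REINVENT: the agents are driven by a single reward in [0, 1] that combines several component scores, and each agent is trained to move its log-likelihood for a sampled molecule towards the prior's log-likelihood plus sigma times that reward. The sketch below combines hypothetical components (predicted pIC50, calculated logP, QED) with a weighted geometric mean, similar in spirit to REINVENT's product-style scoring function; the transforms, weights and `sigma` are illustrative assumptions, not the configuration used for these runs (see the linked repository for that).

```python
import math
from typing import Dict


def sigmoid_transform(x: float, low: float, high: float) -> float:
    """Smoothly map x to (0, 1): about 0.007 at `low`, 0.5 halfway, about 0.993 at `high`."""
    z = 10.0 / (high - low) * (x - (low + high) / 2)
    if z >= 0:  # split the cases to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_geometric_mean(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine component scores in [0, 1]; a single zero component vetoes the molecule."""
    total_weight = sum(weights.values())
    log_sum = 0.0
    for name, weight in weights.items():
        s = scores[name]
        if s <= 0.0:
            return 0.0
        log_sum += weight * math.log(s)
    return math.exp(log_sum / total_weight)


def reward(pic50_pred: float, clogp: float, qed: float) -> float:
    """Hypothetical multi-parameter reward for one generated molecule."""
    components = {
        "activity": sigmoid_transform(pic50_pred, low=5.0, high=7.0),
        "logp": 1.0 - sigmoid_transform(clogp, low=3.0, high=4.0),  # favour logP below ~3.5
        "qed": qed,  # drug-likeness, already in [0, 1]
    }
    weights = {"activity": 2.0, "logp": 1.0, "qed": 1.0}
    return weighted_geometric_mean(components, weights)


def augmented_loss(prior_loglik: float, agent_loglik: float, score: float, sigma: float) -> float:
    """REINVENT-style agent loss for one sequence: squared gap between the agent's
    log-likelihood and the augmented target prior_loglik + sigma * score."""
    target = prior_loglik + sigma * score
    return (target - agent_loglik) ** 2


if __name__ == "__main__":
    print(reward(pic50_pred=7.5, clogp=2.8, qed=0.7))
    print(reward(pic50_pred=7.5, clogp=4.5, qed=0.7))  # penalised for high logP
```

A geometric mean is a deliberate choice here: unlike a weighted sum, a molecule cannot compensate for a near-zero score on one property (e.g. very high logP) by excelling on another.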
This is the first time we have run a generative model, so please bear with us. We will be more than happy to optimize further runs based on your feedback.
Thanks!
@GemmaTuron @miquelduranfrigola