Suggestions for New Active Compounds #19
Here is my submission. I have also attached a longer write-up and a supplementary data file with SMILES and additional generated structures. The first structure (11.mol2) technically passed the selection criteria that I set, but it looks a bit weird, so I added two of what I think are the next best structures. What do you think?
Hi @IamDavyG - thanks, but those look like predictions for Series 1 (https://pubs.acs.org/doi/full/10.1021/acscentsci.6b00086), whereas we need suggestions based on Series 4 (the subject of this competition). Even though we'd like suggestions today (so we can get going in the lab) we're extending the deadline for another week, just to give people more time to come up with further suggestions (e.g. structures that are predicted to be nicely soluble). So hopefully you have time? |
Hi, The attached PDF contains our predictions for two molecules which ranked highly with our model. Two predictions are included because the best one contains CHCHF2, which we think can eliminate to give HF. If this is problematic, the second prediction could be used. We will provide a more structurally distinct prediction before next Friday. Best,
Thanks Alex, Just to clarify, HF is a neurotoxin which may be undesirable in vitro, up to you guys which compound you prefer. Ben |
Thanks @BenedictIrwin @adw62 for the suggestions! Based on the synthetic difficulty (and the possibility of HF generation) I'll get started on the synthesis of your second prediction. |
I have since amended my methodology to generate molecules that feature the triazolopyrazine scaffold along with molecules that feature a distinct, but similar core scaffold in accordance with the objectives of this stage. DND4 features the highest ranked predicted activity out of the generated triazolopyrazine scaffold molecules. FESOL4 features a similar but distinct core scaffold to the representative triazolopyrazine scaffold with better solubility. FESOL7 should be considered in case FESOL4 is not distinct enough, as it features a sulfur substitution into the core scaffold. This molecule was selected using only the secondary model since it is likely outside the applicability domain of the submitted model. |
Thanks @IamDavyG! Interesting to see that DND4 is the highest ranked S4 compound, since we typically see lower activities with benzylic ethers as the pyrazine substituent. Do you know how the same compound but with an additional methylene group in the ether linker fares (CC(C)C(C=C1)=CC=C1C2=NN=C3C=NC=C(N23)OCCC4=CN=C(C)C=C4)? FESOL4 and FESOL7 are both really interesting too. I'll start having a look into the synthesis of those.
Attached are our predictions for some structurally distinct molecules. There are two sets of predictions. The first set are predictions made by a recurrent neural network; these score highly but can have many problems and look difficult to synthesise. The second set are curated by hand from the output of the RNN. We hope the second set are more reasonable to make, but these score lower. We'll leave it to you to pick a final molecule depending on how adventurous you are feeling :)
We should have proposed our 2 molecules today, but unfortunately we had 2 crazy weeks so we did not have time to generate the molecules. We are very sorry about this. |
@gcincilla That should be fine. I probably won't get time to make them before Christmas but we can still include them in the paper as future work compounds. |
Thanks for the suggestions @BenedictIrwin @adw62. Some really interesting structures! Based on the compounds, I think we would be leaning towards the second set to avoid the potential issues that you mentioned. However, after having had a look into ways to synthesise the three compounds below, I can't see an obvious way to make any of them. Anyone have any ideas/suggestions?
I can see if something like ICsynth could be used through StarDrop: https://www.optibrium.com/community/videos/presentations-webinars/451-syntheticpathways |
We used all the activity results from Series 4 compounds to train our model. The model was built as previously described by Laksh.
Here are the 2 molecules we propose. We tried to open up new spaces (always inside Series 4), selecting molecules that, apart from being active, are predicted to be rather soluble (this was not easy) and permeable in Caco-2 cells. We cannot propose molecules very different from Series 4 because the predictive model was developed to focus on this family. As mentioned earlier, we'll soon release our model through a publicly accessible web-based tool so that everybody can play with it. In this way synthetic and medicinal chemists will be able to explore their own ideas and check what our models say about them. We believe in the integration between human & artificial intelligence.
Hi, Ben and I have been optimizing our generative RNN method some more since our last suggestions were so hard to synthesise. This compound has a nice solubility so hopefully it's of some use in the future even if there is no time to make it now. |
Hi @gcincilla, thanks for the suggestions. I just had a look at the routes to these but unfortunately some of the intermediates are harder to get a hold of than expected. Would you be able to post some of the backup molecules that you have? On second thought, I may be able to make M295. |
This might be unstable with the O=C-O-C... group |
@gcincilla We were just wondering if you replaced the 4-CN group in M295 with a 4-OCHF2 group, would that change how the compound scores in your prediction? |
@edwintse, the molecule you are proposing (i.e. M4625) should be as active as M295, but it seems less soluble according to our prediction. Nevertheless, if we are willing to accept that risk, we can go forward with it.
We received the results for the 4 suggested compounds that were ultimately synthesised (#25) and surprisingly, 3/4 were found to be inactive. I have compiled the list of all the S4-based suggestions. Do any look good to you @alintheopen? Based on what we know about the S4 SAR, none look too promising to me. At this stage we were wondering whether @IamDavyG, @gcincilla, @BenedictIrwin, @adw62, @btatsis had any other suggested compounds that you could provide? |
One thing I noticed (I mentioned it in my talk) when looking back at the training set: there was a compound that was expected to be active with some confidence. The only measurement that had been made on it was Dundee pfal IC50 ">10". I had the idea that perhaps the assay had a problem/blip, and the result was not representative? (I don't know how likely that is; would you expect much variation if the test were repeated, for example?) Unless you can see any particular reason this compound might polymerize, or something else that would disrupt the assay? I need to check which of the following two it was, as there was some kind of inconsistency between the master chemical sheet and my dataset. How easy/expensive is it to simply retest such a compound? Is it already available or would it have to be remade? If it's not feasible then no problem, just an idea. If it's very easy to retest, then I can suggest a few training set compounds that might be worth retesting (so as not to miss opportunities). If that's not an option, we could try to rebuild the Alchemite model (with these 5 extra data points) and see if anything else comes out. Alex isn't in the office anymore, so it would be a slower process; I'd have to find some spare time. In terms of already produced solutions, we had a more experimental compound from the (second generation) RNN: The overall activities aren't super high for any of these 'confident' suggestions. If you are looking for something 'really active', we'd need to extrapolate a bit further, and it might take some further analysis. I found high activity predictions with a larger error bar were more likely to push the boundary on activity, but there is the risk it would go the other way and be less active. The only other comment I can think of for now: Does the tert-butyl group substitution help open up the SAR? Can we simply pair this with the most active group on the left from the previous known compounds?
If you want the model's input on this process, I could try to enumerate these combinations and run them through to find the best combination?
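For anyone following along, the enumeration idea above could be sketched roughly as below. This is only a minimal illustration, not the Alchemite workflow: `enumerate_and_rank`, `toy_predict`, and the group labels are all hypothetical names invented here, and the scoring function is a meaningless placeholder standing in for whatever trained model would actually be used.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def enumerate_and_rank(
    left_groups: Iterable[str],
    right_groups: Iterable[str],
    predict: Callable[[str, str], float],
) -> List[Tuple[str, str, float]]:
    """Score every left/right pairing and return the pairings best-first.

    `predict` is assumed to return a number where higher means
    "predicted more active"; any real model would be plugged in here.
    """
    scored = [
        (left, right, predict(left, right))
        for left, right in product(left_groups, right_groups)
    ]
    # Sort by predicted score, highest first.
    return sorted(scored, key=lambda row: row[2], reverse=True)


def toy_predict(left: str, right: str) -> float:
    """Placeholder scorer (assumption): NOT a real activity model.

    It only depends on label lengths so that the sketch runs end to end.
    """
    return float(len(left) + 2 * len(right))


if __name__ == "__main__":
    # Hypothetical labels; real inputs would be fragment structures.
    lefts = ["left_group_1", "left_group_2"]
    rights = ["tert-butyl", "tert-butoxy"]
    for left, right, score in enumerate_and_rank(lefts, rights, toy_predict):
        print(left, right, score)
```

In practice the ranking would probably also need to carry the model's uncertainty alongside each score, given the earlier point about predictions with larger error bars.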
@BenedictIrwin The compound with the potency of >10 uM corresponds to the right hand structure (the benzylic methoxy one). This was a bit of a complicated one (info about it here), but essentially the inherited compound was claimed to be enantioenriched but there was no data to support this. We remade the racemate and had the enantiomers resolved by chiral HPLC. This enantiomer was inactive while the other was active. It's likely we won't be able to retest the same compound. The tert-butoxy group does open up the SAR a bit since we wouldn't have immediately thought to make a compound with it so that might be a good approach for enumeration. |
@edwintse Thanks for checking that, it's good to know the signal may have been referring to the enantiomer. The descriptors we used for this particular model aren't sensitive to chirality. I'll look into the t-butyl enumeration for some ideas. |
@edwintse thank you for the update! I'm happy to know that we're closer to completing the paper. Congratulations! Of the 5 new molecules (4 suggested and 1 derivative), 3 were correctly predicted by our model (but unfortunately not our own molecule 😢 ). I have 1 question: at the beginning you asked each winner to propose 2 compounds but, if I understand correctly, the second molecules that we suggested are not going to be synthesized. Is this right? I'm asking because our proposed molecule M4625 was our favorite of the 2 we suggested. Do you think the second will turn out to be inactive? We deployed our model in a web-based application as mentioned here so that anybody can exploit the model according to their knowledge, skill and intuition, through a collaborative design approach. Unfortunately, almost nobody has actively participated yet, so there are limited molecular suggestions.
@gcincilla We originally asked for two suggestions, one based on S4 and the second with a structurally distinct core. I think you mentioned that generating a distinct compound wouldn't be that great using your model? Based on the structure, I would think that M4625 would have some activity. I don't think I have much of the pyrrole starting material that I used for my initial attempts, so we'll see. I suppose it doesn't necessarily need to go into the paper straight away.
OK @edwintse, thank you for the info. |
Hi @gcincilla @btatsis @wvanhoorn - can I quickly check something with you? The inactivity of the molecules synthesised above - does that change your models, or the other previous predictions you made? We were thinking about synthesising the Exscientia prediction that is the aniline derivative of the synthesised EGT454-1, and pursuing the pyrrole analogue predicted by Molomics (or a derivative like that). But before we do we wanted to be sure that the strength of the models' support for these structures is not fatally compromised by this admittedly small number of extra data points. |
Hi @mattodd , |
Hi everyone, I'm a new Honours student working with Alice @alintheopen at USYD. My project involves both Chemistry Education and organic synthesis. For the organic synthesis part of my project I'm interested in making some compounds for the Open Source Malaria project. I was directed to this thread about a molecular design competition. However, as I’m new here, I'm unsure about what has been done before or is currently a work in progress. I don't want to take anyone else's project and vice versa. Are there Series 4 (or other) molecules here that I would be able to try to synthesise for my project? Or other molecules within the OSM project that would be helpful if I worked on them? I’m particularly interested in synthesising predicted targets to help evaluate some of the models, and in communicating this project to non-expert audiences. Sebastian
Hi @Seb470, welcome aboard! So far the synthesis of these compounds has been a one-man job (me), so it'll be good to have you making some compounds too. I think the best way to get started would be to sign up to the web-app created by the Molomics team (@gcincilla), which is here. This will let you have a play around with creating a few new molecules and seeing whether they are predicted to be active or not. It's good because in the drawing tool you are able to make small changes to the molecule and see how that affects the prediction score. I'm currently working on a couple of compounds and need to make an updated scheme. I should have that up on this issue tomorrow. But for now, have a play around, and maybe for some synthesis next week you could start making some of the OCHF2 core.
Hi @edwintse, thank you very much for the welcome. I've started trying to make molecules in the web-app now, it's very interesting! I haven't been able to make anything on the upper end of the leaderboard yet but we'll see. re synthesis: Okay thank you for the starting point. I'll try to get some of that made ASAP. Likely next week, depending on how the other part of my project is going. |
With the competition (#1) concluded and the winners announced (#18), they have been tasked with coming up with suggestions for new active compounds by using their developed models. These suggestions will be synthesised and sent for testing (hopefully) before Christmas. We can then add this all to the paper (#3) before submission.
In order to give us time to get started:
The deadline for first-pass suggestions will be close of business Friday 15th Nov.
The final deadline will be close of business Friday 22nd Nov to allow for an additional suggestion.
The structures below have been suggested by the winners to be synthesised. Currently no routes have been found to access the additional suggested compounds (with the exception of the leftmost one).