Evariste Technologies compounds #29

Open
abrennan5 opened this issue Jan 11, 2021 · 53 comments

@abrennan5

abrennan5 commented Jan 11, 2021

Hi all,

We’ve already caught up with @mattodd about this but will just give a quick intro for everybody else. Evariste Technologies is a start-up focusing on a probabilistic approach to drug discovery. Briefly, the platform we’ve built, Frobenius, takes an existing dataset and identifies the most promising starting point/s, then designs a bunch of new compounds and scores them according to the likelihood of achieving a set of pre-specified endpoints.

We were really interested in the recent publication detailing the open competition run by the OSM team and thought we’d have a crack at the problem ourselves. The compounds attached are the output generated by Frobenius when it’s presented with the series 4 data. More specifically, we’ve taken the two most promising starting points, applied the various compound designers and selected a subset of the highest scoring compounds (filtered by a medicinal chemist for synthetic feasibility etc). The number associated with each compound is the probability of it achieving a pIC50 of 8.

As we mentioned to Mat, we’re keen to get some of these synthesised and are able to contribute towards the cost of synthesis. We’re also more than happy for anyone interested in these compounds to use them as inspiration for similar structures. If you do, we’d really appreciate being kept in the loop, as we can very readily score the idea in Frobenius.

If anyone’s interested in knowing more about the modelling than the (very) brief overview I’ve given here we’re happy to discuss in detail.

Best wishes,
Alfie

https://www.evaristetechnologies.com
https://www.linkedin.com/in/alfie-brennan-746ba6b1

Evariste Suggestions

Series4_EVT_Suggestions.xlsx

Malaria suggestions pIC50 8 .pdf

@mattodd
Member

mattodd commented Jan 12, 2021

Hi @abrennan5 - this is very interesting, thanks for posting. It'd be really great to make some of these, yes.

Could someone please take a look and generate a pic of the molecules that we can quickly post into an issue, just to help with digesting the ideas simply? (I know there's a PDF, I'd just like a cdx. Alfie, if you have one, you can drag it here, but you may have to zip it up for GitHub to accept it.)

I'm intrigued by the meta-subst rings in the northeast, which is something we played with a bit in @maratsydney 's work, but have not explored a lot. The accessibility of those will depend on availability of materials, I suspect.

For EVT-004 there's a CF3 there. I thought we'd done that?

Also of interest is the EVT-007, EVT-008, EVT-009. Didn't we take that N away (in EVT-009) already? Replacement with C-F and C-Cl is something we've not done.

@abrennan5
Author

Hi Mat,

Compressed cdx below. Each compound designer should drop suggestions that are already in the dataset but if any of these have cropped up before let me know and we can fix that.

Malaria suggestions pIC50 8 .zip

@edwintse
Collaborator

I've updated the original post with the structures of the suggestions. Just some quick comments:

For EVT-004 there's a CF3 there. I thought we'd done that?
We have 1 compound with an OCF3 group as in EVT-004 but it has an amide on the LHS (MMV675963/OSM-S-271/TM 55-1).

Didn't we take that N away (in EVT-009) already?
Yes, we have 4 compounds with X = H instead of that N atom and a 4-CN on the RHS (from Patrick Thomson's work). Examples include both ether and amide linkers on the LHS but all are inactive.

I'm not sure if the synthetic accessibility of the benzylic OCHF2 group has changed but that was one of the things that we weren't able to remake from the inherited compounds. If we can't find a way to make that side-chain then all the compounds based on Starting Point 2 may not be accessible.

I'll have a look at whether the aldehydes needed for the different cores are available in the meantime and update later.

@abrennan5
Author

Great to meet you Edwin, thanks for uploading the image. Re the compounds containing the benzylic OCF2H, the OCH3 analogue was also one of the top ranked hits, see below for a set of suggestions from that starting point. Perhaps unsurprisingly, the model scores modifications of the methoxy group relatively highly as it views changes here as likely to increase potency. There were a significant number of designs in this region, all with around the same 4 - 7% chance of having pIC50 > 8. I selected the smaller changes to avoid having too much of an impact on solubility and logD. I've also (hopefully) fixed the filtering such that there shouldn't be any compounds already present in the dataset included here but please let me know if that isn't the case.

Let me know your thoughts around this set of molecules as well as just replacing the OCF2H with OCH3 in EVT-011/021.

OMe_suggestions.zip

@abrennan5
Author

Hi @mattodd and @edwintse,

I hope you're both well. Just wanted to follow up with our most recent work here. We're actually planning to publish the process of generating these ideas in a blog post at some point soon, but I wanted to get them uploaded here first as it builds on so much of your hard work. You'll note the compounds suggested here are slightly different to those above. There are a couple of reasons for this. Most importantly, instead of just optimising for improved potency, we asked the model to also aim for molecules with improved solubility and logD in the range of 0 - 4 (essentially 1 - 3 + prediction error). We have also used an updated version of our platform which predicts our error more accurately.
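As a rough illustration of scoring a design against more than one endpoint at once, here is a minimal sketch. It assumes each prediction is Gaussian and that the potency and logD errors are independent. Neither assumption comes from the post, and the function names and numbers are hypothetical rather than anything taken from Frobenius.

```python
from statistics import NormalDist

def prob_above(mean, sigma, threshold):
    """P(X > threshold) for a Gaussian prediction."""
    return 1.0 - NormalDist(mean, sigma).cdf(threshold)

def prob_between(mean, sigma, low, high):
    """P(low < X < high) for a Gaussian prediction."""
    dist = NormalDist(mean, sigma)
    return dist.cdf(high) - dist.cdf(low)

def joint_success(potency, logd, potency_threshold=8.0, logd_range=(0.0, 4.0)):
    """Probability of meeting both endpoints, assuming independent errors.

    potency and logd are (mean, sigma) tuples.
    """
    p_potency = prob_above(potency[0], potency[1], potency_threshold)
    p_logd = prob_between(logd[0], logd[1], *logd_range)
    return p_potency * p_logd

# Hypothetical (mean, sigma) predictions for two designs.
design_a = joint_success(potency=(7.0, 0.8), logd=(4.5, 0.5))
design_b = joint_success(potency=(6.8, 0.8), logd=(2.5, 0.5))
print(f"A: {design_a:.3f}  B: {design_b:.3f}")
```

In this made-up case design B scores higher overall even though its predicted potency is slightly lower, because design A is unlikely to land inside the logD window.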

Finally, when selecting compounds we used a 'pessimistic design' algorithm which picks molecules for synthesis predicated on the failure of all the preceding molecules. This filters our list of 30 designs (based on three different starting points) down to 10. You'll note that in the PDF there are two predicted potency values, black is independent of the result for the earlier molecules, red is presuming that the preceding molecules failed to hit the desired endpoint (pIC50 > 8).
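The selection logic described above can be sketched with a small Monte Carlo simulation. This is only an illustration and not Frobenius itself: it assumes Gaussian predictions and models similar compounds as sharing a latent error term (the `cluster` labels and the `rho` value are hypothetical). The "independent" value plays the role of the black numbers and the "conditioned" value the role of the red ones.

```python
import random

def sample_worlds(compounds, n_samples=20_000, rho=0.95, seed=0):
    """Draw joint samples of pIC50 for all compounds.

    compounds: list of (name, mean, sigma, cluster). Compounds in the same
    cluster share a latent factor, so their errors have correlation rho.
    """
    rng = random.Random(seed)
    clusters = sorted({c[3] for c in compounds})
    worlds = []
    for _ in range(n_samples):
        shared = {g: rng.gauss(0.0, 1.0) for g in clusters}
        world = []
        for _name, mean, sigma, cluster in compounds:
            noise = rho ** 0.5 * shared[cluster] + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            world.append(mean + sigma * noise)
        worlds.append(world)
    return worlds

def pessimistic_pick(compounds, threshold, n_pick, **kwargs):
    """Greedy picks; each one is scored assuming every earlier pick failed.

    Returns a list of (index, independent_prob, conditioned_prob).
    """
    worlds = sample_worlds(compounds, **kwargs)
    n = len(compounds)
    independent = [sum(w[i] > threshold for w in worlds) / len(worlds) for i in range(n)]
    picks = []
    for _ in range(min(n_pick, n)):
        chosen = {p[0] for p in picks}
        conditioned = {
            i: sum(w[i] > threshold for w in worlds) / len(worlds)
            for i in range(n) if i not in chosen
        }
        best = max(conditioned, key=conditioned.get)
        picks.append((best, independent[best], conditioned[best]))
        # Keep only the sampled worlds in which this pick missed the endpoint.
        worlds = [w for w in worlds if w[best] <= threshold]
    return picks

# Hypothetical candidates: two close analogues and one unrelated design.
compounds = [
    ("A", 7.2, 1.0, "series1"),
    ("A-analogue", 7.1, 1.0, "series1"),
    ("B", 6.9, 1.0, "series2"),
]
picks = pessimistic_pick(compounds, threshold=8.0, n_pick=2)
for idx, black, red in picks:
    print(compounds[idx][0], round(black, 3), round(red, 3))
```

With these made-up inputs the close analogue of the first pick drops down the list once the first pick is assumed to have failed, so the second pick comes from the other series.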

None of these feature the troublesome benzylic OCF2H due to the prioritisation of molecules which were predicted to be more soluble.

Happy to field any questions and share a wider selection of designs if that would be helpful.

Best wishes,
Alfie
compound_picker_suggestions.pdf
compound_picker_suggestions.zip

@edwintse
Collaborator

Hi @abrennan5, I've finally had a look into potential synthesis of these compounds and it doesn't look like we'll be able to make any from your last post. I think the main issue is with the difluorophenyl ring - these substrates are typically more expensive to get a hold of compared to phenyl ring analogues, and this is especially the case when there are additional substituents on the ring. Just based on gut feeling, the compounds in your original post from Starting Point 1 are also less desirable as changes to the NE portion tend to decrease potency. I'm starting to think that the Starting Point 1 compound isn't the most ideal in terms of synthesising analogues. Any chance you'd be able to redesign some new compounds with this in mind?

@abrennan5
Author

Hi @edwintse, thanks for taking the time to review the molecules and potential routes, we really appreciate your input here. We can absolutely go from a different starting point. I'll remove both fluorines from that ring and see where we end up. Most of the suggestions will design something back into those positions, but the overall complexity should be lower.

I'll also look at building a virtual library using the published route; this sort of forward synthesis is something we're introducing more and more often on our projects. We can also take cost into account when doing this (very much a beta version of this feature at the minute, and it is highly dependent on where the building block comes from).

I'll get something back to you later this week.

@abrennan5
Author

Hi @edwintse,

I ran the modelling again and selected some starting points which should (hopefully) be somewhat easier/cheaper to access. The two most promising analogues without any of the complex/expensive SMs discussed previously were:

  • indole in the NE position and the phenyl ring with the pendant ethyl alcohol in the NW
  • 4-chlorophenyl in the NE position and the simple benzylethyl ether in the NW.

I applied our design algorithms but I also built virtual libraries using in stock alcohols and aldehydes available from Enamine so at least some of the compounds attached should be easy to plug into the existing route. The building block price filter is still work in progress so I wasn't able to apply it here.

The modelling was the same as the previous post so the designs have been biased for improved solubility and then selected based on predicted potency. Changes to the NE portion still crop up and I suspect this is the case because the modelling views regions of steep SAR as relatively promising for finding highly potent molecules. The reason for this is that they might well be rubbish, but they could also be great if they pick up the right interactions close to the surface of the protein. There are a few anilines present which I would normally filter out, but in this case I chose to leave them in as they are also likely to be easy to make.

Happy to hear your thoughts on these, have a great weekend!
Alfie
new_starting_points.zip

@edwintse
Collaborator

edwintse commented May 6, 2021

Hi @abrennan5, thanks for the new suggestions! We might have a go at making the higher scoring ones like the cyclic urea (0.07). Interesting that the 0.14 compound is predicted to be active. We've made the same compound but with the NH2CH2 at the 3-position and it ended up being inactive.

@abrennan5
Author

@edwintse Glad to hear a few of these might make a target list! I initially couldn't find the molecule you mention so I've had a look and realised that there were a small number (about 25) that had a potency value in the Dundee column but not in the collated 'PfaI EC50 uMol (Mean)' column we'd been using.

I've updated the modelling and the indole predictions are broadly similar with slightly (2-fold) lower success estimates. The drop off is broadly consistent with the other suggestions, with the exception of the 0.14 compound, which is now 0.03 and probably better reflects your understanding of the SAR. No other starting points jump up the list, most of the missing data was for inactive molecules.

Hope that helps clear things up!

@edwintse
Collaborator

Just to update on where we're at with these compounds:

  • When trying to make the respective cores via the typical route, the condensation works fine but no reaction occurs under the oxidative cyclisation conditions.
  • A quick and dirty method to couple the diol to the core (for Suzuki coupling) was more dirty than quick.
  • I've made the mono-THP protected diol via the long route. Coupling to the core works fine but bromination with NBS gave a mix of products where the THP group had fallen off.
  • The Suzuki on the monobrominated free alcohol looks promising but the scale was too small.
  • I'm currently scaling up the coupled core and will retry the bromination using reXST'd NBS.

Evariste progress

@abrennan5
Author

That's a great effort, thanks @edwintse! The monobrominated free alcohol looks like a really useful intermediate if the chemistry works on a reasonable scale.

Interesting that nothing happens with the oxidative cyclisation. Do you see any sort of pattern with more/less reactive substrates? Seems like the sort of thing that might give a pretty straight line on a Hammett plot.

@edwintse
Collaborator

Yes, it would be super useful (hopefully the reaction is a bit cleaner with the reXST'd NBS).

Both hydrazones were essentially insoluble in CH2Cl2 so that could be the main factor (though I would've expected at least a trace amount of the cyclised product with some heating which I didn't see).

@abrennan5
Author

Ahh, that makes sense. Hope the scale up is going well!

@edwintse
Collaborator

Another update on where we're at:

  • Revisiting the cyclisation reaction using Pb(OAc)4 or Chloramine-T still didn't give the cyclised core
  • Scale-up of the bromination (w/ reXST'd NBS) on the core coupled with the THP-protected alcohol gave the same result as the first time with a mix of the products shown in the previous scheme (interestingly in roughly the same ratios as well)
  • None of the Suzuki couplings on the free alcohol/brominated core with the benzimidazolone led to the desired product. Only debrominated SM was seen in all reactions (A)
  • Since the target compound (with the benzylic alcohol) was not being cooperative, we decided to swap the ether with the regular difluorophenyl group as this would be more straightforward. Unfortunately all Suzuki reactions on this core with the same boronic ester were also only giving debrominated SM (B)
  • Just to be sure that the reaction conditions weren't the problem, I did the Suzuki coupling with the benzimidazole and phenyl (not shown) boronic esters and both gave the desired product (C)
  • A SciFinder search on the benzimidazolone boronic ester for Suzuki coupling showed the majority of reactions used PdCl2(dppf) as the catalyst instead. Under these conditions, the desired product was finally obtained (D)

Untitled Wiley-3

I'm less sure about going back and making the actual predicted target compounds. The benzimidazole compound (EGT 541-1) was actually already made by @maratsydney and it ended up being inactive (>25 uM). I don't see there being too significant a difference in potencies between the two ether side-chains.

With that in mind, are you able to put these 2 compounds (EGT 540-4 and EGT 541-1) through your model to see if they still get predicted as active?

@abrennan5
Author

Hi Ed,

That's great work, thanks for grinding through the chemistry! As you suggest, the model doesn't expect great things of EGT 541-1: it predicts a pIC50 of 5 ± 0.6 at roughly a 70% CI, which is pretty much the bottom end of the available data.

It predicts slightly better things for EGT 540-4: 5.6 ± 0.5, which doesn't sound like a huge improvement, but it's certainly a step up on the inactive prediction.
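For anyone wanting to translate these numbers, here is a small sketch that converts a pIC50 into micromolar and backs out a standard deviation from an interval. It reads the quoted values as the half-width of a roughly 70% interval on a Gaussian prediction. That reading is an assumption, as is applying the same coverage to EGT 540-4.

```python
from statistics import NormalDist

def pic50_to_micromolar(pic50):
    """pIC50 = -log10(IC50 in mol/L), so IC50 in uM = 10 ** (6 - pIC50)."""
    return 10 ** (6.0 - pic50)

def sigma_from_interval(half_width, confidence):
    """Standard deviation implied by a symmetric Gaussian confidence interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return half_width / z

# Means and half-widths as quoted in the comment above; coverage is assumed to be 70%.
for name, mean, half_width in [("EGT 541-1", 5.0, 0.6), ("EGT 540-4", 5.6, 0.5)]:
    sigma = sigma_from_interval(half_width, confidence=0.70)
    print(f"{name}: {pic50_to_micromolar(mean):.1f} uM, sigma = {sigma:.2f}")
```

On this reading a pIC50 of 5 corresponds to 10 uM, which fits the description of it sitting at the bottom end of the data.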

I'll re-run the modelling tomorrow using a virtual library constructed based on the chemistry above and update the top suggestions for that ether substituent.

@mattodd
Member

mattodd commented Jul 19, 2021

Yes, at this stage we really want to find any compounds with the same black structure as 540-4 or 541-1 but with variations in orange - i.e. meta- and para-subst rings - where that subunit is on a commercially available boronic acid. i.e. what can we access using these Suzuki couplings.

@abrennan5
Author

abrennan5 commented Jul 20, 2021

Hi both, we've built a virtual library based on a Suzuki coupling of the above bromide with all Enamine in-stock boronic acids, then scored the compounds using our models. I've attached the top 100 (sorted by probability of having pIC50 > 7 and logD 1 - 4). The pdf/cdx are the top 10.

You can sort for predicted potency (potency_mu), or only the probability of achieving pIC50 > 7 (potency_prob_success) if you were more interested in that than the logD.
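A minimal sketch of that sorting using only the standard library is below. The column names `potency_mu` and `potency_prob_success` are the ones mentioned above. The `id` column and the rows are made up so that the example runs without the attachment.

```python
import csv
import io

# Stand-in for enamine_boronics_virtual_lib.csv; the rows and the "id" column are made up.
EXAMPLE_CSV = """id,potency_mu,potency_prob_success
cpd-a,5.9,0.08
cpd-b,6.3,0.05
cpd-c,5.5,0.11
"""

def load_rows(handle):
    """Read the library and convert the two numeric columns to floats."""
    rows = []
    for row in csv.DictReader(handle):
        row["potency_mu"] = float(row["potency_mu"])
        row["potency_prob_success"] = float(row["potency_prob_success"])
        rows.append(row)
    return rows

def top_by(rows, column, n=10):
    """Return the n rows with the largest value in the given column."""
    return sorted(rows, key=lambda r: r[column], reverse=True)[:n]

rows = load_rows(io.StringIO(EXAMPLE_CSV))
by_mean = top_by(rows, "potency_mu")
by_prob = top_by(rows, "potency_prob_success")
print([r["id"] for r in by_mean])
print([r["id"] for r in by_prob])
```

To run it on the real attachment, pass a file handle opened with `open("enamine_boronics_virtual_lib.csv", newline="")` to `load_rows` instead of the in-memory string.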

Let me know if there's anything else we can do to help. Happy to provide the whole library or score for solubility/clearance as well if that would be of interest.

enamine_boronics_virtual_lib.csv
enamine_boronics_virtual_lib_10.pdf
enamine_boronics_virtual_lib_10.zip

@mattodd
Member

mattodd commented Jul 20, 2021

That's really great. @edwintse what do you think? We absolutely are interested in potency, but also logD. We need, what, 3-5 compound suggestions with a sweet spot combination of the two. That they can (likely) be made from the same core is a real bonus here.

@edwintse
Collaborator

The top 10 look good. Some of the boronic esters/acids are a little pricy but about half are pretty cheap. Will have a think about which to go for.

@edwintse
Collaborator

@mattodd Ok, so I've had a look through the top 10 and summarised the details below. I also put the compounds through Ersilia's prediction app as a cross-check and added the probabilities of being active below each compound. The prices of the boronic acid/ester from Enamine and Fluorochem are listed as well. I've deprioritised the compounds in red as we're more interested in 3,4-substituted phenyl rings. The top middle compound in green seems like an obvious one to go for as it's predicted active by both models and the reagent is cheap. The pyrrolopyridine in the top left could also be one to go for? Thoughts on the others?

enamine_boronics_virtual_lib_10

@mattodd
Member

mattodd commented Jul 21, 2021

Nice! OK, so if we number 1-10 from top left then I'd go for

1 - yes
2 - definitely
3 - maybe, expensive
4 - no, not 3,4 disubst
5 - yes
6 - no major advantage over 1
7 - no major advantage over 2
8 - Expensive but nice logD
9 - no, not 3,4 disubst
10 - expensive but quite nice

So 1, 2, 5 and maybe one more? Any thoughts on these from @abrennan5 @GemmaTuron @miquelduranfrigola @drc007 @jonjoncardoso ?

@abrennan5
Author

Thanks for annotating @edwintse! Apologies for not adding compound codes, next update to the platform we're going to do this automatically.

Mat, I agree with your summary - 1, 2, and 5 look good. The one thing I would point out is that I think the calculated logD has struggled with either 2 or 7, as I would expect the CF2H to be 0.5 - 1 unit lower than the corresponding CF3. Benzylic C-F bonds are weird though, so it's hard to say without measuring them both. I'd be tempted to include 7 on that basis, but it might also be worth investigating a more distinct analogue (i.e. 8) if you were only going to purchase one expensive precursor.

@abrennan5
Author

Discussed with the team and we're happy to purchase the boronic acids for 1, 2, 3, 5, and 8 if that looks like a good list to you both. Let me know if a call would help sort out the details of where to get them delivered etc.

@jonjoncardoso
Member

Interesting stuff! I can run those over our modSAR model to see how and if the predicted pIC50 matches those!

@mattodd
Member

mattodd commented Jul 21, 2021

That's extraordinarily generous. I'll discuss with Ed (typical yields, hence how much we might need) and reply ASAP.

@jonjoncardoso
Member

jonjoncardoso commented Jul 22, 2021

Here are the predictions made by our algorithm for this set of molecules: 2021_07_21_modSAR_predictions.zip

Predictions didn't seem to correlate much but our model also endorses the green compound (marked original_id=1600) on this visualisation:

image

I have posted the step-by-step on how to reproduce these predictions on this Jupyter notebook.
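One way to put a number on how well two sets of predictions agree is a rank correlation. The sketch below is a plain-Python Spearman calculation on hypothetical values. It is not taken from the notebook, and the two lists are invented for illustration.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical predicted pIC50 values from two models for the same five compounds.
model_a = [5.7, 5.2, 6.0, 5.4, 5.9]
model_b = [6.1, 5.8, 6.3, 6.0, 5.9]
print(f"Pearson {pearson(model_a, model_b):.2f}, Spearman {spearman(model_a, model_b):.2f}")
```

A rank-based measure is a reasonable choice here because the two models may be on slightly different scales while still agreeing on which compounds look best.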

@mattodd
Member

mattodd commented Jul 22, 2021

Most interesting, thanks @jonjoncardoso. Would you (or @edwintse ) be able to correlate the ID numbers here with the structures of the most interesting? i.e. 921 is the CH3 equivalent of our top scorer? And 1958 and 1501 are already in the set we're considering? What about those four in the quadrant above 5.7 Evariste and above 6.1 modSAR? Do they share anything?

While I think of it @edwintse let's be sure that none of these predicted compounds have already been made!

Re buying the boronics/boronates: Thank you again @abrennan5 for your very kind offer. I think 100 mg is likely to be enough of each, given how the Suzukis are going (above) and that it might take two attempts. @edwintse would you be able to put together a shopping list for at least 100 mg of the reagents for 1, 2, 3, 5 and 8 that maximises delivery speed while minimising any associated delivery costs?

@abrennan5
Author

In terms of an optimised price list, I don't know if you've used MCule before @edwintse but they provide a service that does this (if it can find the building blocks) https://mcule.com/search/

Given how difficult this dataset is to predict on, I'd say that's a reasonable correlation between the two models. The range of ours is slightly lower (probably due to differences in data cleaning). 921 is the CF3 -> tBu analogue of 1600 and is lower down our list due to the higher logD prediction. 1958 is compound 10 in the above pdf (cyclopropyl ether) and 1501 is compound 7 (CF2H). Seems that both models like 4-Cl, 3-small lipophilic group!

@miquelduranfrigola

miquelduranfrigola commented Jul 23, 2021

Hi @mattodd @edwintse @abrennan5 @drc007 @jonjoncardoso

Really great results! We have looked into the 99 candidates that @abrennan5 shared and scored them according to 2 of our metrics (IC50Pred and DeepActivity). In this file you can find our selection (intersection of the best quartile of the two metrics = 17 candidates):

evariste_eosi_filtered.csv

In brief:

  • Of the top 10 candidates from Evariste's original list, 2 and 7 are the best supported with our metrics.
  • We provide 15 other compounds down the original list that may be interesting to explore.

About the metrics:

  • IC50Pred (IC): the lower the better. It is probably biased towards high values, so hopefully it is a conservative estimate.
  • DeepActivity (DA): the higher the better. It is a composite z-score between several deep learning scores (chemprop, grover; trained on classification and regression tasks)
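The "intersection of the best quartile" selection can be sketched as follows. This is a hypothetical reconstruction, not the code used to produce the file: the quantile convention (linear interpolation) is an assumption, and the scores are invented.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list, for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def best_quartile_intersection(candidates):
    """Keep candidates that fall in the best quartile of both metrics.

    candidates: list of dicts with keys "id", "ic" (lower is better)
    and "da" (higher is better).
    """
    ic_cut = quantile([c["ic"] for c in candidates], 0.25)
    da_cut = quantile([c["da"] for c in candidates], 0.75)
    return [c["id"] for c in candidates if c["ic"] <= ic_cut and c["da"] >= da_cut]

# Hypothetical scores for eight candidates.
candidates = [
    {"id": 1, "ic": 0.8, "da": 1.9},
    {"id": 2, "ic": 0.5, "da": 2.4},
    {"id": 3, "ic": 3.0, "da": 0.1},
    {"id": 4, "ic": 2.2, "da": 2.6},
    {"id": 5, "ic": 4.1, "da": -0.7},
    {"id": 6, "ic": 0.6, "da": -0.2},
    {"id": 7, "ic": 5.0, "da": 0.4},
    {"id": 8, "ic": 1.5, "da": 1.0},
]
print(best_quartile_intersection(candidates))
```

Requiring both metrics to agree is deliberately strict, which is why the intersection is much smaller than either quartile on its own.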

Below a quick viz of the 17 candidates (Rank = order in the original list):
evariste_eosi_selection

@edwintse
Collaborator

@abrennan5 I've had a look at the prices on MCule and have listed them below. The prices below the short line are from Enamine or Fluorochem. I've had a look at a few of the more interesting ones from Ersilia's latest post as well. Based on the quote I got from MCule, the shipping fees are a bit too expensive... I think the best bet would be to get everything from Enamine to make the most of the €60 shipping fee (everything is also in stock from them). How does this look to you?
Untitled Wiley-12
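The trade-off between per-reagent price and per-vendor shipping can be checked by brute force for a list this small. In the sketch below only the 60 shipping fee for Enamine comes from the thread. Every other price, the Fluorochem fee, and the function name are made up for illustration.

```python
from itertools import product

def cheapest_order(prices, shipping):
    """Brute-force the vendor for each reagent, minimising price plus shipping.

    prices: {reagent: {vendor: price}}; shipping: {vendor: flat fee per order}.
    """
    reagents = list(prices)
    best_total, best_plan = None, None
    # Iterating a dict yields its keys, so each choice is one vendor per reagent.
    for choice in product(*(prices[r] for r in reagents)):
        total = sum(prices[r][v] for r, v in zip(reagents, choice))
        total += sum(shipping[v] for v in set(choice))
        if best_total is None or total < best_total:
            best_total, best_plan = total, dict(zip(reagents, choice))
    return best_total, best_plan

# Hypothetical prices in EUR; only the Enamine shipping fee is from the thread.
prices = {
    "boronic acid 1": {"Enamine": 40, "Fluorochem": 35},
    "boronic acid 2": {"Enamine": 55, "Fluorochem": 90},
    "boronic acid 5": {"Enamine": 30},
    "boronic acid 8": {"Enamine": 120, "Fluorochem": 110},
}
shipping = {"Enamine": 60, "Fluorochem": 25}
total, plan = cheapest_order(prices, shipping)
print(total, plan)
```

With these invented prices a single Enamine order wins, because the small savings from a second vendor do not cover the extra shipping fee.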

@abrennan5
Author

@edwintse Thanks very much for doing all the research, shame about the MCule shipping fees! Once we've got this first set tested we can have another look at the modelling and design/rescore analogues based on that. I'll have a quick check, but I reckon getting 4 from Enamine and the CF3 indole from Fluorochem is probably the most cost-effective approach.

What address should we use when placing the order?

@abrennan5
Author

Starting materials should be with you in the next couple of weeks! Enamine very kindly waived the delivery fee as it was for an anti-malarial project.

@edwintse
Collaborator

@abrennan5 Amazing!! Thanks so much for organising this!

@mattodd
Member

mattodd commented Jul 29, 2021

Yeah, that's awesome @abrennan5, thank you, really exciting.

@edwintse
Collaborator

@abrennan5 I've finally finished making the 5 compounds! I was having a bit of trouble with the 3-CF3,4-Cl coupling and ended up isolating the dechlorination product as well (EGT 552-1). We'll try and organise a time to get these compounds tested soon.

Evariste 2

@abrennan5
Author

abrennan5 commented Sep 28, 2021

Thanks Edwin! That's absolutely great, love a bonus compound. Really appreciate all your work and looking forward to seeing the results.

@edwintse
Collaborator

@abrennan5 @mattodd Potency results just back in. EGT 92-1 is a positive control that we also included in the assay. Three compounds (92, 553, 554) need repeating as their initial repeats differed by more than 3-fold. Potencies are in uM.
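The repeat criterion mentioned above can be written down in a few lines. This is a sketch with hypothetical replicate values, not the assay data. Using the geometric mean to combine replicates is a choice made here, not something stated in the thread.

```python
from math import exp, log

def fold_difference(values):
    """Ratio of the largest to the smallest replicate (all values must be > 0)."""
    return max(values) / min(values)

def geometric_mean(values):
    """Geometric mean, a common way to average concentrations such as IC50."""
    return exp(sum(log(v) for v in values) / len(values))

def triage(replicates, max_fold=3.0):
    """Split compounds into accepted geometric means and those needing a repeat."""
    accepted, repeat = {}, []
    for compound, values in replicates.items():
        if fold_difference(values) > max_fold:
            repeat.append(compound)
        else:
            accepted[compound] = geometric_mean(values)
    return accepted, repeat

# Hypothetical replicate potencies in uM; not the measured values from the assay.
replicates = {"cpd-1": [1.2, 1.8], "cpd-2": [0.9, 4.5], "cpd-3": [6.0, 6.0]}
accepted, repeat = triage(replicates)
print(accepted, repeat)
```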

Untitled Wiley-8

@abrennan5
Author

Hi @edwintse, thanks for the update and for organising the testing! Some interesting if not super exciting results. I think it's fair to say that the SAR really does not track in this series. I'll update the modelling and get back to you on Monday.

Thanks again!

@abrennan5
Author

@mattodd @edwintse Hope you're both well, here are some of my thoughts on the above data:

  • We would have loved to find something highly potent, but we're fairly pleased overall with the accuracy of the predictions. Our mean values were slightly high but we had large standard deviations for everything as well. Our modelling has also moved on slightly since these compounds were made; for example, our prediction for EGT-555-1 would have been 4.8 ± 1.5.
  • Very interesting to see the comparison of 552-1 and 552-3. The fact the SAR is so clearly not additive suggests it's worth exploring further combinations in these positions.
  • A key question is: did we select the "best" (most potent/most information rich) compounds from the limited library we designed from this core, or are there still really good compounds hiding further down the list? Adding this set of data (I've taken the average of the two results for each molecule as a starting point) and re-running the modelling on the same library picks out some 3,5 substituted molecules, often with the CF3 in one of the two positions. The top compound is 3-CF3-5-CN (5.4 ± 1.6).
  • Do you want to carry on with a few more cycles? If so, would that be a few more compounds based on the existing library or some changes to the ether portion that use similar chemistry?

Really appreciate all your help on this project, it's great to have some data we can discuss openly.

@mattodd
Member

mattodd commented Nov 29, 2021

That's great @abrennan5. Interesting to get your thoughts here. I'd love to engage in another round and will check in with @edwintse. Are you able to post structures or SMILES for the new ones so that we can consider cost and synthetic difficulty?

@abrennan5
Author

abrennan5 commented Nov 30, 2021

@mattodd @edwintse

The structures here are the 10 things we would make next and the csv file contains the full library scored using the updated models. The values below the structures are: predicted pIC50, the error bar around that prediction, and the conditioned probability of achieving pIC50 > 7.

By conditioned probability I mean that, having selected the 'best' molecule, we assume that it fails to hit the endpoints we want, and rebuild the model before selecting the second one. This helps remove very similar molecules that have clustered at the top of the list. The full library contains the raw probability of success scores so you can look further down the list if you want.

Our predicted sigma is much higher than we normally see on datasets with 400ish compounds. This isn't necessarily a bad thing as it hopefully reflects the relatively steep SAR you seem to encounter in this part of the pocket. It also takes into account that the SAR between the NE and NW groups is not at all additive.

If we were to look into other chemistry, it would be to take a few of the more potent groups we have already found or those that show up in the next round and try them with different ethers. Perhaps those that include the benzylic substitution found in some of the more potent compounds.

Let me know your thoughts!

top10_conditioned
boronic_library_rescore.csv.zip

Minor edit: Added compound numbers. Also, we didn't filter out the aryl chlorides in the list below as they are semi-compatible with this chemistry but obviously it might lead to some useful but potentially annoying side products.

@edwintse
Collaborator

edwintse commented Dec 1, 2021

Hi @abrennan5, thanks for generating some new compounds! I've had a look for the commercial availability of the corresponding boronic esters and they are shown below.

Untitled Wiley-5

I was looking at the csv file and noticed a mismatch with the figure: the compound numbered 620 (2-OMe; 3-OH) in the figure isn't in the list, while the compound numbered 250 (3-Me; 5-NO2) is in the list but isn't in the figure?

The pyrrole might be interesting (closest we have already are imidazole and pyrazole, both were inactive).
The triazolopyridine might also be interesting
The 3,5-Cl;4-OCHF2 one might be good too

@abrennan5
Author

Hi @edwintse,

Sorry that's my mistake! The list is supposed to be the full library but I've accidentally taken a subset instead. The full library is now attached to this message, although the compounds are numbered differently. See if you're interested in any of the others in there.

You can also sort the compounds by "potency_mu", which is our mean prediction. You'll see that the top scorers here are more similar to the last round: number 10 and number 7 in the original picture (your post, July 21st) are still predicted to be about 1 uM but are unlikely to be substantially more active than that. It might be interesting to include a couple of the higher confidence/exploitation compounds and a few of the riskier bets.

Maybe a list that looks something like 7 + 10 from the first list, as well as 1, 2, 5, and 8 (all in stock and fairly cheap) from the second round of designs. Does that sound reasonable to you? If we're lucky, 7 and 10 will also provide the de-chlorinated analogues which are predicted to be modestly potent as well.

boronic_library_rescore.csv.zip

@edwintse
Collaborator

Hi @abrennan5, here's the list of the 6 compounds and their prices and codes from Enamine. I'm making more of the brominated core so it'll be ready to Suzuki when these arrive. Thanks again!

Untitled Wiley-5

@edwintse
Collaborator

@mattodd I've uploaded the data curves from Mark for the latest compounds we got back. MMV1903416 and MMV1903417 were run 4 times each. The last repeats for those were inconsistent so Mark said to take the values that are highlighted in red (2 each which would be averaged).

Ed Report.xlsx

@edwintse
Collaborator

Hi @abrennan5, sorry for the delay. I've just finished purifying all the compounds and just need to NMR check them but hopefully they're all ok. I managed to get a number of bonus compounds which were the result of dechlorination or double Suzuki coupling.

Evariste Set 3

@mattodd In total there should be 12 compounds for testing (11 of these + 1 positive control)

@abrennan5
Author

Thanks @edwintse, this is brilliant! Great job getting everything separated - if those double Suzuki bonus compounds are any good we're going to have a job getting the logP down.

I'll run all the compounds through the model again, although I'd imagine the double Suzukis will fall outside the domain of applicability. Looking forward to seeing the results!

@MFernflower

@edwintse @abrennan5 Now I'm curious as to what a naked or mono-chlorinated biphenyl would do

CLogP goes straight into the skip but curious nonetheless

Para-biphenyl-boronic acid is not that costly:

image

@edwintse
Collaborator

edwintse commented Mar 7, 2022

@mattodd @abrennan5 @miquelduranfrigola @GemmaTuron Results are in!! This round looks particularly promising. Quite a few of them are <1 uM.

  • The double Suzuki products are (I suppose) expectedly less potent
  • OSM-LO-69 is a compound that Jo made early on and it had an activity of 0.281 uM

Evariste Set 3

@abrennan5
Author

Thanks @edwintse and @mattodd, this looks promising and there seems to be some interesting SAR to discuss. A lot of us are away on holiday at the minute but I'll follow up with an assessment of the predictions next week.

So far this project has been a real test of the modelling so it's great to see some fairly potent molecules. Thanks again!

@MFernflower

@edwintse http://www.fluorochem.co.uk/Products/Product?Code=209754 (4‐cyclopropoxyphenyl)boronic acid is commercially available - might be worth trying to make something with it since it seems like para position is important for potency?

@abrennan5
Author

abrennan5 commented Mar 18, 2022

@edwintse @mattodd Hi both,

Image attached shows our predicted 50% confidence intervals for the new, non-biphenyl molecules. A comparison of 11 and 12 is illustrative for why we think predicting model error is so important. 11 has the highest mean prediction but a low confidence interval (exploiting well understood chemical space) whereas 12 has a modest mean prediction but significant uncertainty (more explorative) and as it turns out it's the best compound in the whole set. Overall really pleased with the results and very grateful for your help so far.
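The point about mean versus uncertainty can be made concrete with a toy calculation. The sketch assumes Gaussian predictions, and the two (mean, sigma) pairs are hypothetical. They are not the actual values for compounds 11 and 12.

```python
from statistics import NormalDist

def central_interval(mean, sigma, coverage=0.5):
    """Symmetric interval containing `coverage` of a Gaussian prediction."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return mean - z * sigma, mean + z * sigma

def prob_above(mean, sigma, threshold):
    """P(X > threshold) for a Gaussian prediction."""
    return 1.0 - NormalDist(mean, sigma).cdf(threshold)

# Hypothetical predictions: a confident one and an uncertain one.
predictions = {"exploit": (6.2, 0.3), "explore": (5.6, 1.2)}
for name, (mean, sigma) in predictions.items():
    low, high = central_interval(mean, sigma)
    print(f"{name}: 50% interval {low:.2f} to {high:.2f}, "
          f"P(pIC50 > 7) = {prob_above(mean, sigma, 7.0):.3f}")
```

In this toy case the prediction with the lower mean has the better chance of clearing a high threshold, simply because its interval is wide enough to reach it.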

We're more than happy to do another round but I appreciate you might well need some time to focus on the suggestions from @miquelduranfrigola and @GemmaTuron, which is absolutely fair enough given the number of compounds that have already been made based on our modelling. Happy to discuss this whenever you think would be suitable.

Link below is to a blog that we've just put on our website writing up a summary of the work so far.

https://www.evaristetechnologies.com/post/automated-design-of-potent-antimalarials

image


6 participants