Failure report - terrible fits for some small molecules with two opposite formal charges #189

simonlichtinger · 2022-06-19T14:27:51Z

Dear bespoke fits devs,

The setup

Installed as in previous issue reports, eg. #187

Bespoke executor was run with

BEFLOW_OPTIMIZER_KEEP_FILES=True openff-bespoke executor launch  --n-fragmenter-workers 2   --n-optimizer-workers  12      --n-qc-compute-workers 6    --qc-compute-n-cores   2 --directory bespoke-tmp

And submissions were made using:

BEFLOW_OPTIMIZER_KEEP_FILES=True openff-bespoke executor submit  --file   "$lig"  --workflow-file  "workflow_$(basename $lig ).json"

Where the workflow file was generated in python:

from openff.bespokefit.workflows import BespokeWorkflowFactory
from openff.toolkit.topology import Molecule
from openff.qcsubmit.common_structures import QCSpec, PCMSettings
from openff.bespokefit.executor import BespokeExecutor, BespokeWorkerConfig
from openff.bespokefit.executor import wait_until_complete
from openff.bespokefit.schema.optimizers import ForceBalanceSchema

molecules = ["ligands/alafosfalin.sdf", "ligands/aminolevulinicacid.sdf", "ligands/amoxicillin.sdf", "ligands/valacyclovir.sdf"]

for lig in molecules:
    factory = BespokeWorkflowFactory()
    ala_ala = Molecule.from_file(lig)
    qcspec = QCSpec(method='HF',basis='6-31G*',program='psi4')
    factory.default_qc_specs = [qcspec]
    factory.optimizer = ForceBalanceSchema()
    factory.optimizer.max_iterations = 50
    factory.optimizer.step_convergence_threshold = 0.005
    factory.optimizer.objective_convergence_threshold = 0.005
    factory.optimizer.gradient_convergence_threshold = 0.005
    workflow = factory.optimization_schema_from_molecule(ala_ala)

    factory.to_file(f"workflow_{lig.split('/')[1]}.json")

I had made the optimisation more sensitive and allowed more iterations since some of my molecules were throwing errors that the optimisation didn't converge.

The failure case

Several ligands fail, the simplest one is aminolevulinic acid:
aminolevulinicacid.sdf.zip

The failure shows itself in terrible initial MM profiles, which do not/only slightly improve upon optimsation, for example:

I'm attaching the full output here, as an external link since github won't allow me to upload more than 25MB.
https://drive.google.com/file/d/12uNhrB2o7_eNVOSb06oIE7yyLxtdwsaO/view?usp=sharing

The presumed cause

This ligand has two formal charges which can lie in close proximity, so the absence of solvent during the qc generation causes these charges to be unstable. I observe distorted geometries in the scans, like this:

I think I may have initially observed this behaviour with ala-ala dipeptide in #186 , where my quantum calcuations with B3LYP crashed with an unkown error. While they ran with HF, it seems that this didn't in fact solve the problem but just made the quantum calculations not crash, the results were still problematic in a similar way to aminolevulinic acid, as my analysis now shows (initially, I didn't see this because of an inaccuracy in the openbabel sdf writer which - it appears - doesn't handle formal charges in these sort of molecules correctly, sorry for not realising this straight away in the first issue).

In any case, I would value your input on whether you think that this is indeed causing the problem and what might be done as a remedy to parametrise molecules like this. Have you seen this in your tests with molecules that also contain charged groups which can act as proton acceptors or donors? Or were they not included in the test sets so far? One thought would be to try an implicit solvent, so I would like to request this feature please. But might you have other ideas?

Many thanks
Simon

The text was updated successfully, but these errors were encountered:

j-wags · 2022-06-20T17:12:25Z

Ahh, this is really helpful - Thanks for digging into the details and reporting it.

Have you seen this in your tests with molecules that also contain charged groups which can act as proton acceptors or donors?

We've had trouble with proton transfers/connectivity rearrangements elsewhere in our projects as well. We often think "oh, this should be easy to fix, we'll just add a bond length constraint", but by the time we get to the bottom of a call stack that goes through a bunch of programs, it's actually really complicated. So in the long run, there may be an answer in TorsionDrive/GeomeTRIC, but that may take some time to thread in, and a lot of testing to decide when to enable something like hbond constraints.

what might be done as a remedy to parametrise molecules like this.

The best thing to do in the short term would probably be to raise an error or a warning, and not provide a custom parameter for the torsion. We could use something like the connectivity filter in qcsubmit, and say "if any optimizations have a connectivity transfer, don't generate bespoke parameters for this torsion". This should be just fine on a workflow level - If we don't output a bespoke parameter for that torsion, the molecule will just get the generic torsion parameter from the base force field. But we'll want to be careful in how we communicate with the user so that they understand what happened.

Yoshanuikabundi · 2022-07-13T03:40:56Z

Hi @simonlichtinger - I'm having trouble reproducing this. I can see from your previous issues that you're on Ubuntu 18.04, but could you let us know the output of conda list and conda env export from the environment that caused this result?

simonlichtinger · 2022-07-13T09:31:26Z

Actually, sorry, this particular data was produced on CentOS Stream 8, using a miniconda3 install.

I'm attaching the conda packages, as well as the workflow from which I ran.
conda_env_export_bespokefit.txt
conda_list_bespokefit.txt
workflow_aminolevulinicacid.sdf.txt
(I made the json into a txt as github won't allow json. Please change back)

I had used the HF method instead of B3LYP, because on a previous ligand, ala-ala dipeptide, that had avoided the same issue I'm now seeing.

Yoshanuikabundi · 2022-07-13T23:00:06Z

Thanks! Will take another crack later today.

j-wags · 2022-07-14T16:25:53Z

I worked with @Yoshanuikabundi yesterday to come up with a more minimal reproducing case. Here it is for posterity:

BEFLOW_OPTIMIZER_KEEP_FILES=True \
openff-bespoke executor run --smiles "[NH3+]CC[O-]" \
--workflow           "default"    \
 --output             "xxx.json" \
 --output-force-field "xxx.offxml"  \
 --n-qc-compute-workers 2  \
 --qc-compute-n-cores   1  \
--default-qc-spec xtb gfn2xtb none \
--directory bespoke-tmp

Then the proton-transferred confs can be found deep in the bespoke-tmp directory

mattwthompson · 2023-01-05T17:13:43Z

#193 (in releases 0.1.3 and newer) might address this

j-wags added the bug Something isn't working label Jun 29, 2022

mattwthompson added the proton-transfer label Jun 29, 2022

Yoshanuikabundi mentioned this issue Jul 18, 2022

Deal with connectivity changes in QC generation #193

Merged

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failure report - terrible fits for some small molecules with two opposite formal charges #189

Failure report - terrible fits for some small molecules with two opposite formal charges #189

simonlichtinger commented Jun 19, 2022

j-wags commented Jun 20, 2022

Yoshanuikabundi commented Jul 13, 2022

simonlichtinger commented Jul 13, 2022

Yoshanuikabundi commented Jul 13, 2022

j-wags commented Jul 14, 2022

mattwthompson commented Jan 5, 2023

Failure report - terrible fits for some small molecules with two opposite formal charges #189

Failure report - terrible fits for some small molecules with two opposite formal charges #189

Comments

simonlichtinger commented Jun 19, 2022

The setup

The failure case

The presumed cause

j-wags commented Jun 20, 2022

Yoshanuikabundi commented Jul 13, 2022

simonlichtinger commented Jul 13, 2022

Yoshanuikabundi commented Jul 13, 2022

j-wags commented Jul 14, 2022

mattwthompson commented Jan 5, 2023