Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure report - terrible fits for some small molecules with two opposite formal charges #189

Open
simonlichtinger opened this issue Jun 19, 2022 · 6 comments
Labels
bug Something isn't working proton-transfer

Comments

@simonlichtinger
Copy link

Dear bespoke fits devs,

The setup

Installed as in previous issue reports, eg. #187

Bespoke executor was run with

BEFLOW_OPTIMIZER_KEEP_FILES=True openff-bespoke executor launch  --n-fragmenter-workers 2   --n-optimizer-workers  12      --n-qc-compute-workers 6    --qc-compute-n-cores   2 --directory bespoke-tmp

And submissions were made using:

BEFLOW_OPTIMIZER_KEEP_FILES=True openff-bespoke executor submit  --file   "$lig"  --workflow-file  "workflow_$(basename $lig ).json" 

Where the workflow file was generated in python:

from openff.bespokefit.workflows import BespokeWorkflowFactory
from openff.toolkit.topology import Molecule
from openff.qcsubmit.common_structures import QCSpec, PCMSettings
from openff.bespokefit.executor import BespokeExecutor, BespokeWorkerConfig
from openff.bespokefit.executor import wait_until_complete
from openff.bespokefit.schema.optimizers import ForceBalanceSchema

molecules = ["ligands/alafosfalin.sdf", "ligands/aminolevulinicacid.sdf", "ligands/amoxicillin.sdf", "ligands/valacyclovir.sdf"]

for lig in molecules:
    factory = BespokeWorkflowFactory()
    ala_ala = Molecule.from_file(lig)
    qcspec = QCSpec(method='HF',basis='6-31G*',program='psi4')
    factory.default_qc_specs = [qcspec]
    factory.optimizer = ForceBalanceSchema()
    factory.optimizer.max_iterations = 50
    factory.optimizer.step_convergence_threshold = 0.005
    factory.optimizer.objective_convergence_threshold = 0.005
    factory.optimizer.gradient_convergence_threshold = 0.005
    workflow = factory.optimization_schema_from_molecule(ala_ala)

    factory.to_file(f"workflow_{lig.split('/')[1]}.json")

I had made the optimisation more sensitive and allowed more iterations since some of my molecules were throwing errors that the optimisation didn't converge.

The failure case

Several ligands fail, the simplest one is aminolevulinic acid:
aminolevulinicacid.sdf.zip

image

The failure shows itself in terrible initial MM profiles, which do not/only slightly improve upon optimsation, for example:

image

I'm attaching the full output here, as an external link since github won't allow me to upload more than 25MB.
https://drive.google.com/file/d/12uNhrB2o7_eNVOSb06oIE7yyLxtdwsaO/view?usp=sharing

The presumed cause

This ligand has two formal charges which can lie in close proximity, so the absence of solvent during the qc generation causes these charges to be unstable. I observe distorted geometries in the scans, like this:

image

I think I may have initially observed this behaviour with ala-ala dipeptide in #186 , where my quantum calcuations with B3LYP crashed with an unkown error. While they ran with HF, it seems that this didn't in fact solve the problem but just made the quantum calculations not crash, the results were still problematic in a similar way to aminolevulinic acid, as my analysis now shows (initially, I didn't see this because of an inaccuracy in the openbabel sdf writer which - it appears - doesn't handle formal charges in these sort of molecules correctly, sorry for not realising this straight away in the first issue).

In any case, I would value your input on whether you think that this is indeed causing the problem and what might be done as a remedy to parametrise molecules like this. Have you seen this in your tests with molecules that also contain charged groups which can act as proton acceptors or donors? Or were they not included in the test sets so far? One thought would be to try an implicit solvent, so I would like to request this feature please. But might you have other ideas?

Many thanks
Simon

@j-wags
Copy link
Member

j-wags commented Jun 20, 2022

Ahh, this is really helpful - Thanks for digging into the details and reporting it.

Have you seen this in your tests with molecules that also contain charged groups which can act as proton acceptors or donors?

We've had trouble with proton transfers/connectivity rearrangements elsewhere in our projects as well. We often think "oh, this should be easy to fix, we'll just add a bond length constraint", but by the time we get to the bottom of a call stack that goes through a bunch of programs, it's actually really complicated. So in the long run, there may be an answer in TorsionDrive/GeomeTRIC, but that may take some time to thread in, and a lot of testing to decide when to enable something like hbond constraints.

what might be done as a remedy to parametrise molecules like this.

The best thing to do in the short term would probably be to raise an error or a warning, and not provide a custom parameter for the torsion. We could use something like the connectivity filter in qcsubmit, and say "if any optimizations have a connectivity transfer, don't generate bespoke parameters for this torsion". This should be just fine on a workflow level - If we don't output a bespoke parameter for that torsion, the molecule will just get the generic torsion parameter from the base force field. But we'll want to be careful in how we communicate with the user so that they understand what happened.

@j-wags j-wags added the bug Something isn't working label Jun 29, 2022
@Yoshanuikabundi
Copy link
Contributor

Hi @simonlichtinger - I'm having trouble reproducing this. I can see from your previous issues that you're on Ubuntu 18.04, but could you let us know the output of conda list and conda env export from the environment that caused this result?

@simonlichtinger
Copy link
Author

Actually, sorry, this particular data was produced on CentOS Stream 8, using a miniconda3 install.

I'm attaching the conda packages, as well as the workflow from which I ran.
conda_env_export_bespokefit.txt
conda_list_bespokefit.txt
workflow_aminolevulinicacid.sdf.txt
(I made the json into a txt as github won't allow json. Please change back)

I had used the HF method instead of B3LYP, because on a previous ligand, ala-ala dipeptide, that had avoided the same issue I'm now seeing.

@Yoshanuikabundi
Copy link
Contributor

Thanks! Will take another crack later today.

@j-wags
Copy link
Member

j-wags commented Jul 14, 2022

I worked with @Yoshanuikabundi yesterday to come up with a more minimal reproducing case. Here it is for posterity:

BEFLOW_OPTIMIZER_KEEP_FILES=True \
openff-bespoke executor run --smiles "[NH3+]CC[O-]" \
--workflow           "default"    \
 --output             "xxx.json" \
 --output-force-field "xxx.offxml"  \
 --n-qc-compute-workers 2  \
 --qc-compute-n-cores   1  \
--default-qc-spec xtb gfn2xtb none \
--directory bespoke-tmp

Then the proton-transferred confs can be found deep in the bespoke-tmp directory

@mattwthompson
Copy link
Member

#193 (in releases 0.1.3 and newer) might address this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working proton-transfer
Projects
None yet
Development

No branches or pull requests

4 participants