Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow mdpow-solvationenergy to select alchemlyb estimators #148

Merged
merged 21 commits into from
Jun 9, 2021

Conversation

VOD555
Copy link
Contributor

@VOD555 VOD555 commented Jun 2, 2021

Fix #135
-allow solvationenergy to use alchemlyb estimators

@VOD555 VOD555 requested a review from orbeckst June 2, 2021 14:27
@VOD555 VOD555 self-assigned this Jun 2, 2021
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make mdpow-solvationenergy a model for the scripts.

  • Replace optparse with argparse so that we can use better option processing.
  • The logic for force/no data is convoluted and should be simplified to avoid code duplication.
  • Please add more logger INFO output.

mdpow/fep.py Outdated
if G.method != "TI":
errmsg = "Method %s is not implemented in MDPOW, use estimator='alchemlyb'" % G.method
logger.error(errmsg)
raise ValueError(errmsg)
G.start = kwargs.pop('start', 0)
G.stop = kwargs.pop('stop', None)
G.SI = kwargs.pop('stop', False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks wrong — the kwarg should not be "start" but .. SI?

scripts/mdpow-solvationenergy Show resolved Hide resolved
scripts/mdpow-solvationenergy Show resolved Hide resolved
if not estimator in ('mdpow', 'alchemlyb'):
errmsg = "estimator = %r is not supported, must be 'mdpow' or 'alchemlyb'" % estimator
logger.error(errmsg)
raise ValueError(errmsg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add INFO logger that states which estimator is used

scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
Comment on lines 249 to 253
parser.add_option('--noSI', dest='SI',
action="store_false",
help="Disable statistical inefficiency analysis"
"Statitical inefficiency analysis is performed by default when using"
"alchemlyb estimators and is disabled when using mdpow estimator.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a BooleanOptionalAction action option with argparse so that we have --SI and --no-SI (helps with #147 )

scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
force=False, stride=10, permissive=False, solvent='water')
import argparse

parser = argparse.ArgumentParser(usage=__doc__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC you need to do something special so that argparse shows defaults, something like ArgumentParserWithDefaults or so — check the docs. Then remove the [%default] everywhere (IIRC...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be setting argparse.ArgumentDefaultsHelpFormatter and then the default values would be shown in the help message automatically.

Comment on lines 234 to 238
parser.add_argument('--noSI', dest='SI',
action="store_false",
help="Disable statistical inefficiency analysis"
"Statitical inefficiency analysis is performed by default when using"
"alchemlyb estimators and is disabled when using mdpow estimator.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the bool action

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I check the python documentation, BooleanOptionalAction action option is added in 3.9, we can't use it in 2.7.

@VOD555
Copy link
Contributor Author

VOD555 commented Jun 3, 2021

Meet an unexpected error about the usage messsage
ValueError: unsupported format character 'p' (0x70) at index 1 and found the solution here.
https://stackoverflow.com/questions/24180527/argparse-required-arguments-listed-under-optional-arguments

@VOD555
Copy link
Contributor Author

VOD555 commented Jun 4, 2021

Local tests output:

 mdpow-solvationenergy SM25 ../SM26/SM26/ -S octanol --start 1 --stop 20 --force
mdpow       : INFO     MDPOW 0.7.0-dev starting.
mdpow       : INFO     Copyright (c) 2010-2016 Ian Kenney, Bogdan Iorga, and Oliver Beckstein
mdpow       : INFO     Released under the GNU Public Licence, version 3.
mdpow       : INFO     For bug reports and help: https://github.com/Becksteinlab/MDPOW/issues
mdpow.config: INFO     Using the bundled force fields from GMXLIB='/nfs/homes4/sfan/miniconda3/envs/py27/lib/python2.7/site-packages/mdpow/top'.
mdpow.config: INFO     If required, override this behaviour by setting the environment variable GMXLIB yourself.
mdpow       : INFO     Analyzing directory 'SM25'... (can take a while)
mdpow       : INFO     [SM25] Reading octanol data 'SM25/FEP/octanol/Goct.fep'
mdpow.fep   : INFO     Solvation free energy calculation for molecule UNK in solvent octanol.
mdpow.fep   : INFO     Base directory is '/nfs/homes4/sfan/Projects/SAMPL7/gaff/SM25/SM25'
mdpow.fep   : INFO     Using setup directories under 'FEP/octanol': {'vdw': 'FEP/octanol/VDW', 'coulomb': 'FEP/octanol/Coulomb'}
mdpow.fep   : INFO     Default checkpoint file is '/nfs/homes4/sfan/Projects/SAMPL7/gaff/SM25/SM25/FEP/octanol/Goct.fep'
mdpow       : INFO     The solvent is octanol .
mdpow       : INFO     Analysis stride is 1 .
mdpow       : INFO     [SM25] Forcing (re)analysis of hydration free energy data
mdpow       : INFO     Estimator is alchemlyb .
mdpow       : INFO     Free energy calculation method is MBAR .
mdpow       : INFO     Analysis starts from frame 0 .
mdpow       : INFO     Analysis stops at frame None .
mdpow       : INFO     Statistial inefficiency analysis will be performed.
mdpow.fep   : INFO     [FEP/octanol] Finding dgdl xvg files, reading with stride=1 permissive=False.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 50.0223.
mdpow.fep   : INFO     The data are subsampled every 51 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 63.1531.
mdpow.fep   : INFO     The data are subsampled every 64 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 69.1584.
mdpow.fep   : INFO     The data are subsampled every 70 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 95.2881.
mdpow.fep   : INFO     The data are subsampled every 96 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 154.2010.
mdpow.fep   : INFO     The data are subsampled every 155 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 71.3375.

@VOD555
Copy link
Contributor Author

VOD555 commented Jun 4, 2021

output from mdpow-solvationenergy -h

mdpow       : INFO     MDPOW 0.7.0-dev starting.
mdpow       : INFO     Copyright (c) 2010-2016 Ian Kenney, Bogdan Iorga, and Oliver Beckstein
mdpow       : INFO     Released under the GNU Public Licence, version 3.
mdpow       : INFO     For bug reports and help: https://github.com/Becksteinlab/MDPOW/issues
mdpow.config: INFO     Using the bundled force fields from GMXLIB='/nfs/homes4/sfan/miniconda3/envs/py27/lib/python2.7/site-packages/mdpow/top'.
mdpow.config: INFO     If required, override this behaviour by setting the environment variable GMXLIB yourself.
usage: mdpow-solvationenergy [options] DIRECTORY [DIRECTORY ...]

Run the free energy analysis for a solvent  in <DIRECTORY>/FEP
and return DeltaG.

DIRECTORY should contain all the files resulting from running
``mdpow.fep.Ghyd.setup()`` (or the corresponding ``Goct.setup()`` or
``Gcyclohexane.setup()`` and the results of the MD FEP
simulations. It relies on the canonical naming scheme (basically: just
use the defaults as in the tutorial).

The dV/dlambda plots can be produced automatically (--plot=auto). If multiple
DIRECTORY arguments are provided then one has to choose the auto option (or
None).

The total solvation free energy is calculated as

  DeltaG* = -(DeltaG_coul + DeltaG_vdw)

Note that the standard state refers to the "Ben-Naim" standard state
of transferring 1 M of compound in the gas phase to 1 M in the aqueous
phase.

Results are *appended* to a data file with **Output file format**::

          .                 ---------- kJ/mol ---
          molecule solvent  DeltaG*  coulomb  vdw

All observables are quoted with an error estimate, which is derived
from the correlation time and error propagation through Simpson's rule
(see :meth:`mdpow.fep.Gsolv`). It ultimately comes from the error on
<dV/dlambda>, which is estimated as the error to determine the
average.

molecule
    molecule name as used in the itp
DeltaG*
    solvation free energy vacuum --> solvent, in kJ/mol
coulomb
    discharging contribution to the DeltaG*
vdw
    decoupling contribution to the DeltaG*
directory
    folder in which the simulations were stored

positional arguments:
  directory             directory or directories which contain all the files resulting from running
                        mdpow.fep.Ghyd.setup()

optional arguments:
  -h, --help            show this help message and exit
  --plotfile FILE       plot dV/dlambda to FILE; use png or pdf suffix to determine the file type. 'auto' generates a pdf
                        file DIRECTORY/dVdl_<molname>_<solvent>.pdf and 'None' disables it The plot function is only
                        available for mdpow estimator,and is disabled when using alchemlyb estimators. (default: none)
  --solvent NAME, -S NAME
                        solvent NAME for compound, 'water', 'octanol', or 'cyclohexane' (default: water)
  -o FILE, --outfile FILE
                        append one-line results summary to FILE (default: gsolv.txt)
  -e FILE, --energies FILE
                        append solvation free energies to FILE (default: energies.txt)
  --estimator {mdpow,alchemlyb}
                        choose the estimator to be used, alchemlyb or mdpow estimators (default: alchemlyb)
  --method {TI,MBAR,BAR}
                        choose the method to calculate free energy (default: MBAR)
  --force               force rereading all data (default: False)
  --noSI                Disable statistical inefficiency analysisStatitical inefficiency analysis is performed by default
                        when usingalchemlyb estimators and is disabled when using mdpow estimator. (default: True)
  -s N, --stride N      Use every N-th datapoint from the original dV/dlambda data. (default: 1)
  --start START         Start point for the data used from the original dV/dlambda data. (default: None)
  --stop STOP           Stop point for the data used from the original dV/dlambda data. (default: None)
  --ignore-corrupted    skip lines in the md.xvg files that cannot be parsed. WARNING: Other lines in the file might have
                        been corrupted in such a way that they appear correct but are in fact wrong. WRONG RESULTS CAN
                        OCCUR! USE AT YOUR OWN RISK (default: False)

@iorga
Copy link
Collaborator

iorga commented Jun 4, 2021

Local tests output:

 mdpow-solvationenergy SM25 ../SM26/SM26/ -S octanol --start 1 --stop 20 --force
mdpow       : INFO     MDPOW 0.7.0-dev starting.
mdpow       : INFO     Copyright (c) 2010-2016 Ian Kenney, Bogdan Iorga, and Oliver Beckstein
mdpow       : INFO     Released under the GNU Public Licence, version 3.
mdpow       : INFO     For bug reports and help: https://github.com/Becksteinlab/MDPOW/issues
mdpow.config: INFO     Using the bundled force fields from GMXLIB='/nfs/homes4/sfan/miniconda3/envs/py27/lib/python2.7/site-packages/mdpow/top'.
mdpow.config: INFO     If required, override this behaviour by setting the environment variable GMXLIB yourself.
mdpow       : INFO     Analyzing directory 'SM25'... (can take a while)
mdpow       : INFO     [SM25] Reading octanol data 'SM25/FEP/octanol/Goct.fep'
mdpow.fep   : INFO     Solvation free energy calculation for molecule UNK in solvent octanol.
mdpow.fep   : INFO     Base directory is '/nfs/homes4/sfan/Projects/SAMPL7/gaff/SM25/SM25'
mdpow.fep   : INFO     Using setup directories under 'FEP/octanol': {'vdw': 'FEP/octanol/VDW', 'coulomb': 'FEP/octanol/Coulomb'}
mdpow.fep   : INFO     Default checkpoint file is '/nfs/homes4/sfan/Projects/SAMPL7/gaff/SM25/SM25/FEP/octanol/Goct.fep'
mdpow       : INFO     The solvent is octanol .
mdpow       : INFO     Analysis stride is 1 .
mdpow       : INFO     [SM25] Forcing (re)analysis of hydration free energy data
mdpow       : INFO     Estimator is alchemlyb .
mdpow       : INFO     Free energy calculation method is MBAR .
mdpow       : INFO     Analysis starts from frame 0 .
mdpow       : INFO     Analysis stops at frame None .
mdpow       : INFO     Statistial inefficiency analysis will be performed.
mdpow.fep   : INFO     [FEP/octanol] Finding dgdl xvg files, reading with stride=1 permissive=False.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 50.0223.
mdpow.fep   : INFO     The data are subsampled every 51 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 63.1531.
mdpow.fep   : INFO     The data are subsampled every 64 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 69.1584.
mdpow.fep   : INFO     The data are subsampled every 70 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 95.2881.
mdpow.fep   : INFO     The data are subsampled every 96 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 154.2010.
mdpow.fep   : INFO     The data are subsampled every 155 frames.
mdpow.fep   : INFO     Perform statistical inefficiency analysis.
mdpow.fep   : INFO     The statistical inefficiency value is 71.3375.

Thanks @VOD555 , it looks good, but can you also add before each SI analysis the window which is analyzed ? Thanks.

@iorga
Copy link
Collaborator

iorga commented Jun 4, 2021

positional arguments:
directory directory or directories which contain all the files resulting from running
mdpow.fep.Ghyd.setup()

optional arguments:
-h, --help show this help message and exit
--plotfile FILE plot dV/dlambda to FILE; use png or pdf suffix to determine the file type. 'auto' generates a pdf
file DIRECTORY/dVdl__.pdf and 'None' disables it The plot function is only
available for mdpow estimator,and is disabled when using alchemlyb estimators. (default: none)
--solvent NAME, -S NAME
solvent NAME for compound, 'water', 'octanol', or 'cyclohexane' (default: water)
-o FILE, --outfile FILE
append one-line results summary to FILE (default: gsolv.txt)
-e FILE, --energies FILE
append solvation free energies to FILE (default: energies.txt)
--estimator {mdpow,alchemlyb}
choose the estimator to be used, alchemlyb or mdpow estimators (default: alchemlyb)
--method {TI,MBAR,BAR}
choose the method to calculate free energy (default: MBAR)
--force force rereading all data (default: False)
--noSI Disable statistical inefficiency analysisStatitical inefficiency analysis is performed by default
when usingalchemlyb estimators and is disabled when using mdpow estimator. (default: True)
-s N, --stride N Use every N-th datapoint from the original dV/dlambda data. (default: 1)
--start START Start point for the data used from the original dV/dlambda data. (default: None)
--stop STOP Stop point for the data used from the original dV/dlambda data. (default: None)
--ignore-corrupted skip lines in the md.xvg files that cannot be parsed. WARNING: Other lines in the file might have
been corrupted in such a way that they appear correct but are in fact wrong. WRONG RESULTS CAN
OCCUR! USE AT YOUR OWN RISK (default: False)

@VOD555 Looks good. At the noSI option, please add dots at the end of sentences et correct the typo in Statitical.

@iorga
Copy link
Collaborator

iorga commented Jun 4, 2021

Thanks @VOD555 !

@orbeckst
Copy link
Member

orbeckst commented Jun 8, 2021

@VOD555 is this ready for another review?

@iorga if you have any comments could you please also do a review? You can also do a review that says "LGTM' (looks good to me) and approve. Thanks.

@VOD555
Copy link
Contributor Author

VOD555 commented Jun 8, 2021

@orbeckst It's ready for review.
If the current mdpow-solvationenergy is ok, I can do the same changes of to mdpow-pow and mdpow-pcw (in this pr or another one?)

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the changes. Looking good but see comments

  1. The logic of the for G in (G1, G2) loop is broken and needs to be fixed — that's not your fault but it needs to be fixed and this PR is a good place.
  2. I'd like the option for SI to be called --SI and --no-SI. I hope @iorga won't be too upset. I'd like it to work like a proper bool option.
  3. Let the script fail right away if SI == True and estimator == "mdpow" just so that there's never any confusion of what is and isn't applied. If the user wants SI and then they need to use alchemlyb. This will work by default.

mdpow/fep.py Outdated
@@ -1366,17 +1372,24 @@ def p_transfer(G1, G2, **kwargs):

# for this version. use the method given instead of the one in the input cfg file
G.method = kwargs.pop('method', 'MBAR')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole loop logic needs to be fixed.

 for G in (G1, G2):
      G_kwargs = kwargs.copy()
       # then use G_kwargs.pop() etc in this loop.
       ...

At the moment, G2 gets an almost empty kwargs and uses defaults for everything.

scripts/mdpow-pow Outdated Show resolved Hide resolved
@@ -208,7 +208,7 @@ if __name__ == "__main__":
help="force rereading all data [%default]")
parser.add_option('--noSI', dest='SI',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename as --no-SI so that when we use argparse bool options, it has the same name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make things clear, also add

  parser.add_option('--SI', dest='SI', action="store_true", ...)

as the on switch. Set the default for SI to True.

scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
Comment on lines 269 to 271
if len(opts.directory) == 0:
logger.fatal("A directory is required. See --help.")
sys.exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be omitted thanks to narg+ — or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, narg+ can automatically raise an error when given no input.

@orbeckst
Copy link
Member

orbeckst commented Jun 8, 2021

Fix mdpow-solvationenergy here and then do a PR for the others. Smaller PRs are better!

@iorga
Copy link
Collaborator

iorga commented Jun 8, 2021

@VOD555 @orbeckst LGTM. I'm not upset about --SI, I very much prefer this way.

Once this PR is merged, I will test mdpow-solvationenergy and report back to you.

- changed `--noSI` to `--no-SI`
- added `--SI` (the default)
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Let the script fail right away if SI == True and estimator == "mdpow" just so that there's never any confusion of what is and isn't applied. If the user wants SI and then they need to use alchemlyb. This will work by default.
  • update CHANGES

(Add --SI / --no-SI to mdpow-pow in other PR).

mdpow/fep.py Outdated Show resolved Hide resolved
mdpow/fep.py Outdated Show resolved Hide resolved
scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
scripts/mdpow-solvationenergy Outdated Show resolved Hide resolved
@orbeckst
Copy link
Member

orbeckst commented Jun 9, 2021

Add notes to CHANGES and do whatever check you can think of (run the script at least locally... test would be nice but I don't think we test any of the scripts :-p ). When you're happy, squash merge (or tell me to do the merge). Thank youm, @VOD555 !

@VOD555
Copy link
Contributor Author

VOD555 commented Jun 9, 2021

Local tests show no problem with --SI and --no-SI.
I'm working on the CHANGES.

@VOD555
Copy link
Contributor Author

VOD555 commented Jun 9, 2021

I'm going to merge the PR

@VOD555 VOD555 merged commit 985f20f into Becksteinlab:develop Jun 9, 2021
@VOD555 VOD555 deleted the ghyd_solvationenergy branch December 6, 2022 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allow mdpow other analysis scripts to select alchemlyb estimators
3 participants