GitHub - minghao2016/pHbuilder: Scripts related to building and analysing constant-pH MD simulations in Gromacs

Description

Python-based system builder for constant-pH simulations in Gromacs.

Install instructions

If you have a GPU and want to use GPU-acceleration, make sure you first install CUDA.
Clone clean-cpHMD-branch.
Install using the instructions here. I personally use the following CMake flags: cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=CUDA -DGMX_CUDA_TARGET_COMPUTE=60 -DGMX_USE_RDTSCP=ON -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs_constantph
Clone phbuilder (this) repository.
Add phbuilder directory to PYTHONPATH by adding export PYTHONPATH=$PYTHONPATH:/path/to/phbuilder to your ~/.bashrc file (and reload terminal(s)).

Design principles

I hardly use classes for the developing this builder as I don't particularly like them in Python. Instead, I decided to use the shelve module, which allows for storing and loading python data structures in a binary file. Here, this binary file is called 'universe'. The idea is to just store any data structure that might be of value to other builder/analysis functions in this 'universe', whether it be a float or a long list of Residue objects.

To reduce coupling, one thing I think is very useful is the ability to run any command at any stage of the building process. If a function requires a variable that was not added to the universe yet (because some other builder function was skipped), the function will simply prompt the user to input the missing data instead of exiting (any data structure except for d_residues can be manually added this way).

All (encapsulated) gromacs commands executed by the various builder functions will be logged in the "builder.log" file (this is helpful for debugging purposes).

Reference list of builder functions
`universe.add(varName, value)` Add variable to universe.
`universe.has(varName)` Check whether variable with varName is in universe.
`universe.get(varName)` Retrieve variable with varName from universe.
`universe.inspect()` Print all the data members stored in universe. This is very useful for debugging or checking the parameters used for building the simulation in question.
`protein.process(fname, d_model=1, d_ALI='A', d_chain=[], resetResId=False):` Processes a .pdb file. This should be the first step in building the simulation. `fname` is the name of the .pdb file, `d_model` is the MODEL number, `d_ALI` is the alternative location indicator, `d_chain` is a list of chains (default = all), and `resetResId` resets the residue numbering. Note that all atoms in the input protein MUST belong to a chain (must have a chain letter), otherwise the building process won't work.
`protein.countres(resName)` Counts and returns the number of residues of a certain `resName`.
`protein.add_box(d_boxMargin, d_boxType='cubic')` Add periodic box. This function encapsulates `gmx editconf`.
`protein.add_buffer(ph_bufpdbName, ph_bufitpName, ph_bufMargin=2.0, ph_bufnmol=-1, attempts=100000)` Add buffer particle(s). This function encapsulates `gmx insert-molecules`. Will be skipped if either `ph_constantpH` or `ph_restrainpH` is False. Requires the structure (.pdb) file of the buffer particle as well as the topology (.itp) file. One can optionally set the minimum distance between the buffer particle(s) (themselves) and the protein, as well as the number of attempts. The number of buffer particles can also be set. By default, `ph_bufnmol` = #titratable groups.
`protein.add_water()` Solvate the system. This function encapsulates `gmx solvate`.
`add_ions(neutral=True, conc=0, pname='NA', nname='CL')` Neutralize the system and/or add a specified salt concentration (`conc` in mmol/ml) using `pname` and `nname` ion types (default NA and CL). This function encapsulates `gmx genion`.
`topol.generate(d_modelFF, d_modelWater, d_terministring="", d_customitp="")` Generate the protein topology. This function encapsulates `gmx pdb2gmx`, and makes sure all protonatable residues are put in their protonated state when `ph_constantpH` is True. `d_modelFF` is the force field, while `d_modelWater` is the water model. `d_terministring` is a string consisting of two letters, specifying how `pdb2gmx` should treat the termini of the (various chains of) the protein. Commonly used: 00 = charged termini (`pdb2gmx` default), 11 = neutral termini, 34 = use this with capped tripeptide. `d_customitp` allows you to specify a custom .itp file to overwrite topol_Protein_chain_A.itp. This is a temporary fix for using our modified force field.
`topol.restrain_dihedrals(resName, atomNameList, Type, phi, dphi, fc)` Restrain dihedrals of a certain residue. This is an optional step that should be run after `topol.generate()`. `resName` is the residue name, `atomNameList` is a list containing four strings, indicating with four atoms form the dihedral (e.g. `[' OE1', ' CD ', ' OE2', ' HE2']` to restrain the carboxyl dihedral of GLU), and `Type`, `phi`, `dphi`, and `fc` are restraining parameters (see Gromacs manual).
`topol.restrain_dihedrals_by_idx(indices, Type, phi, dphi, fc)` Restrain dihedral between atom indices. This is an optional step that should be run after `topol.generate()`. `Type`, `phi`, `dphi`, and `fc` are restraining parameters (see Gromacs manual).
`md.gen_mdp(Type, nsteps=25000, nstxout=0, posres=False)` Generate .mdp file. Possible Types are: 'EM', 'NVT', 'NPT', 'MD'. If `posres=True`, the entire protein will be position restrained (not just COM). If you use energy_minimize(), tcouple(), and pcoule(), this function only needs to be called explicitly to generate MD.mdp.
`md.gen_constantpH(ph_pH, ph_lambdaM, ph_nstout, ph_barrierE, cal=False, lambdaInit=0.5)` Generate constant-pH input parameters. Will be skipped if `d_constantpH` is False. This function currently works in four steps: 1) write general constant-pH parameters to MD.mdp, 2) write different lambda group types to MD.mdp, 3) write different lambda residues to MD.mdp, 4) write lambda groups and corresponding atom numbers to index.ndx. `ph_pH` is the pH of the simulation/buffer, `ph_lambdaM` is the mass of the lambda particle, `ph_nstout` is the output frequency for the lambda_xxx.dat files, `ph_barrierE` is the barrier energy in kJ/mol. `cal` and `lambdaInit` should be left alone unless you're doing a calibration.
`md.energy_minimize()` Generate EM.mdp and perform energy minimization. Encapsulates `gmx grompp` and `gmx mdrun`.
`md.energy_tcouple()` Generate NVT.mdp and perform temperature coupling. Encapsulates `gmx grompp` and `gmx mdrun`. Velocities are generated in this step, and it should not be skipped.
`md.energy_pcouple()` Generate NPT.mdp and perform pressure coupling. Encapsulates `gmx grompp` and `gmx mdrun`. Can sometimes be skipped for small systems to save time.
`write.run(gmxPath="/usr/local/gromacs", options="")` Write executable bash file called run.sh which calls `gmx grompp` and `gmx mdrun` for running the simulation. `options` can be used to specify a string for additional parameters for mdrun such as `-pme cpu`, which is currently required when using GPU acceleration.
`write.reset()` Write executable bash file called reset.sh which can be used to clean up / delete all generate files. Proceed with caution!

Reference list of analysis functions
`analyze.fitCalibration(order=5, compare=[])` Fit dV/dl coefficients and plot the fit. `compare` can be used to specify different dV/dl coefficients for a comparison.
`analyze.glicphstates()` Analyze pH states in GLIC.
`analyze.plotlambda(plotBUF=False)` Plot lambda trajectories of all lambda coordinates. Also plot buffer trajectory if `plotBUF=True`.
`analyze.plotpotentials(pKa)` Plot the correction and pH potentials for given pKa. Uses lambda_dwp.dat for obtaining the bias potential.
`analyze.plotforces(pKa)` Plot the correction and pH forces for given pKa. Uses lambda_dwp.dat for obtaining the bias potential.
`analyze.plothistogram(fname, bins=200)` Plot histogram for a given lambda_xxx.dat file.

Reference list of data members used/stored in universe:

All data members have either a d_ or a ph_ prefix to distinguish them.
d_ means it's a general data member while ph_ means it's specific to some constant-pH functionality.

d_ALI	Alternative Location Indicator in .pdb files. Default = 'A'.
d_box	The (CRYST1) line a .pdb file defining the periodic box.
d_boxMargin	Box size margin / option -d for gmx editconf.
d_chain	List of different chains in protein in .pdb file.
d_dt	MD time step size (ps). Currently hardcoded at 0.002 ps.
d_model	The MODEL number that was specified when processing .pdb file. Default = 1.
d_modelFF	Force field.
d_modelWater	Water model.
d_nameList	List containing succession of .pdb names when performing various steps. Usually a tool will take the last name contained in this list and us that as input.
d_nsteps	Number of MD steps to perform during production (MD) run.
d_nstxout	Interval in which to output a trajectory frame. E.g. d_nstxout=10000 means output a frame every d_dt * d_nstxout = 20ps.
d_pdbName	Name of the input protein minus .pdb extention.
d_residues	List of (all) Residue objects. Basically it's the entire .pdb file loaded into my own data structure.
d_terministring	String consisting of two letters, specifying how gmx pdb2gmx should treat the termini of the (various chains of) the protein. Commonly used: 00 = charged termini (pdb2gmx default), 11 = neutral termini, 34 = use this with capped tripeptide.
d_title	TITLE of .pdb file. If none was set, d_pdbName will be used.
ph_ASP_dvdl	Contains dV/dl calibration coefficients for ASP. Has to be added manually to universe before you start to build the system.
ph_BUF_dvdl	Containd dV/dl calibration coefficients for BUF. Has to be added manually to universe before you start to build the system.
ph_GLU_dvdl	Containd dV/dl calibration coefficients for GLU. Has to be added manually to universe before you start to build the system.
ph_barrier_E	Barrier energy (kJ/mol). Good value is 5.0 or 7.5 kJ/mol.
ph_bufMargin	Minimum distance (nm) between the buffer molecule(s) (themselves) and the protein. This should be at least 1.5nm to avoid unphysical interactions.
ph_bufitpname	Name of the topology file of the buffer molecule.
ph_bufnmol	Number of buffer molecules.
ph_pdbname	Name of the structure file of the buffer molecule.
ph_constantpH	Turn constant-pH on or off. Has to be added manually to universe before you start to build the system.
ph_dvdl_initList	List of lambda coordinates for which to compute mean dV/dl during calibration.
ph_dvdl_meanList	List of mean dV/dl values at said points.
ph_dvdl_stdList	List of standard deviations of dV/dl values at said points.
ph_lambdaM	Mass (Da) of lambda particle. Good value is 5.0 Da.
ph_nstout	Similar to d_nstxout. Interval in which to output a line of lambda data to the various lambda_x.dat files.
ph_pH	pH of the simulation/buffer.
ph_restrainpH	Turn the charge-leveling scheme "charge-restraining" on or off. Has to be added manually to universe before you start to build the system.

Known issues

You can currently use protein.add_buf() to add less buffer molecules than you have titratable groups, however for various reasons this is a bad idea.

To-do

Enable the charge leveling scheme termed "charge coupling", i.e. put charge on an "ion".
Enable multistate.

Name		Name	Last commit message	Last commit date
Latest commit History 266 Commits
examples		examples
misc		misc
proteins		proteins
README.md		README.md
analyze.py		analyze.py
constantph.py		constantph.py
load.py		load.py
md.py		md.py
mdp.py		mdp.py
phbuilder.py		phbuilder.py
protein.py		protein.py
topol.py		topol.py
universe.py		universe.py
utils.py		utils.py
write.py		write.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

minghao2016/pHbuilder

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages