Skip to content

Scripts related to building and analysing constant-pH MD simulations in Gromacs

Notifications You must be signed in to change notification settings

minghao2016/pHbuilder

Repository files navigation

Description

Python-based system builder for constant-pH simulations in Gromacs.

Install instructions

  1. If you have a GPU and want to use GPU-acceleration, make sure you first install CUDA.
  2. Clone clean-cpHMD-branch.
  3. Install using the instructions here. I personally use the following CMake flags: cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=CUDA -DGMX_CUDA_TARGET_COMPUTE=60 -DGMX_USE_RDTSCP=ON -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs_constantph
  4. Clone phbuilder (this) repository.
  5. Add phbuilder directory to PYTHONPATH by adding export PYTHONPATH=$PYTHONPATH:/path/to/phbuilder to your ~/.bashrc file (and reload terminal(s)).

Design principles

I hardly use classes for the developing this builder as I don't particularly like them in Python. Instead, I decided to use the shelve module, which allows for storing and loading python data structures in a binary file. Here, this binary file is called 'universe'. The idea is to just store any data structure that might be of value to other builder/analysis functions in this 'universe', whether it be a float or a long list of Residue objects.

To reduce coupling, one thing I think is very useful is the ability to run any command at any stage of the building process. If a function requires a variable that was not added to the universe yet (because some other builder function was skipped), the function will simply prompt the user to input the missing data instead of exiting (any data structure except for d_residues can be manually added this way).

All (encapsulated) gromacs commands executed by the various builder functions will be logged in the "builder.log" file (this is helpful for debugging purposes).

Reference list of builder functions
universe.add(varName, value)
Add variable to universe.
universe.has(varName)
Check whether variable with varName is in universe.
universe.get(varName)
Retrieve variable with varName from universe.
universe.inspect()
Print all the data members stored in universe. This is very useful for debugging or checking the parameters used for building the simulation in question.
protein.process(fname, d_model=1, d_ALI='A', d_chain=[], resetResId=False):
Processes a .pdb file. This should be the first step in building the simulation. fname is the name of the .pdb file, d_model is the MODEL number, d_ALI is the alternative location indicator, d_chain is a list of chains (default = all), and resetResId resets the residue numbering. Note that all atoms in the input protein MUST belong to a chain (must have a chain letter), otherwise the building process won't work.
protein.countres(resName)
Counts and returns the number of residues of a certain resName.
protein.add_box(d_boxMargin, d_boxType='cubic')
Add periodic box. This function encapsulates gmx editconf.
protein.add_buffer(ph_bufpdbName, ph_bufitpName, ph_bufMargin=2.0, ph_bufnmol=-1, attempts=100000)
Add buffer particle(s). This function encapsulates gmx insert-molecules. Will be skipped if either ph_constantpH or ph_restrainpH is False. Requires the structure (.pdb) file of the buffer particle as well as the topology (.itp) file. One can optionally set the minimum distance between the buffer particle(s) (themselves) and the protein, as well as the number of attempts. The number of buffer particles can also be set. By default, ph_bufnmol = #titratable groups.
protein.add_water()
Solvate the system. This function encapsulates gmx solvate.
add_ions(neutral=True, conc=0, pname='NA', nname='CL')
Neutralize the system and/or add a specified salt concentration (conc in mmol/ml) using pname and nname ion types (default NA and CL). This function encapsulates gmx genion.
topol.generate(d_modelFF, d_modelWater, d_terministring="", d_customitp="")
Generate the protein topology. This function encapsulates gmx pdb2gmx, and makes sure all protonatable residues are put in their protonated state when ph_constantpH is True. d_modelFF is the force field, while d_modelWater is the water model. d_terministring is a string consisting of two letters, specifying how pdb2gmx should treat the termini of the (various chains of) the protein. Commonly used: 00 = charged termini (pdb2gmx default), 11 = neutral termini, 34 = use this with capped tripeptide. d_customitp allows you to specify a custom .itp file to overwrite topol_Protein_chain_A.itp. This is a temporary fix for using our modified force field.
topol.restrain_dihedrals(resName, atomNameList, Type, phi, dphi, fc)
Restrain dihedrals of a certain residue. This is an optional step that should be run after topol.generate(). resName is the residue name, atomNameList is a list containing four strings, indicating with four atoms form the dihedral (e.g. [' OE1', ' CD ', ' OE2', ' HE2'] to restrain the carboxyl dihedral of GLU), and Type, phi, dphi, and fc are restraining parameters (see Gromacs manual).
topol.restrain_dihedrals_by_idx(indices, Type, phi, dphi, fc)
Restrain dihedral between atom indices. This is an optional step that should be run after topol.generate(). Type, phi, dphi, and fc are restraining parameters (see Gromacs manual).
md.gen_mdp(Type, nsteps=25000, nstxout=0, posres=False)
Generate .mdp file. Possible Types are: 'EM', 'NVT', 'NPT', 'MD'. If posres=True, the entire protein will be position restrained (not just COM). If you use energy_minimize(), tcouple(), and pcoule(), this function only needs to be called explicitly to generate MD.mdp.
md.gen_constantpH(ph_pH, ph_lambdaM, ph_nstout, ph_barrierE, cal=False, lambdaInit=0.5)
Generate constant-pH input parameters. Will be skipped if d_constantpH is False. This function currently works in four steps: 1) write general constant-pH parameters to MD.mdp, 2) write different lambda group types to MD.mdp, 3) write different lambda residues to MD.mdp, 4) write lambda groups and corresponding atom numbers to index.ndx. ph_pH is the pH of the simulation/buffer, ph_lambdaM is the mass of the lambda particle, ph_nstout is the output frequency for the lambda_xxx.dat files, ph_barrierE is the barrier energy in kJ/mol. cal and lambdaInit should be left alone unless you're doing a calibration.
md.energy_minimize()
Generate EM.mdp and perform energy minimization. Encapsulates gmx grompp and gmx mdrun.
md.energy_tcouple()
Generate NVT.mdp and perform temperature coupling. Encapsulates gmx grompp and gmx mdrun. Velocities are generated in this step, and it should not be skipped.
md.energy_pcouple()
Generate NPT.mdp and perform pressure coupling. Encapsulates gmx grompp and gmx mdrun. Can sometimes be skipped for small systems to save time.
write.run(gmxPath="/usr/local/gromacs", options="")
Write executable bash file called run.sh which calls gmx grompp and gmx mdrun for running the simulation. options can be used to specify a string for additional parameters for mdrun such as -pme cpu, which is currently required when using GPU acceleration.
write.reset()
Write executable bash file called reset.sh which can be used to clean up / delete all generate files. Proceed with caution!

Reference list of analysis functions
analyze.fitCalibration(order=5, compare=[])
Fit dV/dl coefficients and plot the fit. compare can be used to specify different dV/dl coefficients for a comparison.
analyze.glicphstates()
Analyze pH states in GLIC.
analyze.plotlambda(plotBUF=False)
Plot lambda trajectories of all lambda coordinates. Also plot buffer trajectory if plotBUF=True.
analyze.plotpotentials(pKa)
Plot the correction and pH potentials for given pKa. Uses lambda_dwp.dat for obtaining the bias potential.
analyze.plotforces(pKa)
Plot the correction and pH forces for given pKa. Uses lambda_dwp.dat for obtaining the bias potential.
analyze.plothistogram(fname, bins=200)
Plot histogram for a given lambda_xxx.dat file.

Reference list of data members used/stored in universe:

  • All data members have either a d_ or a ph_ prefix to distinguish them.
  • d_ means it's a general data member while ph_ means it's specific to some constant-pH functionality.
d_ALI Alternative Location Indicator in .pdb files. Default = 'A'.
d_box The (CRYST1) line a .pdb file defining the periodic box.
d_boxMargin Box size margin / option -d for gmx editconf.
d_chain List of different chains in protein in .pdb file.
d_dt MD time step size (ps). Currently hardcoded at 0.002 ps.
d_model The MODEL number that was specified when processing .pdb file. Default = 1.
d_modelFF Force field.
d_modelWater Water model.
d_nameList List containing succession of .pdb names when performing various steps. Usually a tool will take the last name contained in this list and us that as input.
d_nsteps Number of MD steps to perform during production (MD) run.
d_nstxout Interval in which to output a trajectory frame. E.g. d_nstxout=10000 means output a frame every d_dt * d_nstxout = 20ps.
d_pdbName Name of the input protein minus .pdb extention.
d_residues List of (all) Residue objects. Basically it's the entire .pdb file loaded into my own data structure.
d_terministring String consisting of two letters, specifying how gmx pdb2gmx should treat the termini of the (various chains of) the protein. Commonly used: 00 = charged termini (pdb2gmx default), 11 = neutral termini, 34 = use this with capped tripeptide.
d_title TITLE of .pdb file. If none was set, d_pdbName will be used.
ph_ASP_dvdl Contains dV/dl calibration coefficients for ASP. Has to be added manually to universe before you start to build the system.
ph_BUF_dvdl Containd dV/dl calibration coefficients for BUF. Has to be added manually to universe before you start to build the system.
ph_GLU_dvdl Containd dV/dl calibration coefficients for GLU. Has to be added manually to universe before you start to build the system.
ph_barrier_E Barrier energy (kJ/mol). Good value is 5.0 or 7.5 kJ/mol.
ph_bufMargin Minimum distance (nm) between the buffer molecule(s) (themselves) and the protein. This should be at least 1.5nm to avoid unphysical interactions.
ph_bufitpname Name of the topology file of the buffer molecule.
ph_bufnmol Number of buffer molecules.
ph_pdbname Name of the structure file of the buffer molecule.
ph_constantpH Turn constant-pH on or off. Has to be added manually to universe before you start to build the system.
ph_dvdl_initList List of lambda coordinates for which to compute mean dV/dl during calibration.
ph_dvdl_meanList List of mean dV/dl values at said points.
ph_dvdl_stdList List of standard deviations of dV/dl values at said points.
ph_lambdaM Mass (Da) of lambda particle. Good value is 5.0 Da.
ph_nstout Similar to d_nstxout. Interval in which to output a line of lambda data to the various lambda_x.dat files.
ph_pH pH of the simulation/buffer.
ph_restrainpH Turn the charge-leveling scheme "charge-restraining" on or off. Has to be added manually to universe before you start to build the system.

Known issues

  • You can currently use protein.add_buf() to add less buffer molecules than you have titratable groups, however for various reasons this is a bad idea.

To-do

  • Enable the charge leveling scheme termed "charge coupling", i.e. put charge on an "ion".
  • Enable multistate.

About

Scripts related to building and analysing constant-pH MD simulations in Gromacs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published