
Tutorial: Martini 3 IDPs

Chris Brasnett edited this page Oct 10, 2024 · 1 revision

This tutorial is under construction.

Preliminaries

We can use the Martini 3 library in Polyply to generate topologies for disordered proteins from a FASTA sequence file. These topologies differ from the default Martini 3 amino acid topologies: the protein-water interactions are adjusted, and the bonded parameters are improved for IDPs.

PLEASE NOTE: This tool is not designed to generate topologies for Martini 3 proteins with folded domains. If you are not sure whether you are working with an IDP, check your structure and sequence before using it; Metapredict may be particularly useful for checking the sequence. For multidomain proteins, it is best to use Martinize2 as described in the Martini 3 Go model paper. Martinize2 can then selectively apply the disordered parameters to any significant disordered regions. For more detail on how disordered domains are handled in Martinize2, please see its documentation.

Please cite the preprint that describes this work.

Parameter generation

To begin, we need a FASTA file containing the IDP sequence. The header must contain the keyword PROTEIN so that Polyply interprets the sequence correctly. Here we use an artificial disordered protein designed by Dzuricky et al., made of 10 octapeptide repeat units. We'll call the file WT10.fasta:

> WT10 PROTEIN
SKGPGRGDSPYSGRGDSPYSGRGDSPYSGRGDSPYSGRGDSPYSGRGDSPYSGRGDSPYSGRGDSPYSGRGDSPYSGRGDSPYSGY
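Because the sequence is built from repeats, it can be convenient to generate and check the FASTA file from a script rather than by hand. The sketch below assumes the layout SKGP + 10 × GRGDSPYS + GY, which is how we read the sequence above; build_wt10, format_fasta and parse_fasta are hypothetical helper names, not part of Polyply.

```python
from typing import Tuple

REPEAT = "GRGDSPYS"  # octapeptide repeat unit, as read off the WT10 sequence (assumption)


def build_wt10(n_repeats: int = 10) -> str:
    """Return SKGP + n_repeats copies of the repeat unit + GY."""
    return "SKGP" + REPEAT * n_repeats + "GY"


def format_fasta(name: str, sequence: str) -> str:
    """Single-record FASTA text whose header carries the PROTEIN keyword."""
    return f"> {name} PROTEIN\n{sequence}\n"


def parse_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence); reject records without the PROTEIN keyword."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input is not a FASTA record")
    header, sequence = lines[0], "".join(lines[1:])
    if "PROTEIN" not in header.split():
        # polyply relies on this keyword to treat the record as a protein
        raise ValueError("header must contain PROTEIN")
    return header, sequence


if __name__ == "__main__":
    text = format_fasta("WT10", build_wt10())
    with open("WT10.fasta", "w") as handle:
        handle.write(text)
    header, seq = parse_fasta(text)
    print(header, len(seq), seq.count(REPEAT))  # > WT10 PROTEIN 86 10
```

Changing n_repeats gives the other members of the same repeat family, provided they share these flanking residues.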

Once we have our disordered sequence, we can use Polyply's gen_params command to generate the simulation input topology:

polyply gen_params -seqf WT10.fasta -name WT10 -o WT10.itp -lib martini3

This writes WT10.itp, a topology file containing the parameters for the input protein.
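If you need topologies for several sequences, the same command can be assembled and run from a script. This is a minimal sketch assuming polyply is installed and on your PATH; gen_params_command and run are hypothetical helpers, and only the flags shown in this tutorial (-lib, -seqf, -seq, -name, -o, -mods) are used.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def gen_params_command(name: str, output: str, seqf: Optional[str] = None,
                       seq: Optional[Sequence[str]] = None,
                       mods: Optional[Sequence[str]] = None,
                       lib: str = "martini3") -> List[str]:
    """Assemble the argument list for polyply gen_params."""
    if (seqf is None) == (seq is None):
        raise ValueError("give exactly one of seqf or seq")
    cmd = ["polyply", "gen_params", "-lib", lib]
    if seqf is not None:
        cmd += ["-seqf", seqf]
    else:
        cmd += ["-seq", *seq]
    cmd += ["-name", name, "-o", output]
    if mods:
        cmd += ["-mods", *mods]
    return cmd


def run(cmd: List[str]) -> None:
    """Execute the command; raises if polyply is missing or exits non-zero."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("polyply executable not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for name in ["WT10"]:  # extend with further FASTA basenames
        cmd = gen_params_command(name, f"{name}.itp", seqf=f"{name}.fasta")
        print(" ".join(cmd))  # pass cmd to run() to actually call polyply
```

The argument order differs from the command above, which does not matter for command-line flags.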

Using IDP topologies

The Martini 3 approach to IDPs uses virtual Go sites along the backbone to tune the backbone-water interaction. Generating parameters with the protocol above introduces these virtual sites automatically, along with other improved bonded interactions. For example, the first few residues of the topology for the WT10 IDP discussed above read:

[ atoms ]
  1 Q5    1 SER BB   1    1
  2 TP1   1 SER SC1  1  0.0
  3 VS    1 SER CA   1  0.0 0.0
  4 P2    2 LYS BB   2  0.0
  5 SC3   2 LYS SC1  2  0.0
  6 SQ4p  2 LYS SC2  2  1.0
  7 VS    2 LYS CA   2  0.0 0.0
  8 SP1   3 GLY BB   3  0.0
  9 VS    3 GLY CA   3  0.0 0.0
 10 SP2a  4 PRO BB   4  0.0
 11 SC3   4 PRO SC1  4  0.0
 12 VS    4 PRO CA   4  0.0 0.0
...

where an atom named CA of type VS has been added to each residue. Before using input files generated with this method, you must ensure that in your main force field itp file:

  1. VS is defined in your [ atomtypes ] directive, e.g.:
...
[ atomtypes ]
...
TX1er 36.0 0.000 A 0.0 0.0
W  72.0 0.000 A 0.0 0.0
SW 54.0 0.000 A 0.0 0.0
TW 36.0 0.000 A 0.0 0.0
U  24.0 0.000 A 0.0 0.0
VS 0.00 0.000 V 0.0 0.0

[ nonbond_params ]
    P6    P6  1 4.700000e-01    4.990000e+00
    P6    P5  1 4.700000e-01    4.730000e+00
    P6    P4  1 4.700000e-01    4.480000e+00
...
  2. An interaction is defined between VS and W in your [ nonbond_params ] directive, e.g.:
...
 TX2er  SQ1n  1 3.660000e-01    3.528000e+00
 TX2er  TQ1n  1 3.520000e-01    5.158000e+00
 TX1er   Q1n  1 3.950000e-01    1.981000e+00
 TX1er  SQ1n  1 3.780000e-01    3.098000e+00
 TX1er  TQ1n  1 3.660000e-01    4.422000e+00
    VS    W   1 0.4650000000    0.5000000000

The suggested parameters for this VS-W interaction from the Go Martini 3 paper are $\sigma$ = 0.465 nm and $\epsilon$ = 0.5 kJ/mol, representing an increase of around 10% in the strength of the protein-water interaction. However, if your IDP does not behave well with these parameters, the value of $\epsilon$ can readily be adjusted.
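To see what adjusting $\epsilon$ does: the Martini 3 force field file uses combination rule 2, so the last two columns of a [ nonbond_params ] line are the $\sigma$ and $\epsilon$ of a 12-6 Lennard-Jones potential, $V(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^6]$. Its minimum lies at $r = 2^{1/6}\sigma$ with depth $-\epsilon$, so the VS-W attraction scales linearly with $\epsilon$. The sketch below evaluates the bare potential only; it ignores the cut-off and potential shift set in the mdp file, and the $\epsilon$ values scanned are purely illustrative.

```python
SIGMA = 0.465    # nm, suggested VS-W sigma
EPSILON = 0.5    # kJ/mol, suggested VS-W epsilon
R_MIN = 2 ** (1 / 6) * SIGMA  # separation at the well minimum, about 0.522 nm


def lj(r: float, sigma: float = SIGMA, epsilon: float = EPSILON) -> float:
    """Bare 12-6 Lennard-Jones energy (kJ/mol) at separation r (nm).

    No cut-off or potential shift is applied here.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


if __name__ == "__main__":
    # The well depth equals -epsilon, so it tracks epsilon directly.
    for eps in (0.4, 0.5, 0.6):
        print(f"epsilon={eps}: V(r_min) = {lj(R_MIN, epsilon=eps):.3f} kJ/mol")
```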

Once these additional parameters have been included in the force field files, the IDP topologies can be used like any other input files, whether for preparing simulations with Polyply or for running them with GROMACS.
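The two force field edits described above can also be scripted. The helper below is hypothetical and not part of Polyply: it appends the VS atom type and the VS-W pair to the first [ atomtypes ] and [ nonbond_params ] blocks of a force field itp file, skipping either entry if it is already present. It assumes a simple layout (one directive header per line, ; for comments) and writes to a new file instead of editing in place; the file names in the usage comment are examples only.

```python
import re
import sys
from typing import Iterator, List

# A directive header such as "[ atomtypes ]"
HEADER = re.compile(r"^\s*\[\s*([A-Za-z_]+)\s*\]")


def section_rows(lines: List[str], section: str) -> Iterator[List[str]]:
    """Yield the tokens of every data row inside [ section ] blocks."""
    current = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            continue
        tokens = line.split(";")[0].split()  # drop ; comments
        if current == section and tokens:
            yield tokens


def insert_in_section(lines: List[str], section: str, entry: str) -> List[str]:
    """Return a copy of lines with entry placed after the last non-blank
    line of the first [ section ] block."""
    start = None
    for i, line in enumerate(lines):
        match = HEADER.match(line)
        if match and match.group(1) == section:
            start = i
            break
    if start is None:
        raise ValueError(f"no [ {section} ] directive found")
    last = start
    for j in range(start + 1, len(lines)):
        if HEADER.match(lines[j]):
            break
        if lines[j].strip():
            last = j
    return lines[: last + 1] + [entry] + lines[last + 1 :]


def add_vs_parameters(itp_text: str, sigma: float = 0.465,
                      epsilon: float = 0.5) -> str:
    """Add the VS atom type and the VS-W pair unless already present."""
    lines = itp_text.splitlines()
    if not any(row[0] == "VS" for row in section_rows(lines, "atomtypes")):
        lines = insert_in_section(lines, "atomtypes", "VS 0.00 0.000 V 0.0 0.0")
    if not any(set(row[:2]) == {"VS", "W"}
               for row in section_rows(lines, "nonbond_params")):
        lines = insert_in_section(
            lines, "nonbond_params", f"    VS    W   1 {sigma:.6e}    {epsilon:.6e}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # usage (example names): python add_vs.py martini_v3.0.0.itp martini_v3.0.0_idp.itp
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src:
            patched = add_vs_parameters(src.read())
        with open(sys.argv[2], "w") as dst:
            dst.write(patched)
```

Passing a different epsilon value is one way to produce the adjusted force field files mentioned above.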

Protein modifications

As of Polyply v1.X.X, polyply gen_params supports protein modifications, specified with the syntax <resname><resid>:<modification>. For example:

polyply gen_params -lib martini3 -seq GLY:10 -name pGLY -o pGLY.itp -mods GLY1:N-ter GLY10:C-ter

will generate the parameters for a 10-residue polyglycine with charged N- and C-termini, as at neutral pH. Note that these terminal modifications are applied automatically when Polyply recognises the input sequence as a protein, so the same topology is obtained with:

polyply gen_params -lib martini3 -seq GLY:10 -name pGLY -o pGLY.itp
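When modification lists are generated programmatically, it helps to parse and format the <resname><resid>:<modification> strings explicitly. A small sketch, assuming residue names contain letters only and residue ids are plain integers; parse_mod, format_mod and Mod are hypothetical names.

```python
import re
from typing import NamedTuple

# <resname><resid>:<modification>, e.g. GLY10:C-ter (letters-only resname assumed)
MOD_SPEC = re.compile(r"^([A-Za-z]+)(\d+):(\S+)$")


class Mod(NamedTuple):
    resname: str
    resid: int
    modification: str


def parse_mod(spec: str) -> Mod:
    """Split a -mods entry into residue name, residue id and modification."""
    match = MOD_SPEC.match(spec)
    if match is None:
        raise ValueError(f"expected <resname><resid>:<modification>, got {spec!r}")
    return Mod(match.group(1), int(match.group(2)), match.group(3))


def format_mod(mod: Mod) -> str:
    """Inverse of parse_mod."""
    return f"{mod.resname}{mod.resid}:{mod.modification}"


if __name__ == "__main__":
    n = 10
    mods = [Mod("GLY", 1, "N-ter"), Mod("GLY", n, "C-ter")]
    print(" ".join(format_mod(m) for m in mods))  # GLY1:N-ter GLY10:C-ter
```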

In addition to terminal modifications, many of the usual protein modifications available in Martinize2 can be used, and combined as desired. For example:

polyply gen_params -lib martini3 -seq HIS:5 -name HIS5_mods -o HIS5_mods.itp -mods HIS1:HIS-HD HIS1:NH2-ter

generates a histidine pentapeptide with a neutral N-terminus, in which the side chain of that same first histidine is modified to represent neutral histidine protonated on the delta nitrogen.
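Since several modifications may target the same residue, a quick sanity check is to expand the -seq blocks and confirm that every -mods entry points at an existing residue with the right name. The sketch assumes residues are numbered consecutively from 1, as in the examples above; expand_seq and check_mods are hypothetical helpers that group the requested modifications per residue.

```python
from typing import Dict, List, Sequence


def expand_seq(blocks: Sequence[str]) -> List[str]:
    """Expand -seq blocks such as 'GLY:10' into a flat list of residue names."""
    residues: List[str] = []
    for block in blocks:
        resname, count = block.split(":")
        residues.extend([resname] * int(count))
    return residues


def check_mods(blocks: Sequence[str], mods: Sequence[str]) -> Dict[int, List[str]]:
    """Map each targeted residue id to its modifications; raise ValueError
    if an entry names a residue that is not in the sequence."""
    residues = expand_seq(blocks)
    per_residue: Dict[int, List[str]] = {}
    for spec in mods:
        target, modification = spec.split(":", 1)
        resname = target.rstrip("0123456789")
        resid = int(target[len(resname):])  # assumes numbering starts at 1
        if not 1 <= resid <= len(residues) or residues[resid - 1] != resname:
            raise ValueError(f"{spec}: no residue {target} in the sequence")
        per_residue.setdefault(resid, []).append(modification)
    return per_residue


if __name__ == "__main__":
    print(check_mods(["HIS:5"], ["HIS1:HIS-HD", "HIS1:NH2-ter"]))
    # {1: ['HIS-HD', 'NH2-ter']}
```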