Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InSty - changes in calcStatisticsInteractions #1948

Merged
merged 31 commits into from
Sep 12, 2024
Merged
Changes from 15 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
66786ad
Energy optional in calcStatisticsInteractions() [InSty]
karolamik13 Sep 3, 2024
8ffaed3
Merge branch 'interactions' of github.com:karolamik13/ProDy into inte…
karolamik13 Sep 3, 2024
86714f4
HIS/HSD/HSP added to tabulated_energies.txt; bug fix in get_energy()
karolamik13 Sep 4, 2024
dfc9205
Merge branch 'prody:main' into interactions
karolamik13 Sep 4, 2024
3c7fde9
reference/energy added [InSty]
karolamik13 Sep 4, 2024
285ae4d
Merge branch 'interactions' of github.com:karolamik13/ProDy into inte…
karolamik13 Sep 4, 2024
1e16368
HSE/HSD/HSP removed from tabulated_energies.txt and get_energy() is m…
karolamik13 Sep 4, 2024
d5c64c2
new function - checkNonstandardResidues()
karolamik13 Sep 4, 2024
0263b04
check for checkNonstandardResidues()
karolamik13 Sep 4, 2024
e22ff3d
More info in docs about energies
karolamik13 Sep 5, 2024
75fbd58
more info in docs InSty
karolamik13 Sep 5, 2024
1e15ef9
improvement in the non-standard residues in get_energy
karolamik13 Sep 5, 2024
9f2484b
new function saveInteractionsAsDummyAtoms() is added
karolamik13 Sep 7, 2024
def37bb
saveInteractionsAsDummyAtoms fix with kwargs
karolamik13 Sep 8, 2024
6277f1d
improvements of checkNonstandardResidues()
karolamik13 Sep 9, 2024
286ceb8
getInteractions - replace option for trajectory
karolamik13 Sep 10, 2024
3a06083
getInteractions - replace for a single PDB interactions
karolamik13 Sep 10, 2024
d86f7b8
getTimeInteractions() improvement to include selection replacement
karolamik13 Sep 10, 2024
4b5d687
showFrequentInteractors() [InSty] improvement - return dict with resi…
karolamik13 Sep 10, 2024
d5de489
addMissingAtoms() [fixer.py] - keep_ids added
karolamik13 Sep 11, 2024
6c10c46
typo in addMissingAtoms
karolamik13 Sep 11, 2024
cf86d94
typo found by James
karolamik13 Sep 11, 2024
11738ad
InSty - typos and TypeError fixed [James comments]
karolamik13 Sep 11, 2024
95a9a0d
InSty [docs and checks improvements]
karolamik13 Sep 12, 2024
860c7b3
fix typos
jamesmkrieger Sep 12, 2024
0ffa953
fix replace type
jamesmkrieger Sep 12, 2024
e1f4c6f
fix pkg_resources for PY3K
jamesmkrieger Sep 12, 2024
370e561
fix checkNonstandardResidues
jamesmkrieger Sep 12, 2024
e0f4d10
rename putDUMatom
jamesmkrieger Sep 12, 2024
670db54
add and for readability
jamesmkrieger Sep 12, 2024
2db6ea9
another typo fix
jamesmkrieger Sep 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
221 changes: 204 additions & 17 deletions prody/proteins/interactions.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,8 @@
'calcHydrogenBondsTrajectory', 'calcHydrophobicOverlapingAreas',
'Interactions', 'InteractionsTrajectory', 'LigandInteractionsTrajectory',
'calcSminaBindingAffinity', 'calcSminaPerAtomInteractions', 'calcSminaTermValues',
'showSminaTermValues', 'showPairEnergy']
'showSminaTermValues', 'showPairEnergy', 'checkNonstandardResidues',
'saveInteractionsAsDummyAtoms']


def cleanNumbers(listContacts):
Expand Down Expand Up @@ -160,17 +161,65 @@ def get_energy(pair, source):
import numpy as np
import importlib.resources as pkg_resources
jamesmkrieger marked this conversation as resolved.
Show resolved Hide resolved

aa_correction = {
# Histidine (His)
'HSD': 'HIS', # NAMD, protonated at ND1 (HID in AMBER)
'HSE': 'HIS', # NAMD, protonated at NE2 (HIE in AMBER)
'HSP': 'HIS', # NAMD, doubly protonated (HIP in AMBER)
'HID': 'HIS', # AMBER name, protonated at ND1
'HIE': 'HIS', # AMBER name, protonated at NE2
'HIP': 'HIS', # AMBER name, doubly protonated
'HISD': 'HIS', # GROMACS: protonated at ND1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can get these with long_resname=True:

In [6]: ag = parsePDB('nvt1_nojump_end.pdb', long_resname=True)

In [8]: np.unique(ag.getResnames())
Out[8]: 
array(['ALA', 'ARG', 'ASN', 'ASP', 'CLA', 'CU2P', 'CYS', 'GLN', 'GLU',
       'GLY', 'HISD', 'HISE', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO',
       'SER', 'SOD', 'THR', 'TRP', 'TYR', 'VAL'], dtype='<U6')

'HISE': 'HIS', # GROMACS: protonated at NE2
'HISP': 'HIS', # GROMACS: doubly protonated

# Cysteine (Cys)
'CYX': 'CYS', # Cystine (disulfide bridge)
'CYM': 'CYS', # Deprotonated cysteine, anion

# Aspartic acid (Asp)
'ASH': 'ASP', # Protonated Asp
'ASPP': 'ASP',

# Glutamic acid (Glu)
'GLH': 'GLU', # Protonated Glu
'GLUP': 'GLU', # Protonated Glu

# Lysine (Lys)
'LYN': 'LYS', # Deprotonated lysine (nautral)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually here's another typo that I'll fix


# Arginine (Arg)
'ARN': 'ARG', # Deprotonated arginine (rare, GROMACS)

# Tyrosine (Tyr)
'TYM': 'TYR', # Deprotonated tyrosine (GROMACS)

# Serine (Ser)
'SEP': 'SER', # Phosphorylated serine (GROMACS/AMBER)

# Threonine (Thr)
'TPO': 'THR', # Phosphorylated threonine (GROMACS/AMBER)

# Tyrosine (Tyr)
'PTR': 'TYR', # Phosphorylated tyrosine (GROMACS/AMBER)

# Non-standard names for aspartic and glutamic acids in low pH environments
'ASH': 'ASP', # Protonated Asp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also confirm that I've seen one of these recently too

'GLH': 'GLU', # Protonated Glu
}

pair = [aa_correction.get(aa, aa) for aa in pair]

try:
jamesmkrieger marked this conversation as resolved.
Show resolved Hide resolved
# Python 3
with pkg_resources.path('prody.proteins', 'tabulated_energies.txt') as file_path:
data = np.loadtxt(file_path, skiprows=1, dtype=str)
data = np.loadtxt(file_path, dtype=str)
except:
# Python 2.7
import pkg_resources
file_path = pkg_resources.resource_filename('prody.proteins', 'tabulated_energies.txt')
with open(file_path) as f:
data = np.loadtxt(f, skiprows=1, dtype=str)

data = np.loadtxt(f, dtype=str)

sources = ["IB_nosolv", "IB_solv", "CS"]
aa_pairs = []
Expand All @@ -180,21 +229,69 @@ def get_energy(pair, source):

lookup = pair[0]+pair[1]

return data[np.where(np.array(aa_pairs)==lookup)[0]][0][2:][np.where(np.array(sources)==source)][0]
try:
data_results = data[np.where(np.array(aa_pairs)==lookup)[0]][0][2:][np.where(np.array(sources)==source)][0]
except ImportError:
karolamik13 marked this conversation as resolved.
Show resolved Hide resolved
raise ImportError('Please replace non-standard names of residues with standard names.')

return data_results


def checkNonstandardResidues(atoms):
jamesmkrieger marked this conversation as resolved.
Show resolved Hide resolved
"""Check whether the atomic structure contain non-standard residues and inform to replace the name
to the standard one to be that non-standard residues are treated in a correct way while computing
interactions.

:arg atoms: an Atomic object from which residues are selected
:type atoms: :class:`.Atomic`
"""

try:
coords = (atoms._getCoords() if hasattr(atoms, '_getCoords') else
atoms.getCoords())
except AttributeError:
try:
checkCoords(coords)
except TypeError:
raise TypeError('coords must be an object '
'with `getCoords` method')

amino_acids = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLU", "GLN", "GLY", "HIS", "ILE",
"LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]

aa_list = atoms.select('name CA').getResnames()
aa_list_nr = atoms.select('name CA').getResnums()
nonstandard = []

for nr_i,i in enumerate(aa_list):
if i not in amino_acids:
nonstandard.append(aa_list[nr_i] + str(aa_list_nr[nr_i]))

LOGGER.info('There are several non-standard residues in the structure.')
LOGGER.info('Replace the non-standard name in the PDB file with the equivalent name from the standard one if you want to include them in the interactions.')
LOGGER.info("Residues: {0}.".format(' '.join(nonstandard)))


def showPairEnergy(data, **kwargs):
"""Return energies when a list of interactions is given. Energies will be added to each pair of residues
at the last position in the list. Energy is based on the residue types and not on the distances.
The unit of energy is kcal/mol. The energies defined as 'IB_nosolv', 'IB_solv' are taken from XX and
'CS' from YY.
The unit of energy is kcal/mol. The energies defined as 'IB_nosolv' (non-solvent-mediated), 'IB_solv' (solvent-mediated)
are taken from [OK98]_ and 'CS' from InSty paper (under preparation).
Protonation of resiudues is not distinguished. The protonation of residues is not distinguished.
karolamik13 marked this conversation as resolved.
Show resolved Hide resolved
Known residues such as HSD, HSE, HIE, and HID (used in MD simulations) are treated as HIS.

:arg data: list with interactions from calcHydrogenBonds() or other types
:type data: list

:arg energy_list_type: name of the list with energies
default is 'IB_solv'
:type energy_list_type: 'IB_nosolv', 'IB_solv', 'CS'


.. [OK98] Keskin O., Bahar I., Badretdinov A.Y., Ptitsyn O.B., Jernigan R.L.,
Empirical solvet-mediated potentials hold for both intra-molecular and
inter-molecular inter-residues interactions,
*Protein Science* **1998** 7: 2578–2586.
"""

if not isinstance(data, list):
Expand Down Expand Up @@ -1949,8 +2046,12 @@ def showInteractionsGraph(statistics, **kwargs):
def calcStatisticsInteractions(data, **kwargs):
"""Return the statistics of interactions from PDB Ensemble or trajectory including:
(1) the weight for each residue pair: corresponds to the number of counts divided by the
number of frames (values >1 are obtained when residue pair creates multiple contacts);
(2) average distance of interactions for each pair [in Ang] and (3) standard deviation [Ang.].
number of frames (values >1 are obtained when the residue pair creates multiple contacts);
(2) average distance of interactions for each pair [in Ang], (3) standard deviation [Ang.],
karolamik13 marked this conversation as resolved.
Show resolved Hide resolved
(4) Energy [in kcal/mol] that is not distance dependent. Energy by default is solvent-mediated
from [OK98]_ ('IB_solv'). To use non-solvent-mediated entries ('IB_nosolv') from [OK98]_ or
solvent-mediated values obtained for InSty paper ('CS', under preparation) change
`energy_list_type` parameter.

:arg data: list with interactions from calcHydrogenBondsTrajectory() or other types
:type data: list
Expand Down Expand Up @@ -1988,12 +2089,21 @@ def calcStatisticsInteractions(data, **kwargs):
for element in elements:
if element not in stats:
values = [t[1] for t in interactions_list if t[0] == element]
stats[element] = {
"stddev": np.round(np.std(values),6),
"mean": np.round(np.mean(values),6),
"weight": np.round(float(len(values))/len(data), 6),
"energy": get_energy([element.split('-')[0][:3], element.split('-')[1][:3]], energy_list_type)
}

try:
stats[element] = {
"stddev": np.round(np.std(values),6),
"mean": np.round(np.mean(values),6),
"weight": np.round(float(len(values))/len(data), 6),
"energy": get_energy([element.split('-')[0][:3], element.split('-')[1][:3]], energy_list_type)
}
except:
karolamik13 marked this conversation as resolved.
Show resolved Hide resolved
LOGGER.warn('energy information is not available for ', element.split('-')[0][:3], element.split('-')[1][:3])
karolamik13 marked this conversation as resolved.
Show resolved Hide resolved
stats[element] = {
"stddev": np.round(np.std(values),6),
"mean": np.round(np.mean(values),6),
"weight": np.round(float(len(values))/len(data), 6)
}

statistic = []
for key, value in stats.items():
Expand All @@ -2002,8 +2112,11 @@ def calcStatisticsInteractions(data, **kwargs):
LOGGER.info(" Average [Ang.]: {}".format(value['mean']))
LOGGER.info(" Standard deviation [Ang.]: {0}".format(value['stddev']))
LOGGER.info(" Weight: {0}".format(value['weight']))
LOGGER.info(" Energy [kcal/mol]: {0}".format(value['energy']))
statistic.append([key, value['weight'], value['mean'], value['stddev'], value['energy']])
try:
LOGGER.info(" Energy [kcal/mol]: {0}".format(value['energy']))
statistic.append([key, value['weight'], value['mean'], value['stddev'], value['energy']])
except:
statistic.append([key, value['weight'], value['mean'], value['stddev']])
else: pass

statistic.sort(key=lambda x: x[1], reverse=True)
Expand Down Expand Up @@ -2087,6 +2200,80 @@ def calcDistribution(interactions, residue1, residue2=None, **kwargs):
LOGGER.info(i)


def saveInteractionsAsDummyAtoms(atoms, interactions, filename, **kwargs):
'''Creates a PDB file which will contain protein structure and dummy atoms that will be placed between pairs
of interacting residues.

:arg atoms: an Atomic object from which residues are selected
:type atoms: :class:`.Atomic`

:arg interactions: list of interactions
:type interactions: list

:arg filename: name of the PDB file which will contain dummy atoms and protein structure
:type filename: str

:arg RESNAME_dummy: resname of the dummy atom, use 3-letter name
be default is 'DUM'
:type RESNAME_dummy: str '''


try:
coords = (atoms._getCoords() if hasattr(atoms, '_getCoords') else
atoms.getCoords())
except AttributeError:
try:
checkCoords(coords)
except TypeError:
raise TypeError('coords must be an object '
'with `getCoords` method')

RESNAME_dummy = kwargs.pop('RESNAME_dummy', 'DUM')

def putDUMatom(coord1, coord2):
jamesmkrieger marked this conversation as resolved.
Show resolved Hide resolved
midpoint = [
(coord1[0] + coord2[0]) / 2,
(coord1[1] + coord2[1]) / 2,
(coord1[2] + coord2[2]) / 2
]
return midpoint

all_DUMs = []
atoms_ = atoms.copy()

for i in interactions:
if len(i[1].split('_')) <= 3:
res1_name = 'chain '+i[2]+' resname '+i[0][:3]+' and resid '+i[0][3:]+' and index '+' '.join(i[1].split('_')[1:])
jamesmkrieger marked this conversation as resolved.
Show resolved Hide resolved
res1_coords = calcCenter(atoms.select(res1_name))

if len(i[1].split('_')) > 3:
res1_name = 'chain '+i[2]+' resname '+i[0][:3]+' and resid '+i[0][3:]+' and index '+' '.join(i[1].split('_'))
res1_coords = calcCenter(atoms.select(res1_name))

if len(i[4].split('_')) <= 3:
res2_name = 'chain '+i[5]+' resname '+i[3][:3]+' and resid '+i[3][3:]+' and index '+' '.join(i[4].split('_')[1:])
res2_coords = calcCenter(atoms.select(res2_name))

if len(i[4].split('_')) > 3:
res2_name = 'chain '+i[5]+' resname '+i[3][:3]+' and resid '+i[3][3:]+' and index '+' '.join(i[4].split('_'))
res2_coords = calcCenter(atoms.select(res2_name))

all_DUMs.append(putDUMatom(res1_coords, res2_coords))

if all_DUMs == []:
LOGGER.info('Lack of interactions')
else:
LOGGER.info('Creating file with dummy atoms')
dummyAtoms = AtomGroup()
coords = array([all_DUMs], dtype=float)
dummyAtoms.setCoords(coords)
dummyAtoms.setNames([RESNAME_dummy]*len(dummyAtoms))
dummyAtoms.setResnums(range(1, len(dummyAtoms)+1))
dummyAtoms.setResnames([RESNAME_dummy]*len(dummyAtoms))

writePDB(filename, atoms_+dummyAtoms)


def listLigandInteractions(PLIP_output, **kwargs):
"""Create a list of interactions from PLIP output created using calcLigandInteractions().
Results can be displayed in VMD.
Expand Down
Loading