Swap asym ids #1773

jamesmkrieger · 2023-10-12T19:13:14Z

Response to #1772

Other fixes include:

- trimming away semicolons on the edges of mmCIF entries
- adjusting unite_chains (copying larger units stored in segnames to chids) to handle header and biomols
- separating copy into another step in buildBiomolecules in case the selection is None
- removing relabelling of segments based on biomolecule number, which is unneeded and interferes with unite_chains
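
As a rough illustration of the first item, trimming semicolons from the edges of an mmCIF value could look like the sketch below. This is not the code from the PR: `trim_semicolons` is a hypothetical helper, and it assumes the value arrives as a single string that may still carry the semicolon delimiters of a multi-line text field.

```python
def trim_semicolons(value):
    """Strip surrounding whitespace and any leading/trailing semicolons.

    Hypothetical helper for illustration only; semicolons inside the
    value are left alone.
    """
    return value.strip().strip(";").strip()


print(trim_semicolons(";Crystal structure of a dimer\n;"))
# Crystal structure of a dimer
print(trim_semicolons("chain A; chain B"))
# chain A; chain B
```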

jamesmkrieger · 2023-10-12T19:14:18Z

Also needs tests

3o21 seems like a good system for this as it has 2 clear biomols (dimers) and is generally not that big

jamesmkrieger · 2023-10-12T19:22:49Z

@dkoes, this and #1771 together fix the issue with 2VVJ

jamesmkrieger · 2023-10-12T21:46:33Z

@dkoes, any comments on this or do you think it's ok?

jamesmkrieger · 2023-10-12T21:49:10Z

> @dkoes, this and #1771 together fix the issue with 2VVJ

this has been included here

jamesmkrieger · 2023-10-13T08:59:26Z

Actually, we need to update segments when there are duplicates. 3h5v is a good example of this, with chain C getting copied in both a dimer and a tetramer.

dkoes

I really am not familiar enough with CIF to understand what is going on or why it fixes 2VVJ, but if it does, great!

prody/proteins/ciffile.py

jamesmkrieger · 2023-10-13T22:17:59Z

Ok. Thanks

prody/proteins/header.py

dkoes · 2023-10-14T14:47:29Z

Okay, I'm still trying to figure out what is going on with the code. The overall issue is that some PDB entries have an "auth" chain ID of "bb" (specifically: 7M57, 7VRT, 7ZAS, 7ATA, 5EXC, 5BKN, 7M2T, 5BKL, 7VS5, 7M3T, 8H2I, 7ANM, 7M50).

Taking 7ZAS as an example (because it is relatively small), if I do

import prody

p, h = prody.parsePDB('7ZAS', header=True, biomol=True)

I get:
[<AtomGroup: 7ZAS biomolecule 1 (6488 atoms)>, <AtomGroup: 7ZAS biomolecule 2 (6575 atoms)>]

These are the correct bioassemblies (I wrote them out and compared them to what is in the PDB).
The header biomoltrans has replaced the auth chain ids with whatever the non-auth ids are called:

{'1': [['A', 'B', 'G', 'H', 'I', 'L', 'M', 'N', 'S', 'T'],
  '1.0000000000 0.0000000000 0.0000000000 0.0000000000',
  '0.0000000000 1.0000000000 0.0000000000 0.0000000000',
  '0.0000000000 0.0000000000 1.0000000000 0.0000000000'],
 '2': [['C', 'D', 'J', 'O', 'P'],
  '1.0000000000 0.0000000000 0.0000000000 0.0000000000',
  '0.0000000000 1.0000000000 0.0000000000 0.0000000000',
  '0.0000000000 0.0000000000 1.0000000000 0.0000000000',
  ['E', 'F', 'K', 'Q', 'R'],
  '-1.0000000000 0.0000000000 0.0000000000 65.6990000000',
  '0.0000000000 1.0000000000 0.0000000000 69.2070000000',
  '0.0000000000 0.0000000000 -1.0000000000 0.0000000000']}

However, the AtomGroups still have the auth chain ids:

set(p[0].getChids()),set(p[1].getChids())

({'A', 'D', 'aa', 'dd'}, {'B', 'C', 'bb', 'cc'})

I have yet to figure out how the bioassemblies were correctly created.
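
The mismatch described above can be made explicit with a small check. This is only a sketch: `biomol_chain_ids` and `unmatched_chain_ids` are hypothetical helpers, not ProDy functions, and they assume the `biomoltrans` layout shown in the output above (chain lists stored as lists, matrix rows stored as strings). The chain IDs passed in are the ones reported by `getChids()` in the output above.

```python
def biomol_chain_ids(biomoltrans):
    """Collect the chain IDs named for each biomolecule in a biomoltrans-style dict."""
    chains = {}
    for biomol, entries in biomoltrans.items():
        ids = set()
        for entry in entries:
            if isinstance(entry, list):  # chain lists are lists; matrix rows are strings
                ids.update(entry)
        chains[biomol] = ids
    return chains


def unmatched_chain_ids(biomoltrans, chids):
    """Return, per biomolecule, the header chain IDs that no atom carries."""
    present = set(chids)
    return {biomol: sorted(ids - present)
            for biomol, ids in biomol_chain_ids(biomoltrans).items()}


# Values copied from the 7ZAS output shown in this comment.
biomoltrans = {
    '1': [['A', 'B', 'G', 'H', 'I', 'L', 'M', 'N', 'S', 'T'],
          '1.0000000000 0.0000000000 0.0000000000 0.0000000000',
          '0.0000000000 1.0000000000 0.0000000000 0.0000000000',
          '0.0000000000 0.0000000000 1.0000000000 0.0000000000'],
    '2': [['C', 'D', 'J', 'O', 'P'],
          '1.0000000000 0.0000000000 0.0000000000 0.0000000000',
          '0.0000000000 1.0000000000 0.0000000000 0.0000000000',
          '0.0000000000 0.0000000000 1.0000000000 0.0000000000',
          ['E', 'F', 'K', 'Q', 'R'],
          '-1.0000000000 0.0000000000 0.0000000000 65.6990000000',
          '0.0000000000 1.0000000000 0.0000000000 69.2070000000',
          '0.0000000000 0.0000000000 -1.0000000000 0.0000000000'],
}
atom_chids = ['A', 'D', 'aa', 'dd', 'B', 'C', 'bb', 'cc']

unmatched = unmatched_chain_ids(biomoltrans, atom_chids)
print(unmatched)
# {'1': ['G', 'H', 'I', 'L', 'M', 'N', 'S', 'T'], '2': ['E', 'F', 'J', 'K', 'O', 'P', 'Q', 'R']}
```

Any non-empty list means the header and the atoms are using different chain naming schemes.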

If you build the bioassemblies manually:

p,h = prody.parseMMCIF('7ZAS',header=True)
bm = prody.buildBiomolecules(h,p)

You get something different:
[<AtomGroup: 7ZAS biomolecule 1 (5913 atoms)>, <AtomGroup: 7ZAS biomolecule 2 (5930 atoms)>]
Note the different numbers of atoms. These are not correct (don't match what is in the PDB).

If you rename the bb chain you can also get the correct assemblies:

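m, hm = prody.parseMMTF('7ZAS', header=True)
chs = m.getChids()
chs[chs == 'bb'] = '_bb'  # rename so 'bb' is not read as the backbone flag
m.setChids(chs)
hm['biomoltrans']['2'][0][1] = '_bb'  # keep the header consistent with the renamed chain
bm = prody.buildBiomolecules(hm, m)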

[<AtomGroup: 7ZAS biomolecule 1 (6488 atoms)>, <AtomGroup: 7ZAS biomolecule 2 (6269 atoms)>]

I don't think calling parseMMCIF with biomol=True and building manually with buildBiomolecules should give different results. The fundamental bug is that the selection parser doesn't understand that, in the context of a chain selector, bb is a chain ID and not the backbone selector.

The best fix is to teach the parser that. If that is too difficult, renaming bb chains (to _bb, bb_ or something similar) wouldn't be a horrible workaround.

jamesmkrieger · 2023-10-14T15:03:31Z

Ok, yes, it should know that bb is a chain id and not backbone. This would be a general bug and not just something affecting biomol assembly.

dkoes · 2023-10-14T15:23:09Z

Here is a fix for the parser. It is very specific to the chain problem. We really should identify all the high-precedence operators that take an arbitrary string and avoid converting the string to a flag for any of them:

--- a/prody/atomic/select.py
+++ b/prody/atomic/select.py
@@ -1341,8 +1341,10 @@ class Select(object):
         isDataLabel = atoms.isDataLabel
         append = None
         wasand = False
+        token = ''
         while tokens:
             # check whether token is an array to avoid array == str comparison
+            last_was_chain = token == 'chain'
             token = tokens.pop(0)
             try:
                 dtype = token.dtype
@@ -1356,7 +1358,7 @@ class Select(object):
                     wasand = True
                     continue
 
-                elif isFlagLabel(token):
+                elif isFlagLabel(token) and not last_was_chain:
                     flags.append(token)
                     append = None

dkoes · 2023-10-14T15:36:55Z

I should point out this is just a partial band-aid fix. Something like "chain A bb" will still break. But I don't want to dive into the parser any more and need to stop spending time on this for now.
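
A toy sketch of why the band-aid is only partial. This is not the code in prody/atomic/select.py: `FLAG_LABELS` and `classify_last_was_chain` are hypothetical stand-ins for `isFlagLabel` and the token loop, and the selection string is assumed to be split on whitespace.

```python
# Toy flag set; the real parser asks isFlagLabel instead.
FLAG_LABELS = {"bb", "backbone", "ca", "sc", "protein"}


def classify_last_was_chain(tokens):
    """Tag each token as 'flag' or 'other', skipping the flag check right after 'chain'."""
    tagged = []
    previous = ""
    for token in tokens:
        last_was_chain = previous == "chain"
        if token in FLAG_LABELS and not last_was_chain:
            tagged.append((token, "flag"))
        else:
            tagged.append((token, "other"))
        previous = token
    return tagged


print(classify_last_was_chain("chain bb".split()))
# [('chain', 'other'), ('bb', 'other')]
print(classify_last_was_chain("chain A bb".split()))
# [('chain', 'other'), ('A', 'other'), ('bb', 'flag')]
```

Only the token immediately after `chain` is protected, so the second chain ID in `chain A bb` is still read as a flag.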

jamesmkrieger · 2023-10-14T15:44:48Z

Thanks. I understand

I’ll try and look into it. That’s already a very helpful start

dkoes · 2023-10-14T15:46:25Z

Maybe what is needed is to check if firsttoken is chain? It's not clear to me what the precondition for and2 being called is.

jamesmkrieger · 2023-10-14T16:27:57Z

> Maybe what is needed is to check if firsttoken is chain? It's not clear to me what the precondition for and2 being called is.

Me neither. I’ll have to look very carefully sometime

jamesmkrieger · 2023-10-14T16:28:52Z

We may also want to look at how the parser handles brackets and whether we can mimic that

jamesmkrieger · 2023-10-14T16:31:09Z

I suppose we’ll also have problems with chains b, x, y and z that will be fixed by noticing that we have arrays of values for a flag like chain

jamesmkrieger · 2023-10-14T17:46:29Z

Being able to build biomol assemblies afterwards is also a good reason for setting unite_chains to False by default, unless we update the header.

It always seems like big structures and mmCIF format cause lots of complications

jamesmkrieger · 2023-10-16T13:10:29Z

OK, now we have a fix for the parser. If a data label came earlier and there has been no "and" since then, the parser no longer checks whether a token could be a flag. This should help with chain or segment bb, and also with ca and sc.
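
The rule described above can be sketched on a toy token stream. This is an illustration under assumptions, not the change made in select.py: `FLAG_LABELS`, `DATA_LABELS` and `classify` are hypothetical, and the selection string is assumed to be split on whitespace.

```python
# Toy label sets; the real parser uses isFlagLabel and isDataLabel.
FLAG_LABELS = {"bb", "ca", "sc", "backbone", "protein"}
DATA_LABELS = {"chain", "segment", "resname", "name"}


def classify(tokens):
    """Tag tokens, skipping the flag check after a data label until the next 'and'."""
    tagged = []
    seen_data_label = False
    for token in tokens:
        if token == "and":
            seen_data_label = False  # an 'and' re-enables the flag check
            tagged.append((token, "and"))
        elif token in DATA_LABELS:
            seen_data_label = True
            tagged.append((token, "data"))
        elif token in FLAG_LABELS and not seen_data_label:
            tagged.append((token, "flag"))
        else:
            tagged.append((token, "value" if seen_data_label else "other"))
    return tagged


print(classify("chain A bb".split()))
# [('chain', 'data'), ('A', 'value'), ('bb', 'value')]
print(classify("chain bb and bb".split()))
# [('chain', 'data'), ('bb', 'value'), ('and', 'and'), ('bb', 'flag')]
```

In this sketch `bb`, `ca` and `sc` are treated as values whenever they follow `chain` or `segment`, and as flags again once an `and` has been seen.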

jamesmkrieger · 2023-10-16T14:24:14Z

I think everything should be good now

jamesmkrieger · 2023-10-17T07:05:34Z

Thanks

jamesmkrieger added 3 commits October 12, 2023 19:38

fix get unobserved seq

0b5227b

swap asym ids and fix downstream

1005eb5

make unite_chains True

3054d80

jamesmkrieger requested a review from dkoes October 12, 2023 19:14

jamesmkrieger added 11 commits October 12, 2023 21:24

Merge branch 'fix_unobs' into swap_asym_id

ce01519

allow forgotten None case

de67ffc

add segment selection to parse mmcif

074ec84

make parseMMCIF chain sensitive to unite_chains

fb4d2e8

check segment and add it to title_suffix

5c664b3

update mmcif 3o21 into test datafiles

3dccd38

tests for unite chains and biomols

39feb39

make unite_chains True in parseMMCIFStream

709014a

unobs header tests

6713df2

fixes from failed tests

99784a5

hopefully last fixes

28c70fb

jamesmkrieger marked this pull request as draft October 13, 2023 08:59

dkoes approved these changes Oct 13, 2023

dkoes reviewed Oct 14, 2023

jamesmkrieger added 6 commits October 16, 2023 12:35

fix biomols for same chid

692c1a5

fix unite_chains docs to True

43b524f

set unite_chains False and improve docs

4deb25e

unite chains docs

951a126

fix exanm exgnm docs

1e2cb8a

parser fix

8de2c03

jamesmkrieger added 3 commits October 16, 2023 14:13

extend parser fix to unary

14ecff6

fix unite chains test for default False

e6d15d7

fix unite chains biomol test for default False

98cebd0

jamesmkrieger marked this pull request as ready for review October 16, 2023 14:23

dkoes approved these changes Oct 17, 2023

jamesmkrieger merged commit a15eaf7 into prody:master Oct 17, 2023
4 checks passed

jamesmkrieger deleted the swap_asym_id branch October 17, 2023 07:05
