Swap asym ids #1773
Also needs tests. 3o21 seems like a good system for this, as it has 2 clear biomols (dimers) and is generally not that big.
@dkoes, any comments on this or do you think it's ok?
Actually, we need to update segments when there are duplicates. 3h5v is a good example of this, with chain C getting copied in a dimer and a tetramer.
I'm really not familiar enough with CIF to understand what is going on or why it fixes 2VVJ, but if it does, great!
Ok. Thanks
Okay, I'm still trying to figure out what is going on with the code. The overall issue is that some PDBs have an "auth" chain id of "bb" (specifically, these PDBs: 7M57, 7VRT, 7ZAS, 7ATA, 5EXC, 5BKN, 7M2T, 5BKL, 7VS5, 7M3T, 8H2I, 7ANM, 7M50). Taking 7ZAS as an example (because it is relatively small), if I do
p,h = prody.parsePDB('7ZAS',header=True,biomol=True)
I get:
These are the correct bioassemblies (I wrote them out and compared them to what is in the PDB).
However, the AtomGroups still have the auth chain ids:
set(p[0].getChids()),set(p[1].getChids())
I have yet to figure out how the bioassemblies were correctly created. If you build the bioassemblies manually:
p,h = prody.parseMMCIF('7ZAS',header=True)
bm = prody.buildBiomolecules(h,p)
You get something different:
If you rename the bb chain you can also get the correct assemblies:
m,hm = prody.parseMMTF('7ZAS',header=True)
chs = m.getChids()
chs[chs == 'bb'] = '_bb'
m.setChids(chs)
hm['biomoltrans']['2'][0][1] = '_bb'
bm = prody.buildBiomolecules(hm,m)
I don't think there should be a mismatch between the result of calling parseMMCIF with biomol=True and manually building with buildBiomolecules. The fundamental bug is that the parser doesn't understand that bb is not the backbone selector in the context of a chain selection. The best fix is to make the parser understand that. If that is too difficult, renaming bb chains to _bb (or bb_ or something) wouldn't be a horrible workaround.
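For reference, the renaming workaround could be generalized along these lines. This is only a sketch on plain Python data, not ProDy code: `rename_chain` is a hypothetical helper, and the layout assumed for the `biomoltrans` entries (a list of chain ids followed by transformation lines, for each biomolecule key) is inferred from the indexing in the snippet above rather than taken from the ProDy documentation.

```python
def rename_chain(chids, biomoltrans, old, new):
    """Return copies of chids and biomoltrans with chain `old` renamed to `new`.

    Assumed layout (hypothetical): biomoltrans maps a biomolecule key to a
    list whose items are either lists of chain ids or transformation lines
    stored as strings. Only the chain-id lists are rewritten.
    """
    renamed_chids = [new if c == old else c for c in chids]
    renamed_trans = {
        key: [
            [new if c == old else c for c in item] if isinstance(item, list) else item
            for item in items
        ]
        for key, items in biomoltrans.items()
    }
    return renamed_chids, renamed_trans


# Toy data standing in for per-atom chain ids and a parsed header entry.
chids = ["A", "bb", "bb", "C"]
biomoltrans = {
    "1": [["A"], "1 1.0 0.0 0.0 0.0"],
    "2": [["C", "bb"], "1 1.0 0.0 0.0 0.0"],
}
new_chids, new_trans = rename_chain(chids, biomoltrans, "bb", "_bb")
```

Renaming the atoms and the header together is the point: if only one side is changed, the chain named in the header no longer matches any atoms.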
Ok, yes, it should know that bb is a chain id and not backbone. This would be a general bug and not just something affecting biomol assembly.
Here is a fix for the parser. It is very specific to the chain problem. We really should identify all the high-precedence operators that take an arbitrary string and avoid converting the string to a flag for all of them:
I should point out that this is just a partial band-aid fix. Something like "chain A bb" will still break. But I don't want to dive into the parser any more and need to stop spending time on this for now.
Thanks, I understand. I’ll try and look into it. That’s already a very helpful start.
Maybe what is needed is to check if
Me either. I’ll have to look very carefully sometime.
We may also want to look at brackets and see what it does with them and whether we can mimic it.
I suppose we’ll also have problems with chains b, x, y and z, which will be fixed by noticing that we have arrays of values for a flag like chain.
Being able to build biomol assemblies afterwards is also a good reason for setting unite chains to False by default, unless we update the header. It always seems like big structures and the mmCIF format cause lots of complications.
Ok, now we have a fix for the parser. If there was a data label beforehand and we haven't had an "and" since then, it doesn't check whether the token could be a flag. This should help with chain or segment bb and also ca and sc.
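To make the rule concrete, here is a toy sketch of that disambiguation. It is not the ProDy parser: `classify`, the tag names, and the small `FLAGS` and `DATA_LABELS` sets are all illustrative assumptions, and only the "and" reset described above is modeled (other operators, brackets and quoting are ignored).

```python
# Illustrative subsets only; the real selection grammar has many more keywords.
FLAGS = {"bb", "backbone", "ca", "calpha", "sc", "sidechain"}
DATA_LABELS = {"chain", "segment"}


def classify(selection):
    """Tag each whitespace-separated token of a selection string.

    A token that follows a data label, with no "and" in between, is treated
    as a value for that label even if it has the same spelling as a flag.
    """
    tagged = []
    after_label = False  # True once a data label is seen, until the next "and"
    for token in selection.split():
        if token == "and":
            after_label = False
            tagged.append((token, "operator"))
        elif token in DATA_LABELS:
            after_label = True
            tagged.append((token, "label"))
        elif after_label:
            # Do not check whether the token could be a flag here.
            tagged.append((token, "value"))
        elif token in FLAGS:
            tagged.append((token, "flag"))
        else:
            tagged.append((token, "other"))
    return tagged


chain_bb = classify("chain A bb")
mixed = classify("chain bb and ca")
bare = classify("bb")
```

In this toy model "chain A bb" yields two chain values, while "chain bb and ca" keeps ca as a flag because the "and" ends the chain context.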
I think everything should be good now.
Thanks
Response to #1772
Other fixes include: