Unmasked zeroed tertiary data in text-based CASP7 #30
Comments
I believe there are a handful of structures that only contain alpha-carbon information. If you inspect the RCSB entry, you'll find this is the case for this structure. You can also see the pattern of (N, Calpha, C) in the tertiary data, where N and C are missing. Hopefully Mohammed can correct me if I am mistaken, but I hope my comment can help for now.
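For anyone who wants to check a record for this pattern, here is a minimal sketch. It assumes the tertiary data has been parsed into three rows (x, y, z) whose columns cycle through N, Calpha, C for each residue; the function names and the toy coordinates are made up for illustration and are not part of ProteinNet.

```python
from typing import List, Tuple

Atom = Tuple[float, float, float]
Residue = Tuple[Atom, Atom, Atom]


def split_backbone(tertiary: List[List[float]]) -> List[Residue]:
    """Group a 3 x 3L coordinate matrix into per-residue (N, CA, C) atoms.

    Assumption: rows are x, y, z and columns cycle N, CA, C per residue.
    """
    xs, ys, zs = tertiary
    if not (len(xs) == len(ys) == len(zs)) or len(xs) % 3 != 0:
        raise ValueError("expected three rows of equal length divisible by 3")
    atoms = list(zip(xs, ys, zs))
    return [(atoms[i], atoms[i + 1], atoms[i + 2]) for i in range(0, len(atoms), 3)]


def is_zero(atom: Atom) -> bool:
    """True if the atom sits exactly at the origin."""
    return atom == (0.0, 0.0, 0.0)


def is_ca_only(residues: List[Residue]) -> bool:
    """True if every residue that has any coordinates has CA but neither N nor C."""
    observed = [r for r in residues if not all(is_zero(a) for a in r)]
    return bool(observed) and all(
        is_zero(n) and not is_zero(ca) and is_zero(c) for n, ca, c in observed
    )


# Made-up coordinates for two residues, only the CA columns are filled in.
toy = [
    [0.0, 1250.5, 0.0, 0.0, 1630.2, 0.0],  # x
    [0.0, -310.0, 0.0, 0.0, -120.7, 0.0],  # y
    [0.0, 4410.9, 0.0, 0.0, 4700.3, 0.0],  # z
]
print(is_ca_only(split_backbone(toy)))
```

On the toy input this reports a CA-only structure, since N and C are zeroed for both residues.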
I see! So sometimes individual atoms can be missing in spite of a "+" mask. But can we assume that each (0, 0, 0) atom is in fact just missing data?
Correct. I believe the mask is on the residue level and not the atom level.
Yes, I think that is a reasonable assumption, and it is most likely described somewhere in the documentation here.
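If that assumption holds, an atom-level mask can be derived from the residue-level one. A rough sketch follows; `atom_level_mask` is a hypothetical helper, and it assumes the coordinates are already grouped into one (N, Calpha, C) triple per mask character.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float]


def atom_level_mask(
    residue_mask: str, residues: Sequence[Sequence[Atom]]
) -> List[Tuple[bool, ...]]:
    """Return one (N, CA, C) tuple of booleans per residue.

    An atom counts as valid only if its residue is marked "+" AND its
    coordinates are not exactly (0, 0, 0). Treating (0, 0, 0) as missing
    is the assumption discussed in this thread, not documented behaviour.
    """
    if len(residue_mask) != len(residues):
        raise ValueError("mask length must match the number of residues")
    return [
        tuple(flag == "+" and tuple(atom) != (0.0, 0.0, 0.0) for atom in residue)
        for flag, residue in zip(residue_mask, residues)
    ]


# Made-up example: first residue has only CA, second residue is fully masked out.
residues = [
    ((0.0, 0.0, 0.0), (1250.5, -310.0, 4410.9), (0.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
]
print(atom_level_mask("+-", residues))
```

One caveat: an atom that genuinely sits at the origin would be flagged as missing too, although that should be very rare in practice.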
On Mar 18, 2022, 2:02 PM -0700, memoryleak47 wrote:
I see! So sometimes individual atoms can be missing in spite of a "+" mask.
But can we assume that each (0, 0, 0) atom is in fact just missing data?
Or maybe is there some link where I could read those details up?
Ah, true! Unless I'm overlooking something, it isn't mentioned in the documentation at https://github.com/aqlaboratory/proteinnet/blob/master/docs/proteinnet_records.md or anywhere else on this GitHub page. Is there some external resource where I could read up on that?
I'm afraid I don't have more information. I'm not affiliated with ProteinNet, though I use the provided data and dataset splits in my own research.
When implementing an RGN for a university project, we stumbled upon a few apparent irregularities in the text-based CASP7 dataset provided here.
Specifically, quite a few atoms in the tertiary data were positioned at (0, 0, 0) even though the mask was +, i.e. the atom was considered to be 'valid'.
Example taken from CASP7/validation.
In this example two thirds of the atoms are positioned at (0, 0, 0).
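To make that proportion checkable, here is a rough sketch of how the share of zeroed atoms under a + mask could be counted for a single text record. It assumes bracketed section headers such as [TERTIARY] and [MASK], three whitespace-separated coordinate rows (x, y, z), and three atoms per residue; the helper names and the toy record are invented for illustration and are not real CASP7 data.

```python
from typing import Dict, List


def read_sections(record_text: str) -> Dict[str, List[str]]:
    """Map each bracketed header (e.g. "TERTIARY") to the lines that follow it."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in record_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def zeroed_fraction(record_text: str) -> float:
    """Fraction of atoms at exactly (0, 0, 0) among residues masked "+"."""
    sections = read_sections(record_text)
    rows = [[float(v) for v in row.split()] for row in sections["TERTIARY"]]
    mask = "".join(sections["MASK"])
    atoms = list(zip(*rows))  # one (x, y, z) tuple per atom
    if len(atoms) != 3 * len(mask):
        raise ValueError("expected three atoms per mask character")
    # Residue i owns atoms 3i, 3i+1 and 3i+2.
    kept = [a for i, a in enumerate(atoms) if mask[i // 3] == "+"]
    zeroed = sum(1 for a in kept if a == (0.0, 0.0, 0.0))
    return zeroed / len(kept) if kept else 0.0


# Invented two-residue record with only the middle atom of each residue set.
toy_record = """
[ID]
toy_record
[PRIMARY]
AG
[TERTIARY]
0.0 1250.5 0.0 0.0 1630.2 0.0
0.0 -310.0 0.0 0.0 -120.7 0.0
0.0 4410.9 0.0 0.0 4700.3 0.0
[MASK]
++
"""
print(zeroed_fraction(toy_record))
```

For the toy record, four of the six atoms are zeroed, so the result is two thirds.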
Is this a bug, or am I simply misinterpreting the given data somehow?
Thanks in advance!