-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for CONECT
lines in PDB file (load and dump bonds)
#250
Conversation
Codecov Report
@@ Coverage Diff @@
## master #250 +/- ##
==========================================
+ Coverage 94.99% 95.66% +0.67%
==========================================
Files 73 74 +1
Lines 7669 8310 +641
Branches 1020 1092 +72
==========================================
+ Hits 7285 7950 +665
+ Misses 183 164 -19
+ Partials 201 196 -5
Continue to review full report at Codecov.
|
CONECT
lines in PDB file (load and dump bonds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the upgrade.
iodata/formats/pdb.py
Outdated
# If the PDB file has a title replace it. | ||
# If the PDB file has a title, replace the default. | ||
if line.startswith("TITLE") or line.startswith("COMPND"): | ||
title = line[10:].rstrip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the PDB has a TITLE
or COMPND
section, they are usually split over multiple lines, and I think the current code only keeps the last line. So, shouldn't we use sth like: title += line[10:].rstrip()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would also be good to add a test for title split over multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would indeed be good. We may still include the newline characters, such that the title can be written out over multiple lines as well.
iodata/formats/pdb.py
Outdated
molecule_found = True | ||
if line.startswith("CONECT"): | ||
for iatom0, iatom1 in _parse_pdb_conect_line(line): | ||
bonds.append([iatom0, iatom1, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clarify what 1
means here? Is it meant to be bond-order? If so, what do you think of using None
because it is not reported in PDB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Farnaz it is better to leave "None" since the bond type is not provided in pdf format. Setting all to "1" would assign a single bond which might not be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree no bond type should be specified but with the current API, an integer must be provided, setting to None
is not an option. The integers are "bond types" as defined in iodata.periodic
. Setting it to bond2num["un"]
(unknown) could be a good temporary solution. My intention is to address this more generally with #222 (to introduce optional bond types and bond orders). With this PR and #248 I mainly wanted to get a better understanding of the different ways bonds are stored in various file formats. When addressing #222, this code will be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only some minor issues. A test for the title over multiple lines would be nice I think. Everything else is great and improved it a lot!
iodata/formats/pdb.py
Outdated
molecule_found = True | ||
if line.startswith("CONECT"): | ||
for iatom0, iatom1 in _parse_pdb_conect_line(line): | ||
bonds.append([iatom0, iatom1, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Farnaz it is better to leave "None" since the bond type is not provided in pdf format. Setting all to "1" would assign a single bond which might not be correct.
# Prepare for CONECT lines. | ||
connections = [[] for iatom in range(data.natom)] | ||
if data.bonds is not None: | ||
for iatom0, iatom1 in data.bonds[:, :2]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would create the list connections and loop over connections only if data.bonds is not None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
print("CONECT{:5d}{}".format( | ||
iatom0 + 1, "".join( | ||
"{:5d}".format(iatom1 + 1) | ||
for iatom1 in iatoms1[ichunk * 4:ichunk * 4 + 4]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would increase the readability of the conde defining the string first of the connected atoms (iatom1) and then concatenate the string "CONECT" and iatom0 string with the newly created string. It took me a some time to understand what is written.
iodata/formats/pdb.py
Outdated
# If the PDB file has a title replace it. | ||
# If the PDB file has a title, replace the default. | ||
if line.startswith("TITLE") or line.startswith("COMPND"): | ||
title = line[10:].rstrip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would also be good to add a test for title split over multiple lines.
@FarnazH All should be fixed. We may have a caching issue in the CI (for ubuntu-latest, 3.9), which is fixed in the master branch. There was no nice way to split |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine! Great!
Thanks for reviewing! |
This adds support for the
CONECT
lines in pdb load and dump functions. I also split up the load_one functions over a few helpers to stop pylint from complaining about too long functions.