Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for CONECT lines in PDB file (load and dump bonds) #250

Merged
merged 5 commits into from
Apr 29, 2021

Conversation

tovrstra
Copy link
Member

This adds support for the CONECT lines in pdb load and dump functions. I also split up the load_one functions over a few helpers to stop pylint from complaining about too long functions.

@tovrstra tovrstra requested a review from FarnazH March 30, 2021 10:43
@codecov
Copy link

codecov bot commented Mar 30, 2021

Codecov Report

Merging #250 (54b4127) into master (d5523d1) will increase coverage by 0.67%.
The diff coverage is 98.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #250      +/-   ##
==========================================
+ Coverage   94.99%   95.66%   +0.67%     
==========================================
  Files          73       74       +1     
  Lines        7669     8310     +641     
  Branches     1020     1092      +72     
==========================================
+ Hits         7285     7950     +665     
+ Misses        183      164      -19     
+ Partials      201      196       -5     
Impacted Files Coverage Δ
iodata/formats/pdb.py 99.30% <98.71%> (-0.70%) ⬇️
iodata/test/test_pdb.py 98.47% <100.00%> (+0.08%) ⬆️
iodata/formats/sdf.py 98.50% <0.00%> (-1.50%) ⬇️
iodata/basis.py 100.00% <0.00%> (ø)
iodata/periodic.py 100.00% <0.00%> (ø)
iodata/test/test_sdf.py 100.00% <0.00%> (ø)
iodata/formats/cp2klog.py 97.54% <0.00%> (ø)
iodata/test/test_qchemlog.py 100.00% <0.00%> (ø)
iodata/test/test_orbitals.py 100.00% <0.00%> (ø)
iodata/utils.py 87.95% <0.00%> (+0.14%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d5523d1...54b4127. Read the comment docs.

@tovrstra tovrstra changed the title Pdb bonds Add support for CONECT lines in PDB file (load and dump bonds) Mar 30, 2021
@tovrstra tovrstra requested a review from evohringer April 1, 2021 18:06
Copy link
Member

@FarnazH FarnazH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the upgrade.

Comment on lines 81 to 130
# If the PDB file has a title replace it.
# If the PDB file has a title, replace the default.
if line.startswith("TITLE") or line.startswith("COMPND"):
title = line[10:].rstrip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the PDB has a TITLE or COMPND section, they are usually split over multiple lines, and I think the current code only keeps the last line. So, shouldn't we use sth like: title += line[10:].rstrip()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would also be good to add a test for title split over multiple lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would indeed be good. We may still include the newline characters, such that the title can be written out over multiple lines as well.

molecule_found = True
if line.startswith("CONECT"):
for iatom0, iatom1 in _parse_pdb_conect_line(line):
bonds.append([iatom0, iatom1, 1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please clarify what 1 means here? Is it meant to be bond-order? If so, what do you think of using None because it is not reported in PDB?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Farnaz it is better to leave "None" since the bond type is not provided in pdf format. Setting all to "1" would assign a single bond which might not be correct.

Copy link
Member Author

@tovrstra tovrstra Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree no bond type should be specified but with the current API, an integer must be provided, setting to None is not an option. The integers are "bond types" as defined in iodata.periodic. Setting it to bond2num["un"] (unknown) could be a good temporary solution. My intention is to address this more generally with #222 (to introduce optional bond types and bond orders). With this PR and #248 I mainly wanted to get a better understanding of the different ways bonds are stored in various file formats. When addressing #222, this code will be updated.

Copy link
Contributor

@evohringer evohringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only some minor issues. A test for the title over multiple lines would be nice I think. Everything else is great and improved it a lot!

molecule_found = True
if line.startswith("CONECT"):
for iatom0, iatom1 in _parse_pdb_conect_line(line):
bonds.append([iatom0, iatom1, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Farnaz it is better to leave "None" since the bond type is not provided in pdf format. Setting all to "1" would assign a single bond which might not be correct.

# Prepare for CONECT lines.
connections = [[] for iatom in range(data.natom)]
if data.bonds is not None:
for iatom0, iatom1 in data.bonds[:, :2]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would create the list connections and loop over connections only if data.bonds is not None.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

print("CONECT{:5d}{}".format(
iatom0 + 1, "".join(
"{:5d}".format(iatom1 + 1)
for iatom1 in iatoms1[ichunk * 4:ichunk * 4 + 4])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would increase the readability of the conde defining the string first of the connected atoms (iatom1) and then concatenate the string "CONECT" and iatom0 string with the newly created string. It took me a some time to understand what is written.

Comment on lines 81 to 130
# If the PDB file has a title replace it.
# If the PDB file has a title, replace the default.
if line.startswith("TITLE") or line.startswith("COMPND"):
title = line[10:].rstrip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would also be good to add a test for title split over multiple lines.

@tovrstra
Copy link
Member Author

@FarnazH All should be fixed. We may have a caching issue in the CI (for ubuntu-latest, 3.9), which is fixed in the master branch. There was no nice way to split load_one into small chunks (too many return values), so I've added pylint exceptions.

Copy link
Contributor

@evohringer evohringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine! Great!

@tovrstra
Copy link
Member Author

Thanks for reviewing!

@tovrstra tovrstra merged commit d7eccdf into theochem:master Apr 29, 2021
@tovrstra tovrstra deleted the pdb-bonds branch April 29, 2021 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants