Add support for `CONECT` lines in PDB file (load and dump bonds) #250

tovrstra · 2021-03-30T10:43:21Z

This adds support for CONECT lines in the PDB load and dump functions. I also split the load_one function into a few helpers to stop pylint from complaining about overly long functions.

codecov · 2021-03-30T10:51:32Z

Codecov Report

Merging #250 (54b4127) into master (d5523d1) will increase coverage by 0.67%.
The diff coverage is 98.92%.

@@            Coverage Diff             @@
##           master     #250      +/-   ##
==========================================
+ Coverage   94.99%   95.66%   +0.67%     
==========================================
  Files          73       74       +1     
  Lines        7669     8310     +641     
  Branches     1020     1092      +72     
==========================================
+ Hits         7285     7950     +665     
+ Misses        183      164      -19     
+ Partials      201      196       -5

Impacted Files	Coverage Δ
iodata/formats/pdb.py	`99.30% <98.71%> (-0.70%)`	⬇️
iodata/test/test_pdb.py	`98.47% <100.00%> (+0.08%)`	⬆️
iodata/formats/sdf.py	`98.50% <0.00%> (-1.50%)`	⬇️
iodata/basis.py	`100.00% <0.00%> (ø)`
iodata/periodic.py	`100.00% <0.00%> (ø)`
iodata/test/test_sdf.py	`100.00% <0.00%> (ø)`
iodata/formats/cp2klog.py	`97.54% <0.00%> (ø)`
iodata/test/test_qchemlog.py	`100.00% <0.00%> (ø)`
iodata/test/test_orbitals.py	`100.00% <0.00%> (ø)`
iodata/utils.py	`87.95% <0.00%> (+0.14%)`	⬆️
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

FarnazH

Thanks for the upgrade.

FarnazH · 2021-04-06T12:13:27Z

iodata/formats/pdb.py

- # If the PDB file has a title replace it.
+ # If the PDB file has a title, replace the default.
 if line.startswith("TITLE") or line.startswith("COMPND"):
     title = line[10:].rstrip()


If the PDB file has a TITLE or COMPND section, it is usually split over multiple lines, and I think the current code only keeps the last line. So, shouldn't we use something like `title += line[10:].rstrip()`?

Maybe it would also be good to add a test for a title split over multiple lines.

That would indeed be good. We could also keep the newline characters, so that the title can be written out over multiple lines as well.
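A minimal sketch of the multi-line title idea discussed in this thread, not the code that was merged. The function name `read_title` and the default string are hypothetical. It assumes the title text starts at column 11 (`line[10:]`, as in the diff above) and joins the lines with newline characters; COMPND handling is left out.

```python
def read_title(lines, default="hypothetical default title"):
    """Collect all TITLE records into one newline-separated string."""
    parts = []
    for line in lines:
        if line.startswith("TITLE"):
            # Keep the text from column 11 onward, dropping trailing whitespace.
            parts.append(line[10:].rstrip())
    # Fall back to the default only when the file has no TITLE record at all.
    return "\n".join(parts) if parts else default


example = [
    "TITLE     FIRST LINE",
    "TITLE    2 SECOND LINE",
    "ATOM      1  O   HOH A   1       0.000   0.000   0.000",
]
title = read_title(example)
```

Keeping the newlines means a dump function could write one TITLE record per line of the stored title, which is the round trip suggested above.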

FarnazH · 2021-04-06T12:16:54Z

iodata/formats/pdb.py

 molecule_found = True
+ if line.startswith("CONECT"):
+     for iatom0, iatom1 in _parse_pdb_conect_line(line):
+         bonds.append([iatom0, iatom1, 1])


Can you please clarify what 1 means here? Is it meant to be the bond order? If so, what do you think of using None, since the PDB format does not report it?

I agree with Farnaz that it is better to use None, since the bond type is not provided in the PDB format. Setting everything to 1 would assign single bonds, which might not be correct.

I agree that no bond type should be specified, but with the current API an integer must be provided; setting it to None is not an option. The integers are "bond types" as defined in iodata.periodic. Setting it to bond2num["un"] (unknown) could be a good temporary solution. My intention is to address this more generally with #222 (introducing optional bond types and bond orders). With this PR and #248, I mainly wanted to get a better understanding of the different ways bonds are stored in various file formats. When #222 is addressed, this code will be updated.
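To illustrate the proposal, here is a rough sketch of loading CONECT records with an "unknown" bond type. It is not the iodata implementation: `parse_conect_line` and `load_bonds` are hypothetical names, and `BOND_TYPE_UNKNOWN` is a local stand-in for `bond2num["un"]`, whose value is assumed here. The column layout (atom serial in columns 7-11, up to four bonded atoms in the following 5-character fields) follows the PDB format description.

```python
# Hypothetical stand-in for bond2num["un"] from iodata.periodic; the value is assumed.
BOND_TYPE_UNKNOWN = 0


def parse_conect_line(line):
    """Yield zero-based (iatom0, iatom1) pairs from one CONECT record."""
    iatom0 = int(line[6:11]) - 1
    # Up to four bonded atoms follow, each in a 5-character field.
    for start in range(11, 31, 5):
        field = line[start:start + 5].strip()
        if field:
            yield iatom0, int(field) - 1


def load_bonds(lines):
    """Collect [iatom0, iatom1, bond_type] rows from all CONECT records."""
    bonds = []
    for line in lines:
        if line.startswith("CONECT"):
            for iatom0, iatom1 in parse_conect_line(line):
                # PDB does not report a bond type, so mark it as unknown.
                bonds.append([iatom0, iatom1, BOND_TYPE_UNKNOWN])
    return bonds
```

Note that PDB files commonly list a bond once for each of its two atoms; this sketch does not remove such duplicates.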

evohringer

I have only some minor issues. A test for a title split over multiple lines would be nice, I think. Everything else is great and improves the code a lot!

evohringer · 2021-04-06T16:44:44Z

iodata/formats/pdb.py

 molecule_found = True
+ if line.startswith("CONECT"):
+     for iatom0, iatom1 in _parse_pdb_conect_line(line):
+         bonds.append([iatom0, iatom1, 1])


I agree with Farnaz that it is better to use None, since the bond type is not provided in the PDB format. Setting everything to 1 would assign single bonds, which might not be correct.

evohringer · 2021-04-06T16:48:12Z

iodata/formats/pdb.py

+ # Prepare for CONECT lines.
+ connections = [[] for iatom in range(data.natom)]
+ if data.bonds is not None:
+     for iatom0, iatom1 in data.bonds[:, :2]:


I would create the list `connections` and loop over it only if `data.bonds` is not None.

Good point!
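The suggestion in this thread could look roughly like the sketch below. `build_connections` is a hypothetical helper, not part of iodata. It assumes `bonds` is either None or a sequence of rows whose first two entries are zero-based atom indices, and that each bond should be recorded for both of its atoms.

```python
def build_connections(natom, bonds):
    """Return, for each atom, the list of atoms it is bonded to."""
    if bonds is None:
        # No bond information: skip the allocation and the loop entirely.
        return []
    connections = [[] for _ in range(natom)]
    for row in bonds:
        iatom0, iatom1 = row[0], row[1]
        # Record the bond on both atoms (an assumption of this sketch).
        connections[iatom0].append(iatom1)
        connections[iatom1].append(iatom0)
    return connections
```

An empty result lets the caller skip writing CONECT lines without a separate check.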

evohringer · 2021-04-06T16:54:31Z

iodata/formats/pdb.py

+ print("CONECT{:5d}{}".format(
+     iatom0 + 1, "".join(
+         "{:5d}".format(iatom1 + 1)
+         for iatom1 in iatoms1[ichunk * 4:ichunk * 4 + 4])


It would improve the readability of the code to first build the string of connected atoms (iatom1) and then concatenate "CONECT" and the iatom0 string with it. It took me some time to understand what is written.
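One possible way to restructure the formatting along these lines, as a sketch rather than the merged code. `format_conect_lines` is a hypothetical name. It assumes zero-based indices on input and at most four bonded atoms per CONECT line, as in the diff above.

```python
def format_conect_lines(iatom0, iatoms1):
    """Format CONECT lines for one atom, four bonded atoms per line."""
    lines = []
    for start in range(0, len(iatoms1), 4):
        chunk = iatoms1[start:start + 4]
        # First build the string with the connected atoms (one-based serials).
        neighbors = "".join("{:5d}".format(iatom1 + 1) for iatom1 in chunk)
        # Then prepend the record name and the serial of the central atom.
        lines.append("CONECT{:5d}{}".format(iatom0 + 1, neighbors))
    return lines


conect_lines = format_conect_lines(0, [1, 2, 3, 4, 5])
```

Naming the intermediate string separates the chunking from the formatting, which is the readability concern raised here.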

evohringer · 2021-04-06T16:58:10Z

iodata/formats/pdb.py

- # If the PDB file has a title replace it.
+ # If the PDB file has a title, replace the default.
 if line.startswith("TITLE") or line.startswith("COMPND"):
     title = line[10:].rstrip()


Maybe it would also be good to add a test for a title split over multiple lines.

tovrstra · 2021-04-14T11:10:50Z

@FarnazH All should be fixed. We may have a caching issue in the CI (for ubuntu-latest, 3.9), which is fixed in the master branch. There was no nice way to split load_one into small chunks (too many return values), so I've added pylint exceptions.

evohringer

Looks fine! Great!

tovrstra · 2021-04-29T06:47:29Z

Thanks for reviewing!

tovrstra added 2 commits March 30, 2021 12:39

PDB format: Load and dump CONECT lines

166817d

Use more pytest.mark.parametrize

c85e546

tovrstra requested a review from FarnazH March 30, 2021 10:43

tovrstra changed the title ~~Pdb bonds~~ Add support for CONECT lines in PDB file (load and dump bonds) Mar 30, 2021

This was referenced Mar 30, 2021

Avoid 'is False' #251

Merged

Add support for fractional bond orders #222

Open

tovrstra requested a review from evohringer April 1, 2021 18:06

FarnazH requested changes Apr 6, 2021

evohringer suggested changes Apr 6, 2021

tovrstra added 3 commits April 14, 2021 11:55

Change bond type 1 to unknown in pdb

9a66652

Fix few issues in CONECT dumping

2a2717b

Support multiline TITLE and improve handling of COMPND

54b4127

tovrstra requested review from FarnazH and evohringer April 14, 2021 11:10

evohringer approved these changes Apr 14, 2021

tovrstra merged commit d7eccdf into theochem:master Apr 29, 2021

tovrstra deleted the pdb-bonds branch April 29, 2021 06:47
