Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal go contact map #643

Open
wants to merge 30 commits into
base: master
Choose a base branch
from
Open

internal go contact map #643

wants to merge 30 commits into from

Conversation

csbrasnett
Copy link
Collaborator

Calculate the contact map for the go model without an externally generated input file.

CLI changes:
-go now accepts either a file in the standard form, or just -go to calculate internally.

implementation of the contact map:

  • I have tried to do some smart numpy things to speed up the calculation, but I imagine there is still room for improvement.
  • Unfortunately I don't think using KDTree for distance cutoffs is possible, because there isn't actually a single cutoff distance, but one that varies atom-atom depending on atomtype. Otherwise I imagine we could speed things up that way.
  • This implementation is actually more robust cf the server, because we have a consistent set of atom names we use to calculate the contacts. The server uses default atom names expected from pdbs, which breaks down when people e.g. use AA sim files as inputs, because then these get changed.

csbrasnett and others added 6 commits November 21, 2024 11:34
- local generation of contact map guaranteed to reproduce one from server
- -go argument now accepts either path for file from server, or no argument to use the native implementation
- sometimes vermouth reads in atoms/residues in a strange order. This is seems to go especially weird for heteromers for some reason that I can't figure out. So, we sort out how we deal with reading the system by using a residue graph and making sure we read each residue node in order
- Also deal with missing OXT atoms if we have a contact with a modified atom. This is a definite source of error in the implementation in the server, which strictly assumes canonical pdb atom names and order
- NB: above error also causes issues for contact map files when the CA atom is not the first one listed. This doesn't actually affect us, because 1) we note CA directly, 2) it's not actually needed for the contact map in the end.
- fix up some other imports/linting
Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll review more thoroughly once v2 is there ;)
Love the feature

bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
Comment on lines 425 to 433
for node in G.nodes:
# we only need these for writing at the end
resnames.append(G.nodes[node]['resname'])
resids.append(G.nodes[node]['resid'])
chains.append(G.nodes[node]['chain'])
nodes.append(G.nodes[node]['_res_serial'])

res_pos = []
for subnode in sorted(G.nodes[node]['graph'].nodes):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest not sorting, and just building a dict of graph_idx: array_idx

vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
return cogs, vdw_list, atypes, coords, res_serial, resids, chains, resnames, nodes, ca_pos, nresidues, G


def calculate_contact_map(cogs, vdw_list, atypes, coords, res_serial,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring

vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's starting to shape up very nicely! Awesome work so far.
For the next pass, besides the comments, could you also have a look at the functions you create, and decide whether those will have a use beyond the go map? If not, mark them as private (prefix a _).
Some functions also need a lot of arguments, and return a lot of different things. Do you think there's any chance we could simplify that?

vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
@csbrasnett
Copy link
Collaborator Author

That should be all the more minor points addressed, the main one being how to improve the atom2res function. (had to push this now because my laptop keeps losing the .git in my clone somehow?! feel free to ignore until I've addressed everything)

We have to use itertools.product rather than itertools.combinations because we can't assume that the array is symmetric. But the speedup is sufficient that this is insignificant in comparison.

I've had a look at using dictionaries instead for the atomic resolution arrays. I'm still not sure it's possible/advisable with the way I've currently written it? Or at least, I'd rather get all the other jobs above sorted and revisit that later if we think there might be substantial advantage. I agree, the arrays are pretty sparse - but they aren't completely, which is slightly annoying.

changed file writer to deferred file writer

added cli to write file if desired

changed function names for internal use

made bond_type a global variable and removed function

corrected error in sphere generation

moved system merging to martinize2
@csbrasnett
Copy link
Collaborator Author

csbrasnett commented Dec 20, 2024

I think that's as much as we need for now. some other things to deal with:

  1. Write a header for the contact_map_vermouth.out file
  2. deal with Expose Go model backbone and atomname in add_virtual_sites #585
  3. deal with handle missing chain indicators in go model better #642
  4. (and if we want to before completing this implementation) change the contact arrays to dictionaries for further speed improvement

Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More small comments :)

bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
vermouth/rcsu/contact_map.py Outdated Show resolved Hide resolved
…nctions. added more information to test_molecule.

- tests added for _make_surface, atom2res, make_atom_map (new function from reformatting), _contact_info
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants