Addition of Waterbridge interactions #219

Open
talagayev opened this issue Sep 30, 2024 · 3 comments

@talagayev
Collaborator

Water bridges would be a beneficial interaction to add to ProLIF :)

At first glance, we could use the HBAcceptor class as a template for the detection logic. My idea would be to calculate the distances and angles from the ligand to the water (which we can also specify via SMARTS patterns), and then calculate the distance/angle from that water to the protein.

For the SMARTS patterns, we can decide whether we want to differentiate between the ligand acting as a water bridge donor or acceptor. I would assume the water bridge patterns would be a combination of Acceptor/Anion for one and Donor/Cation for the other.
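
The two-step geometric check described above could look roughly like the sketch below. It is only an illustration in plain Python, not ProLIF code: the function names are hypothetical, the atoms are bare coordinate tuples rather than SMARTS matches, and the 3.5 Å / 130° thresholds are assumed hydrogen-bond-style defaults that would need to be tuned.

```python
import math


def angle(a, b, c):
    """Angle in degrees at point b, formed by a-b-c."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cos))


def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=130.0):
    """Donor-acceptor distance and D-H...A angle test (thresholds are assumptions)."""
    return (
        math.dist(donor, acceptor) <= max_dist
        and angle(donor, hydrogen, acceptor) >= min_angle
    )


def is_water_bridge(lig_acceptor, water_o, water_hs, prot_donor, prot_h):
    """Hypothetical check for: protein D-H ... O(water)-H ... A ligand."""
    # Leg 1: the water donates one of its hydrogens to the ligand acceptor.
    lig_leg = any(is_hbond(water_o, h, lig_acceptor) for h in water_hs)
    # Leg 2: the protein donor donates its hydrogen to the water oxygen.
    prot_leg = is_hbond(prot_donor, prot_h, water_o)
    return lig_leg and prot_leg
```

The mirrored case (ligand as donor, protein as acceptor) would swap the roles of the two legs, which is where the Donor/Acceptor distinction mentioned above would come in.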

@cbouy
Member

cbouy commented Oct 1, 2024

Currently ProLIF expects all interactions to be between 2 residues (one on the "ligand" side and one on the "protein" side). Because this bridged interaction would require a 3rd residue to work (a water "residue"), there's a bit of an incompatibility with the above expectation.

Before discussing the implementation of the interaction, I'm trying to think of the best way to circumvent this problem without refactoring the entire library.

One way could be to pass a water AtomGroup selection as a parameter of the WaterBridge interaction (same way one can specify the distance threshold, angles... etc.). We could then directly convert the AtomGroup to a Molecule within the interaction's detect method, but this would only work when running the code in serial.
In parallel, things get a bit more complicated, as there are multiple copies of the original Universe, each parsing different frames of the trajectory but using the same Fingerprint object (and thus the same WaterBridge interaction object). So we'd need to make sure that each process gets a copy of the Fingerprint instead, so that the copies of the WaterBridge object operate independently on different frames (although I'm not 100% sure this would work). Worst case, we can always force "bridged" interactions to run separately from the rest, in serial.
There would be a lot of redundant computation from converting the AtomGroup to a Molecule for each pair of ligand-protein residues, but I don't really see an alternative.

There's also the fact that we have to deal with both MD trajectories and PDB/docking inputs, and for the latter we'd probably need a way to "extract" the water molecules from the protein structure since this is more convenient than requiring a separate Molecule object/file.

@talagayev
Collaborator Author

> Currently ProLIF expects all interactions to be between 2 residues (one on the "ligand" side and one on the "protein" side). Because this bridged interaction would require a 3rd residue to work (a water "residue"), there's a bit of an incompatibility with the above expectation.
>
> Before discussing the implementation of the interaction, I'm trying to think of the best way to circumvent this problem without refactoring the entire library.
>
> One way could be to pass a water AtomGroup selection as a parameter of the WaterBridge interaction (same way one can specify the distance threshold, angles... etc.). We could then directly convert the AtomGroup to a Molecule within the interaction's detect method, but this would only work when running the code in serial. In parallel, things get a bit more complicated, as there are multiple copies of the original Universe, each parsing different frames of the trajectory but using the same Fingerprint object (and thus the same WaterBridge interaction object). So we'd need to make sure that each process gets a copy of the Fingerprint instead, so that the copies of the WaterBridge object operate independently on different frames (although I'm not 100% sure this would work). Worst case, we can always force "bridged" interactions to run separately from the rest, in serial. There would be a lot of redundant computation from converting the AtomGroup to a Molecule for each pair of ligand-protein residues, but I don't really see an alternative.
>
> There's also the fact that we have to deal with both MD trajectories and PDB/docking inputs, and for the latter we'd probably need a way to "extract" the water molecules from the protein structure since this is more convenient than requiring a separate Molecule object/file.

Hm, sounds complicated 🤔 Regarding the restriction to interactions between two residues: would it be possible to work around it by looking separately at the interactions between the ligand and water, and then between water and the residue? If the exact water molecule can be pinpointed via the AtomGroup selection, would that work? (I think you also mention this when you suggest passing the water as an AtomGroup selection 🤔)

In the parallel case, where each process uses a copy of the original Universe, would it still cover water bridges formed by waters that appear in the binding site later (e.g. a water molecule enters the binding site halfway through the simulation and then forms the water bridge)? Or would it mainly recognize the initial waters that surround the molecule in the original Universe?

Hm, extracting waters from PDB/docking inputs should be doable. We'd probably need to check that we cover all of the water residue names that can occur in PDB or other formats if you want to get them directly from there :)

@cbouy
Member

cbouy commented Oct 2, 2024

> would it be possible to go around it through looking separately at the interactions between the ligand and water and then water and residue separately

Doing it in two steps as you suggest is a workaround that would work for now, I guess. Something like:

fp1 = plf.Fingerprint(["WaterBridge"]).run(u.trajectory, ligand_selection, water_selection)
fp2 = plf.Fingerprint(["WaterBridge"]).run(u.trajectory, protein_selection, water_selection)

You'd then do an inner join of the resulting fp1.ifp and fp2.ifp where the last residue of the pair (the water) is the same.
We could then wrap both steps (calculating the 2 FPs and merging them) into something like an fp.add_bridged_interactions(water_selection) method. I think this is probably the best way to do it, actually; the other solution has a lot of uncertainty and complexity to it.
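
The inner-join step could be sketched as follows. This is a simplified stand-in, not ProLIF's actual data model: it assumes each fingerprint result has been reduced to a plain dict mapping a frame index to a list of (residue, water) pairs, with residues as strings, and `merge_bridged` is a hypothetical helper name.

```python
def merge_bridged(ligand_water, protein_water):
    """Inner-join two {frame: [(residue, water), ...]} mappings on the water.

    Returns {frame: [(ligand_res, water_res, protein_res), ...]}, keeping
    only frames where at least one water contacts both sides.
    """
    bridged = {}
    # Only frames present in both results can contain a bridge.
    for frame in ligand_water.keys() & protein_water.keys():
        # Index protein residues by the water they interact with.
        proteins_by_water = {}
        for prot, water in protein_water[frame]:
            proteins_by_water.setdefault(water, []).append(prot)
        hits = [
            (lig, water, prot)
            for lig, water in ligand_water[frame]
            for prot in proteins_by_water.get(water, [])
        ]
        if hits:
            bridged[frame] = sorted(hits)
    return bridged


fp1_pairs = {0: [("LIG1", "HOH301")], 1: [("LIG1", "HOH302")]}
fp2_pairs = {0: [("ASP129", "HOH301"), ("SER212", "HOH305")], 1: [("ASP129", "HOH301")]}
print(merge_bridged(fp1_pairs, fp2_pairs))
# {0: [('LIG1', 'HOH301', 'ASP129')]}
```

In frame 1 the ligand and the protein each contact a water, but not the same one, so no bridge is reported. Joining per frame is what keeps the two independent runs consistent.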

> So in parallel with the case that it uses the original Universe, would it still cover the water bridges of waters that appear later in the binding site (a water molecule enters the binding site after half of the simulation is done and then creates the water bridge) or would it then recognize mainly the initial waters that surround the molecule during the original Universe.

That's essentially what I'm worried about if we go for the solution I proposed in my first answer.

> Hm, it should be doable to extract from PDB/docking inputs, probably need to check that we cover all of the water name options that would be in the PDB or other formats if you want to directly get them from there :)
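
To make the concern concrete: a water selection evaluated once (for example on the first frame) misses waters that enter the binding site later, whereas re-evaluating it on every frame does not. Below is a minimal sketch of the per-frame variant in plain Python. The frame layout (a dict with "ligand" coordinates and a "waters" mapping of water id to oxygen coordinates), the function names, and the 4.0 Å cutoff are all assumptions made for illustration. In MDAnalysis, an updating selection (`select_atoms(..., updating=True)`) serves a similar purpose.

```python
import math


def waters_near(ligand_coords, water_oxygens, cutoff=4.0):
    """Ids of waters whose oxygen lies within `cutoff` of any ligand atom."""
    return {
        water_id
        for water_id, oxygen in water_oxygens.items()
        if any(math.dist(oxygen, atom) <= cutoff for atom in ligand_coords)
    }


def per_frame_waters(frames, cutoff=4.0):
    """Re-evaluate the water selection on every frame instead of once."""
    return [waters_near(f["ligand"], f["waters"], cutoff) for f in frames]


frames = [
    # Frame 0: W2 is still far from the ligand.
    {"ligand": [(0.0, 0.0, 0.0)], "waters": {"W1": (3.0, 0.0, 0.0), "W2": (12.0, 0.0, 0.0)}},
    # Frame 1: W2 has moved into the binding site.
    {"ligand": [(0.0, 0.0, 0.0)], "waters": {"W1": (3.0, 0.0, 0.0), "W2": (0.0, 3.5, 0.0)}},
]
print(per_frame_waters(frames))
```

A selection frozen at frame 0 would only ever contain W1, so any bridge formed by W2 later would go undetected.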

I wouldn't even worry about covering all the water names that much, it can always be a parameter that users change.
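
A user-adjustable water name parameter could look like the sketch below, which splits the lines of a PDB file into water and non-water records. The helper name `split_waters` and the default name set are assumptions, not an existing ProLIF API. The residue name is read from columns 18-21 so that 4-letter names such as TIP3 written by some MD packages are also matched.

```python
# Assumed defaults; users would override this for other water naming schemes.
DEFAULT_WATER_NAMES = frozenset({"HOH", "WAT", "TIP3", "SOL"})


def split_waters(pdb_lines, water_names=DEFAULT_WATER_NAMES):
    """Split PDB lines into (water_records, everything_else)."""
    waters, others = [], []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        # Columns 18-20 hold the residue name; column 21 is normally blank
        # but carries the 4th letter of names like TIP3 in some files.
        if is_atom and line[17:21].strip() in water_names:
            waters.append(line)
        else:
            others.append(line)
    return waters, others
```

Calling `split_waters(lines, water_names={"HOH"})` would then restrict the extraction to crystallographic-style waters only.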

Something else to tackle (yay) would be visualisation, especially the 3D one. But we can do this incrementally, no need to tackle all in one PR.
