
WatFinder - findClusterCenters() extension #1967

Merged: 7 commits merged into prody:main on Oct 30, 2024


karolamik13
Contributor

I modified findClusterCenters() to handle other selections, not only water.

@karolamik13
Contributor Author

Before you ask, James: I included two kinds of if (a separate one for water) because I wanted to make sure that the whole water molecule is considered in exwithin, not only the oxygen, which is what would happen if the selection were not water.
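A minimal sketch of the two-branch idea, assuming atoms are plain tuples of (residue number, residue name, atom name) and that residue numbers are unique; the helper `exclusion_atoms` is hypothetical, not ProDy or WatFinder code. For a water selection, every atom of the matched residues is kept, so the hydrogens are excluded together with the oxygen; otherwise only the atoms the user selected are used.

```python
from typing import List, Tuple

# Hypothetical atom record: (residue number, residue name, atom name)
Atom = Tuple[int, str, str]

def exclusion_atoms(atoms: List[Atom], selected: List[int], is_water: bool) -> List[int]:
    """Return indices of atoms to exclude around a cluster center.

    Water: expand the selected atoms to their whole residues (O plus both H).
    Anything else: keep exactly the atoms that were selected.
    """
    if is_water:
        residues = {atoms[i][0] for i in selected}
        return [i for i, atom in enumerate(atoms) if atom[0] in residues]
    return sorted(selected)

atoms = [
    (1, "HOH", "O"), (1, "HOH", "H1"), (1, "HOH", "H2"),
    (2, "HOH", "O"), (2, "HOH", "H1"), (2, "HOH", "H2"),
]
# Only the oxygen of residue 1 was selected
print(exclusion_atoms(atoms, [0], is_water=True))   # [0, 1, 2]
print(exclusion_atoms(atoms, [0], is_water=False))  # [0]
```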

@jamesmkrieger
Contributor

This doesn't work if the selection contains water and something else. We should either loop over the parts or raise an error if people try it.
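A minimal sketch of the "raise an error" option, deciding the code path from the residue names of the atoms a selection actually matched. The helper name and the water residue set are assumptions for illustration, not ProDy's own definition of water.

```python
# Assumed set of common water residue names; not ProDy's exact definition
WATER_RESNAMES = {"HOH", "WAT", "H2O", "SOL", "TIP3", "TIP3P", "TIP4", "SPC"}

def selection_is_water(resnames):
    """Decide which code path a selection needs, from its residue names.

    Returns True if every selected residue is water, False if none is,
    and raises ValueError for a mixture instead of silently mishandling it.
    """
    kinds = {name.strip().upper() in WATER_RESNAMES for name in resnames}
    if not kinds:
        raise ValueError("selection matched no atoms")
    if len(kinds) > 1:
        raise ValueError("selection mixes water and non-water residues; "
                         "run findClusterCenters() separately for each")
    return kinds.pop()

print(selection_is_water(["HOH", "HOH"]))  # True
print(selection_is_water(["NA", "CL"]))    # False
```

The loop alternative would split the residue names into water and non-water groups and run each group through its own branch.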

@jamesmkrieger
Contributor

Why is HOH added to the water selection? It's probably better to check whether any of the water residue types are in there.
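One way to "check if any of the types of water are in there" at the level of the selection string is a token scan. This is a heuristic sketch; the token list is an assumption and should be extended to whatever water residue names a given force field or PDB source uses.

```python
import re

# Assumed keyword/resname list; extend to match your naming conventions
WATER_TOKENS = {"water", "hoh", "wat", "h2o", "sol", "tip3", "tip3p"}

def mentions_water(selection: str) -> bool:
    """Heuristic: does the selection string refer to water by keyword or resname?"""
    tokens = re.findall(r"[A-Za-z0-9]+", selection)
    return any(token.lower() in WATER_TOKENS for token in tokens)

print(mentions_water("resname HOH and name O2"))  # True
print(mentions_water("water and name OH2"))       # True
print(mentions_water("resname NA"))               # False
```

A string-level check cannot tell "water" from "not water", so checking the residue names of the atoms the selection actually matched is the more robust option.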

@jamesmkrieger
Contributor

> Why is HOH added to water selection? Probably it's better to check if any of the types of water are in there

done

@jamesmkrieger
Contributor

I'm also a bit confused about why we would want to include the whole water molecule if someone passes a selection that picks specific atoms. Surely it's better to keep using the provided selection and just update the default one?

@karolamik13
Contributor Author

HOH is added to the selection because when you download PDB files from the Protein Data Bank and analyze multiple structures, as we did for protein kinase A, you need to change the selection to resname HOH and name O2. We have that in the WatFinder paper's Jupyter Notebooks. So there is no "water" in that selection, yet we still want to treat HOH as water because it is the typical PDB residue name. When "water" is not in the selection, the code takes your selection as given, so it will check resname HOH and name O2 without hydrogens, and distC will then differ from what we had before (for the WatFinder paper).

@karolamik13
Contributor Author

> > Why is HOH added to water selection? Probably it's better to check if any of the types of water are in there
>
> done

Definitely, thank you.

@jamesmkrieger
Contributor

> HOH is added to the selection because when you download PDBs from Protein Data Bank, and you analyze multiple structures as we did for protein kinase A, you need to change the selection to resname HOH and name O2. We have that in WatFinder paper in Jupyter Notebooks. So, there is no "water" in that selection, and we still want to treat HOH as water because it is a typical PDB name. In the code where we do not have "water" in the selection, it will take your selection so it will check resname HOH and name O2 without hydrogens, and then the distC will be different than we had before (for WatFinder paper).

I still don't understand this. HOH should be recognised as water, and we should be able to put "water" in the selection string to catch it.

I also don't get why we use O2 or OH2 and then include the whole water molecule. It still feels like something we should control at the level of the input selection rather than in the internal code, because it could give users unexpected results.

@karolamik13
Contributor Author

> > HOH is added to the selection because when you download PDBs from Protein Data Bank, and you analyze multiple structures as we did for protein kinase A, you need to change the selection to resname HOH and name O2. We have that in WatFinder paper in Jupyter Notebooks. So, there is no "water" in that selection, and we still want to treat HOH as water because it is a typical PDB name. In the code where we do not have "water" in the selection, it will take your selection so it will check resname HOH and name O2 without hydrogens, and then the distC will be different than we had before (for WatFinder paper).
>
> I still don't understand this. HOH should be recognized as water and we should be able to put "water" in the selection string to catch it.
>
> I also don't get why we use O2 or OH2 and then include the whole water molecule. It still feels like something that we should control at the level of the input selection and not through the internal code because it could give users unexpected results.

You are right, and we could drop that if, but then our results would differ slightly from the WatFinder paper comparison, because distC should now be larger than in the previous analysis.

In the code, we select O2 and check its location and whether any other water molecules are close. We also include the hydrogens (that is why we use water and not the selection) [code below]. If we use the selection 'resname HOH', ProDy will include all three atoms (H, O, H) and compute the center of mass, and that location will be checked. That is fine as well, but it will give slightly different results from those in the WatFinder tutorial, which may confuse people.

sel = coords_all.select(str(selection) + ' within ' + str(distC) + ' of center',
                        center=coords_all.getCoords()[ii])
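To illustrate why the reference point matters, here is a small pure-Python sketch with made-up coordinates; it does not call ProDy. The `within` helper mimics a "within d of center" query, and the neighbour position and cutoff are hypothetical. Measuring from the water's center of mass instead of its oxygen shifts the reference point by roughly 0.07 Å toward the hydrogens, which is enough to drop a neighbour sitting near the cutoff.

```python
import math

# Approximate atomic masses in daltons
MASS = {"O": 15.999, "H": 1.008}

def center_of_mass(atoms):
    """Mass-weighted center of a list of (element, (x, y, z)) tuples."""
    total = sum(MASS[el] for el, _ in atoms)
    return tuple(
        sum(MASS[el] * xyz[k] for el, xyz in atoms) / total for k in range(3)
    )

def within(points, center, cutoff):
    """Indices of points no farther than `cutoff` from `center`."""
    return [i for i, p in enumerate(points) if math.dist(p, center) <= cutoff]

# Made-up coordinates in angstroms: one water plus a neighbouring oxygen
water = [("O", (0.0, 0.0, 0.0)),
         ("H", (0.757, 0.586, 0.0)),
         ("H", (-0.757, 0.586, 0.0))]
neighbour_oxygens = [(0.0, -3.0, 0.0)]
cutoff = 3.05  # hypothetical distC value

oxygen_center = water[0][1]
mass_center = center_of_mass(water)

print(within(neighbour_oxygens, oxygen_center, cutoff))  # [0]
print(within(neighbour_oxygens, mass_center, cutoff))    # []
```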

@jamesmkrieger
Contributor

But the default selection already factors that in, and any user selection can too:

selection = kwargs.pop('selection', 'water and name OH2')
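The quoted line means the caller's selection is used verbatim and the default only applies when none is given. A hypothetical stand-in (not the actual findClusterCenters() code) shows the behaviour; passing distC as a keyword here is an assumption for illustration.

```python
DEFAULT_SELECTION = 'water and name OH2'  # default from the quoted line

def resolve_selection(**kwargs):
    """Hypothetical sketch of the kwargs handling around the quoted line.

    The user's selection wins; leftover kwargs are returned untouched
    so other options can be handled separately.
    """
    selection = kwargs.pop('selection', DEFAULT_SELECTION)
    return selection, kwargs

print(resolve_selection())
# ('water and name OH2', {})
print(resolve_selection(selection='resname HOH and name O2', distC=0.3))
# ('resname HOH and name O2', {'distC': 0.3})
```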

@jamesmkrieger
Contributor

We should probably use element O or something like that to be more general, as different software packages use different atom names.

@karolamik13
Copy link
Contributor Author

karolamik13 commented Oct 7, 2024

> We should probably use element O or something like that to be more general as different software use different atom names

We might want to use 'name "O.*"'.
I see that p.select('resname "A.*"') is possible, although I have never used it (http://www.bahargroup.org/prody/manual/reference/atomic/select.html?highlight=select#module-prody.atomic.select), so that would work for any oxygen name.
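A quick pure-Python check of which atom names the pattern O.* would accept, using the standard `re` module rather than ProDy. Whether ProDy anchors the pattern at both ends is worth confirming in the select documentation, but for O.* it makes no difference because .* consumes the rest of the name.

```python
import re

OXYGEN_NAME = re.compile(r"O.*")  # same pattern as name "O.*" in the proposal

def oxygen_like(names):
    """Return the atom names matching O.* (case-sensitive, must start with O)."""
    return [n for n in names if OXYGEN_NAME.fullmatch(n)]

names = ["OH2", "OW", "O", "O2", "H1", "HW1", "HO", "CA"]
print(oxygen_like(names))  # ['OH2', 'OW', 'O', 'O2']
```

Outside a water selection the same pattern would also catch protein oxygens such as OG or OXT, so it is best combined with the water residue selection.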

@jamesmkrieger
Contributor

Sounds reasonable, in case the element field isn't set properly. We should check whether the name always starts with O, but I think it does.
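A sketch of combining both ideas: trust the element field when it is set and fall back to the name prefix only when it is blank. The per-atom values are passed in directly here; in ProDy they would presumably come from arrays such as getElements() and getNames(), but the helper itself is hypothetical.

```python
def is_oxygen(element, name):
    """Prefer the element field; fall back to the atom name only if it is blank."""
    element = (element or "").strip()
    if element:
        return element.upper() == "O"
    return (name or "").strip().startswith("O")

# (element, name) pairs; blank/None elements mimic files without that column
atoms = [("O", "OH2"), ("", "OW"), (None, "HW1"), ("C", "CA")]
print([is_oxygen(el, nm) for el, nm in atoms])  # [True, True, False, False]
```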

@karolamik13
Contributor Author

I have changed the oxygen selection, and it works fine with our previous examples: the MD trajectory from NAMD and the PDB analysis of aurora kinase A (both from the WatFinder paper).

@jamesmkrieger
Contributor

Great

@jamesmkrieger
Contributor

I still want to go through the whole function and check where the oxygen selection is used.


@jamesmkrieger jamesmkrieger left a comment


OK, I think it works now.

@jamesmkrieger jamesmkrieger merged commit c49a2b2 into prody:main Oct 30, 2024
6 checks passed

2 participants