-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SignInt & FTP fix #1989
SignInt & FTP fix #1989
Conversation
"""Remove empty strings from a list.""" | ||
return [elem for elem in row if elem != ''] | ||
|
||
def log_message(message, level="INFO"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have lots of prody things to do this functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which one would you suggest? I used here LOGGER.info (below). Should I just use any particular function?
def log_message(message, level="INFO"):
"""Log a message with a specified log level."""
LOGGER.info("[{}] {}".format(level, message))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine that doesn’t work but we’ll need to check
|
||
# Run foldseek and other commands | ||
subprocess.run([ | ||
'foldseek', 'easy-search', 'inp.pdb', database_folder, 'prot.foldseek', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably check that that the database_folder exists before running foldseek and having it fail if it doesn't
Also, how are we handling errors from all these subprocess commands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added that, but I don't know which problems we might have with the subprocess. Maybe Anupam can say something more about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesmkrieger and @karolamik13 I just discussed with Anupam about this PR. I will test and make sure that a few checks are in place for these subprocesses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Anthony!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Great. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesmkrieger and @karolamik13 I have tested this with ADK and everything is working as expected (except one small thing with the plotting, which I am debugging now). I added in some checks to make sure the subprocesses are covered. Once I am done, I will open a PR for your branch, Karolina and you can have a look at my changes. There was a bug I found and fixes, so that change is necessary to merge in, the other checks are helpful but not essential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
I made a typo in the commits. It should be: 3to1 letter code is now replaced by AAMAP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There doesn’t seem to be any return line?
Are you referring to merging the main ProDy branch? I wanted to update it to be sure that everything is working fine. |
I meant the runDali commit |
Ok. In such a case, I don't understand :) I added runDali here in this pull request and not in a separate branch because, we have runFoldseek here as well, and I will have to change the calcSignatureInteractions to include Dali. |
I meant there’s no object returned by runDali, but I guess that’s ok as it produces files I’d also suggest calling these functions something more specific, especially for Dali where we already have classes and functions already |
@jamesmkrieger and @atbogetti, now we have runFoldseek, runDali, and runBLAST. I will add some improvements to runDali and runBLAST, for example, the optional removal of temporary files and an option to provide Dali or BLAST input files instead of reaching the servers. I also plan to modify calcSignatureInteractions to include Dali and BLAST analysis instead Foldseek. MSA file will be optional. Unless we would like to include your script, James, with extracting sequences? It can be used together with runDali and runBLAST to create an MSA file. What do you think? |
Dali and BLAST do give alignments/mappings to the reference so we could get a full MSA directly possibly, but I don't know for sure. Using buildPDBEnsemble should also work for Dali, but I don't know if we can use the info from BLAST. |
…r reorganization needed]
I have now changed the calcSignatureInteractions() function to include folder analysis only, without the MSA file. We can discuss in the upcoming days what would be the best. |
I'd suggest having a wrapper function that includes the fixer, structural superposition, etc. and runs the method specific code after. I guess you're using a match chains function that takes an MSA as an optional argument, so the user can provide it if they trust it and use some other ProDy method for sequence alignment otherwise? There's a lot of code here so I haven't managed to look at it all yet. Hopefully I'll have a chance in the second half of next week |
I reorganized runFoldseek() and calcSignatureInteractions(). Now, to initiate calculations, we can use:
Each function will create the folder with downloaded, fixed (by adding missing atoms), and aligned structures.
|
I made pretty significant improvements in process_data(), which is a party of calcSignatureInteractions(), and in calcSignatureInteractions(), too, to properly include the MSA file. Unfortunately, if we have an MSA file, some information is obtained from computed interactions. And those are included to generate conservation plots. Therefore, interactions cannot be computed in a similar way as for the folder analysis. Also, I noticed that in the comment above, I gave the wrong MSA file. That one is (by default) correct:
|
@atbogetti and @jamesmkrieger, the code is ready for me. Feel free to test it on different structures. |
@karolamik13 I have approved this PR. I tested your most recent changes on PDB=1AKE and it's working well. |
Great. Thank you! |
In this code, we have the Signature Interactions pipeline with Foldseek analysis (provided by Anupam Banerjee) together with the merged pull request #1988 by James Krieger.
I created the following functions as a pipeline of SignInt analysis based on Anupam's code:
(1) runFoldseek() - this function processes a PDB file to extract a specified chain's sequence, searches for
homologous structures using foldseek, and prepares alignment outputs for further analysis
Before using the function, Foldseek needs to be installed, and the PDB database for screening needs to be downloaded:
More information can be found:
https://github.com/steineggerlab/foldseek?tab=readme-ov-file#databasesand
This function will not work under Windows due to the Unix commands (cat, awk, etc.).
Perhaps in the future, it can be adapted to Windows machines.
This command uses createFoldseekAlignment() and extractMultiModelPDB().
(2) createFoldseekAlignment() - Aligns sequences from prot_seq with homologous sequences identified in prot_foldseek,
generating a multiple sequence alignment.
(3) extractMultiModelPDB() - Extracts individual PDB models from multimodel PDB and places them into the pointed directory.
(4) calcSignatureInteractions() - Analyzes protein structures to identify various interactions using InSty.
How to use the code, for example:
Default names for the results are shortlisted_resind.msa, and all homologous PDB structures will be extracted to folder struc_homologs in the working directory by runFoldseek(). Next, calcSignatureInteractions() will analyze the aligned 3D structures of homologous and provide InSty results (including raw data and automatically generated plots).
The folder will contain such files (PNGs, txt):
aligned_structures_extended.pdb
aligned_structures.pdb
HydrogenBonds_consensus.txt
HydrogenBonds_plot_part10.png
HydrogenBonds_plot_part11.png
HydrogenBonds_plot_part12.png
HydrogenBonds_plot_part13.png
HydrogenBonds_plot_part14.png
HydrogenBonds_plot_part15.png
HydrogenBonds_plot_part16.png
HydrogenBonds_plot_part17.png
HydrogenBonds_plot_part18.png
HydrogenBonds_plot_part19.png
HydrogenBonds_plot_part1.png
HydrogenBonds_plot_part20.png
HydrogenBonds_plot_part2.png
HydrogenBonds_plot_part3.png
HydrogenBonds_plot_part4.png
HydrogenBonds_plot_part5.png
HydrogenBonds_plot_part6.png
HydrogenBonds_plot_part7.png
HydrogenBonds_plot_part8.png
HydrogenBonds_plot_part9.png
Hydrophobic_consensus.txt
Hydrophobic_plot_part10.png
Hydrophobic_plot_part11.png
Hydrophobic_plot_part1.png
Hydrophobic_plot_part2.png
Hydrophobic_plot_part3.png
Hydrophobic_plot_part4.png
Hydrophobic_plot_part5.png
Hydrophobic_plot_part6.png
Hydrophobic_plot_part7.png
Hydrophobic_plot_part8.png
Hydrophobic_plot_part9.png
PiCation_consensus.txt
PiCation_plot_part1.png
PiStacking_consensus.txt
PiStacking_plot_part1.png
prot.foldseek
prot_struc.msa
RepulsiveIonicBonding_consensus.txt
RepulsiveIonicBonding_plot_part1.png
SaltBridges_consensus.txt
SaltBridges_plot_part1.png
SaltBridges_plot_part2.png
seq_match_reconfirm.txt
shortlisted.foldseek
shortlisted.msa
shortlisted_resind.msa
struc_homologs