AlphaFold, new flags for resource utilisation #20421

Conversation

VRehnberg
Contributor

@VRehnberg VRehnberg commented Apr 25, 2024

AlphaFold computations have two notable parts at runtime: an MSA search that uses only CPUs to create the input features for a specific input, and the prediction part, which can utilize GPUs (see e.g. slide 20 from this EUM22 presentation https://easybuild.io/eum22/022_eum22_alphafold.pdf).

Allocating a GPU for the first part of this computation can be considered a waste of resources. This PR tackles this by:

This seems to be what was done here https://www.nsc.liu.se/support/systems/berzelius-software/berzelius-alphafold/ and I am in a similar situation.

@VRehnberg VRehnberg marked this pull request as draft April 25, 2024 14:01
@VRehnberg
Contributor Author

N.B. there is one breaking change.

https://github.com/easybuilders/easybuild-easyconfigs/blob/fe59f1b8454c4f81229068858f1269a589c17de9/easybuild/easyconfigs/a/AlphaFold/AlphaFold-2.3.1_add-run_features_only-option.patch adds some logic to skip recomputing features.pkl if it already exists. This leads to new behavior if output_dir is reused with the same fasta name (even if fasta file is reused). Other alternatives:

  • add another flag to enable this behavior as well
  • save checksum of fasta file as well and check that it matches before reusing (but what if database was changed etc.)
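
A minimal sketch of the checksum alternative, assuming the checksum is kept in a sidecar file next to features.pkl. The file name `features.fasta.sha256` and the helper names are hypothetical and not taken from the patch. As noted above, this only detects changes to the fasta file, not to the databases.

```python
import hashlib
from pathlib import Path

# Hypothetical sidecar file name; the actual patch does not define one.
CHECKSUM_NAME = "features.fasta.sha256"


def fasta_sha256(fasta_path):
    """Return the hex SHA-256 digest of the fasta file's bytes."""
    return hashlib.sha256(Path(fasta_path).read_bytes()).hexdigest()


def record_checksum(fasta_path, output_dir):
    """Store the fasta checksum next to features.pkl after features are written."""
    (Path(output_dir) / CHECKSUM_NAME).write_text(fasta_sha256(fasta_path) + "\n")


def can_reuse_features(fasta_path, output_dir):
    """True only if features.pkl exists and the stored checksum matches the fasta."""
    out = Path(output_dir)
    features = out / "features.pkl"
    checksum_file = out / CHECKSUM_NAME
    if not (features.is_file() and checksum_file.is_file()):
        return False
    return checksum_file.read_text().strip() == fasta_sha256(fasta_path)
```

With this, a reused output_dir only skips the MSA search when the fasta content is unchanged; any mismatch falls back to recomputing.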

@VRehnberg
Contributor Author

Another thing worth thinking about is whether this should get a versionsuffix.

@ThomasHoffmann77
Contributor

ThomasHoffmann77 commented Apr 25, 2024

N.B. there is one breaking change.

https://github.com/easybuilders/easybuild-easyconfigs/blob/fe59f1b8454c4f81229068858f1269a589c17de9/easybuild/easyconfigs/a/AlphaFold/AlphaFold-2.3.1_add-run_features_only-option.patch adds some logic to skip recomputing features.pkl if it already exists. This leads to new behavior if output_dir is reused with the same fasta name (even if fasta file is reused). Other alternatives:

  • add another flag to enable this behavior as well
  • save checksum of fasta file as well and check that it matches before reusing (but what if database was changed etc.)

We have a similar patch installed at our site. I did not add it to PR #19942 because it changes quite a few lines of the original code and adds some other features. Maybe it could be added as an optional patch.

For features we do not use a flag; we check whether features.pkl exists. The pipeline stops after writing features.pkl, and we terminate a job if features.pkl does not exist and the job environment has CUDA_VISIBLE_DEVICES set.
We also allow resuming a pipeline or running each prediction as a separate job in a job array.
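
The decision logic described above could be sketched roughly as follows. This is not the code from our patch; the function name `choose_stage` is hypothetical, and treating an empty CUDA_VISIBLE_DEVICES as "no GPU allocated" is an assumption made for this sketch.

```python
import os
from pathlib import Path


def choose_stage(output_dir, environ=None):
    """Return "features" or "predict"; raise if a GPU job would run the MSA search."""
    env = os.environ if environ is None else environ
    has_features = (Path(output_dir) / "features.pkl").is_file()
    # Assumption: an unset or empty CUDA_VISIBLE_DEVICES means no GPU is allocated.
    gpu_allocated = bool(env.get("CUDA_VISIBLE_DEVICES"))
    if has_features:
        # Features already exist, so the job can go straight to prediction.
        return "predict"
    if gpu_allocated:
        # Refuse to spend a GPU allocation on the CPU-only MSA search.
        raise RuntimeError(
            "features.pkl is missing but CUDA_VISIBLE_DEVICES is set; "
            "run the MSA search in a CPU-only job first"
        )
    # CPU-only job without features: compute features.pkl, then stop.
    return "features"
```

A CPU-only job would then call `choose_stage(output_dir)`, get `"features"`, write features.pkl and exit, while a later GPU job on the same output_dir gets `"predict"`.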

@VRehnberg
Contributor Author

We have a similar patch installed at our site.

@ThomasHoffmann77 Is that patch available somewhere? Sure sounds like a great patch.

@ThomasHoffmann77
Contributor

Is that patch available somewhere?

@VRehnberg I added AlphaFold-2.3.2_EMBLpipeline002.patch to PR #19942.
ALPHAFOLD_RELAX_PARALLEL is experimental and occasionally runs out of memory for larger predictions.
Any suggestions would be appreciated.

@VRehnberg
Contributor Author

Will wait on #19942. I adapted the patch for C3SE at https://gist.github.com/VRehnberg/6dfe1e83c9fdbb1bdfd519fa685c580b. The main differences from the one in #19942 are that it uses jax.default_backend to detect a GPU and modifies the sanity check script to only do the MSA search.
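
For illustration, GPU detection via jax.default_backend might look roughly like this. It is a sketch rather than the code in the gist: the helper name `running_on_gpu` and the injectable `get_backend` parameter are hypothetical, added so the check can be exercised without JAX installed.

```python
def running_on_gpu(get_backend=None):
    """Return True if JAX's default backend is a GPU.

    get_backend is a hypothetical hook for testing; by default the real
    jax.default_backend is used, which reports "cpu", "gpu" or "tpu".
    """
    if get_backend is None:
        import jax  # imported lazily so the helper can be tested without JAX

        get_backend = jax.default_backend
    return get_backend() == "gpu"
```

Unlike checking CUDA_VISIBLE_DEVICES, this asks JAX which backend it actually initialized, so it also covers the case where a GPU is allocated but not usable by JAX.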

@VRehnberg VRehnberg closed this Jun 4, 2024