Declaration 1 This repository contains an inference pipeline of AlphaFold2 with a faithful translation from Haiku/JAX (https://github.com/deepmind/alphafold) to PyTorch. Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper. Please also refer to the Supplementary Information for a detailed description of the method.
Declaration 2 The setup procedures were adapted from two repositories, https://github.com/kalininalab/alphafold_non_docker and https://github.com/deepmind/alphafold, with only minor changes. The differences are highlighted where they occur.
Declaration 3 This repo is independently implemented and differs from an earlier unofficial version (https://github.com/lucidrains/alphafold2). Neither is better than the other; they differ in three respects: (1) this repo focuses on accelerating inference and is compatible with the weights released by DeepMind; (2) this repo delivers a reliable pipeline accelerated on Intel® Xeon and Intel® Optane® PMem by Intel® oneAPI, which offers an alternative way to deploy the model; (3) this repo treats the CPU as its primary computation resource for acceleration, so it may not provide optimal speed on GPU.
Primary solution for setting up the AlphaFold2 environment optimized on Intel Architecture
- install Anaconda
- create the conda environment:
  conda create -n iaf2 python=3.9.7
  conda activate iaf2
- initialize by running setup_env.sh (see the example invocation below):
  bash setup_env.sh \
    <root_home> \
    <refdata_dir> \
    <conda_env_name> \
    <experiment_name> \
    <model_name>
  An example input "sample.fa" has already been copied into your subfolder <root_home>/samples/.
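For illustration, a complete setup invocation might look like the following; the concrete values (~/af2_home, /mnt/data/af2_reference, iaf2, exp01, model_1) are hypothetical placeholders, not values shipped with this repo:
  # hypothetical example: replace every path and name with your own
  # args: <root_home> <refdata_dir> <conda_env_name> <experiment_name> <model_name>
  bash setup_env.sh \
    ~/af2_home \
    /mnt/data/af2_reference \
    iaf2 \
    exp01 \
    model_1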
- run sequential preprocessing (MSA and template search) on the samples in $root_home/samples (example invocation below):
  bash online_preproc_baremetal.sh \
    <root_home> \   # root of all intermediate data
    <data_dir> \    # root of the reference dataset that AlphaFold2 needs
    <input_dir> \   # example: <root_home>/samples
    <out_dir>       # example: <root_home>/experiments/<experiment_name> (created in setup)
  Here <input_dir>=<root_home>/samples and <out_dir>=<root_home>/experiments/<customized_subfolder>. Intermediate data can be found under $root_home/experiments/<experiment_name>/intermediates and $root_home/experiments/<experiment_name>/msas; these files are used as input for model inference.
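Continuing the hypothetical layout from the setup example (~/af2_home as <root_home>, /mnt/data/af2_reference as the reference dataset root, exp01 as the experiment name), the preprocessing step could be launched as:
  # hypothetical example: all paths are placeholders
  bash online_preproc_baremetal.sh \
    ~/af2_home \
    /mnt/data/af2_reference \
    ~/af2_home/samples \
    ~/af2_home/experiments/exp01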
- run sequential model inference to predict unrelaxed structures from the MSA and template results (example invocation below):
  bash online_inference_baremetal.sh \
    <path_to_condaenv> \  # path to your conda virtual environment, e.g. ~/anaconda3/envs/<env_name>
    <root_home> \         # root of all intermediate data
    <root_data> \         # root of the reference dataset that AlphaFold2 needs
    <input_dir> \         # example: <root_home>/samples
    <out_dir> \           # example: <root_home>/experiments/<experiment_name> (created in setup)
    <model_name>          # e.g. model_1
  Here <input_dir>=<root_home>/samples and <out_dir>=<root_home>/experiments/<customized_subfolder>. The unrelaxed structures can be found under $root_home/experiments/<experiment_name>; you can now visualize the PDB files.
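With the same hypothetical layout, and assuming the conda environment was created under ~/anaconda3/envs/iaf2, inference on the preprocessed samples could then be run as:
  # hypothetical example: all paths are placeholders
  bash online_inference_baremetal.sh \
    ~/anaconda3/envs/iaf2 \
    ~/af2_home \
    /mnt/data/af2_reference \
    ~/af2_home/samples \
    ~/af2_home/experiments/exp01 \
    model_1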
- run batch preprocessing (MSA and template search) on the samples in $root_home/samples (example invocation below):
  bash batch_preproc_baremetal.sh \
    <root_home> \   # root of all intermediate data
    <data_dir> \    # root of the reference dataset that AlphaFold2 needs
    <input_dir> \   # example: <root_home>/samples
    <out_dir>       # example: <root_home>/experiments/<experiment_name> (created in setup)
  Intermediate data can be found under $root_home/experiments/<experiment_name>/intermediates and $root_home/experiments/<experiment_name>/msas; these files are used as input for model inference.
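The batch script takes the same arguments as the sequential one; with the hypothetical paths used in the earlier examples it would be invoked as:
  # hypothetical example: all paths are placeholders
  bash batch_preproc_baremetal.sh \
    ~/af2_home \
    /mnt/data/af2_reference \
    ~/af2_home/samples \
    ~/af2_home/experiments/exp01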
- run batch model inference to predict unrelaxed structures from the MSA and template results (example invocation below):
  bash batch_inference_baremetal.sh \
    <path_to_condaenv> \  # path to your conda virtual environment, e.g. ~/anaconda3/envs/<env_name>
    <root_home> \         # root of all intermediate data
    <root_data> \         # root of the reference dataset that AlphaFold2 needs
    <input_dir> \         # example: <root_home>/samples
    <out_dir> \           # example: <root_home>/experiments/<experiment_name> (created in setup)
    <model_name>          # e.g. model_1
  The unrelaxed structures can be found under $root_home/experiments/<experiment_name>; you can now visualize the PDB files.
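Again with the hypothetical paths from the earlier examples:
  # hypothetical example: all paths are placeholders
  bash batch_inference_baremetal.sh \
    ~/anaconda3/envs/iaf2 \
    ~/af2_home \
    /mnt/data/af2_reference \
    ~/af2_home/samples \
    ~/af2_home/experiments/exp01 \
    model_1
  # the unrelaxed PDB files written under ~/af2_home/experiments/exp01 can then be opened
  # in any molecular viewer (e.g. PyMOL or ChimeraX)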
- Notices:
  - the optimal number of parallel threads depends on the maximum memory size
  - if you have PMem installed on your system, please use 1 physical core per thread
  - if you only have DRAM memory at the GB level, please estimate the peak memory usage before running (a sizing sketch is shown below)
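A minimal sketch for sizing the thread count on a DRAM-only system, assuming a per-thread peak memory budget (PER_THREAD_GB below is an illustrative placeholder, not a value provided by this repo; measure the peak of your own workload first):
  # hypothetical sizing helper: PER_THREAD_GB is an assumed per-thread peak, measure your own workload
  PER_THREAD_GB=32
  total_gb=$(free -g | awk '/^Mem:/ {print $2}')
  physical_cores=$(lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l)
  by_memory=$(( total_gb / PER_THREAD_GB ))
  threads=$(( by_memory < physical_cores ? by_memory : physical_cores ))
  echo "total memory: ${total_gb} GB, physical cores: ${physical_cores}, suggested threads: ${threads}"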
DeepMind provides scripts to download and prepare all reference datasets.
  Default usage:
    <INSTALL_ROOT>/models/aidd/pytorch/alphafold2/inference/alphafold/scripts/download_all_data.sh <DOWNLOAD_DIR>
  To use reduced_dbs:
    <INSTALL_ROOT>/models/aidd/pytorch/alphafold2/inference/alphafold/scripts/download_all_data.sh <DOWNLOAD_DIR> reduced_dbs
  Extra scripts for downloading specific datasets can be found under:
    <INSTALL_ROOT>/models/aidd/pytorch/alphafold2/inference/alphafold/scripts/*
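For example, to download the full databases into the hypothetical /mnt/data/af2_reference directory used as <refdata_dir> in the setup example above:
  # hypothetical example: <INSTALL_ROOT> and the target directory are placeholders
  bash <INSTALL_ROOT>/models/aidd/pytorch/alphafold2/inference/alphafold/scripts/download_all_data.sh /mnt/data/af2_reference
  # or, for the smaller reduced databases:
  bash <INSTALL_ROOT>/models/aidd/pytorch/alphafold2/inference/alphafold/scripts/download_all_data.sh /mnt/data/af2_reference reduced_dbs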
If you use the code or data in this package, please cite:
@Article{AlphaFold2021,
author = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
journal = {Nature},
title = {Highly accurate protein structure prediction with {AlphaFold}},
year = {2021},
doi = {10.1038/s41586-021-03819-2},
note = {(Accelerated article preview)},
}
Copyright (c) 2021 DeepMind Technologies Limited. Copyright (c) 2022 Intel Corporation Limited.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The AlphaFold parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode