This repository contains a simplified method of installing colabfold for local UoD use, loosely based on localcolabfold. Colabfold provides greatly accelerated structure prediction compared to the 'traditional' alphafold approach by replacing the HMMER/HHblits homology searches with a much faster MMseqs2-based method; see the colabfold paper for details. The localcolabfold installation does not work out of the box in our environment, so this is a streamlined installation which should produce a working setup by running a single setup script.
- The UoD HPC cluster provides Singularity access, including on CUDA-enabled GPU nodes appropriate for running colabfold.
This is available in /cluster/sw/colabfold until a better home can be found for it...
- An Anaconda/Miniconda3/Mamba installation. The installation script will preferentially use mamba to carry out the installation, but will fall back to conda if mamba is not available. If you don't already have a conda installation, see the Cluster Wiki for instructions on setting this up.
- Approximately 14 GB of free disk space. This is mostly required for storing the alphafold weights.
- git (optional)
- About 15 minutes of your life
All necessary components are already available on the cluster.
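The prerequisites above can be checked before running the installer. This is a sketch: the mamba-first, conda-fallback preference and the ~14 GB figure come from the text above, and the check against `$HOME` assumes the installation will live on your home filesystem.

```shell
#!/bin/sh
# Report which package manager the installer would pick, following the
# mamba-first, conda-fallback behaviour described above.
if command -v mamba >/dev/null 2>&1; then
    echo "Installer will use: mamba"
elif command -v conda >/dev/null 2>&1; then
    echo "Installer will use: conda"
else
    echo "Neither mamba nor conda found - see the Cluster Wiki for setup instructions"
fi

# Free space (in GB) on the filesystem holding $HOME; around 14 GB is
# needed, mostly for the alphafold weights.
df -BG --output=avail "$HOME" | tail -n 1
```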
1. Obtain a copy of this repository, either using git:
git clone git://github.com/bartongroup/JCA_colabfold_batch.git
or by downloading a release tarball from the link on the right under 'Releases'. Copy this tarball onto the cluster filesystem and extract it with:
tar zxvf v1.5.2-beta3.tar.gz
2. Change into the directory created by step 1. This will have the repository name if cloned from git, or include the version number if obtained from a release tarball.

a) From a repository clone:
cd Colabfold_batch_installer
b) From a release tarball:
cd Colabfold_batch_installer-1.5.2-beta3
3. Run the setup script:
./setup.sh
This will create a new conda environment named `colabfold_batch`, based on the definition in the `colabfold_batch.yaml` file. Alphafold weights are then downloaded into the `$CONDA_PREFIX/share/colabfold` directory within the conda environment. The installation will take approximately 15 minutes to complete.
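Once setup.sh completes, a quick sanity check can confirm the environment is in place. This is a sketch; the environment name and weights location are taken from the description above, and the check simply reports OK or MISSING rather than failing.

```shell
#!/bin/sh
# Check that the conda environment created by setup.sh exists.
if command -v conda >/dev/null 2>&1 && conda env list | grep -q '^colabfold_batch '; then
    echo "conda environment colabfold_batch: OK"
else
    echo "conda environment colabfold_batch: MISSING"
fi

# After 'conda activate colabfold_batch', the downloaded weights
# should be visible with:
#   ls "$CONDA_PREFIX/share/colabfold"
```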
Usage: /cluster/sw/colabfold/current/colabfold.sh -i /path/to/fasta/file [-c 'colabfold arguments'] [-h] [-u]
The `colabfold.sh` script can be submitted directly to GridEngine, and requires at a minimum the path to an input fasta file. Any specific colabfold arguments can be provided using the `-c` argument. Log files will be written to a `colabfold_logs` directory in the submission directory, while outputs will be written to a `colabfold_outputs` directory within the directory containing the submitted fasta file.
e.g. qsub /path/to/run_colabfold_singularity.sh -i test/cadh5_arath.fa -c "--num-recycle 5 --amber --num-relax 5"
Full colabfold usage information can be found by running `run_colabfold_singularity.sh -u`.
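As a worked example of preparing an input for the `-i` argument, the following writes a minimal single-sequence fasta file. The sequence is an arbitrary placeholder, not a real protein of interest, and the file name is illustrative.

```shell
#!/bin/sh
# Write a minimal one-sequence fasta input file.
cat > my_protein.fa <<'EOF'
>my_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ
EOF

# On the cluster this could then be submitted with, e.g.:
#   qsub /cluster/sw/colabfold/current/colabfold.sh -i my_protein.fa
```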
Activate the `colabfold_batch` environment:

conda activate colabfold_batch

The `colabfold_batch` program will now be available on your path. Run `colabfold_batch -h` for help information.
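With the environment active, a prediction can also be run interactively rather than via qsub. This is a sketch, guarded so it only runs when the program is on the path; the flags mirror the qsub examples in this document, and `colabfold_batch` takes the input fasta file and an output directory as positional arguments.

```shell
#!/bin/sh
# Run colabfold_batch directly; prints a reminder instead if the
# conda environment has not been activated.
if command -v colabfold_batch >/dev/null 2>&1; then
    colabfold_batch --num-recycle 5 --amber --num-relax 5 \
        test/cadh5_arath.fa colabfold_results/
else
    echo "colabfold_batch not on PATH - run 'conda activate colabfold_batch' first"
fi
```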
An example script is provided as `run_colabfold.sh`, which is appropriate for submission to the UoD HPC cluster.
Usage: ./run_colabfold.sh -i /path/to/fasta/file [-c "colabfold_arguments"]
The only required argument is the path to an input fasta file containing the sequences of interest. Any additional colabfold_batch arguments can be specified with the `-c` argument, making sure to surround the colabfold arguments in quotes so they are captured as a single argument, e.g.

run_colabfold.sh -i test/cadh5_arath.fa -c "--num-recycle 5 --amber --num-relax 5"
This script can be submitted directly to GridEngine using `qsub`, and is configured to run on one of the Nvidia A40 GPUs:
qsub run_colabfold.sh -i test/cadh5_arath.fa -c "--num-recycle 5 --amber --num-relax 5"
Resulting job logs will be written into a `colabfold_logs` subdirectory of the submission directory, while outputs will be written to a `colabfold_results` directory.
Make sure you check the log files for errors!
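Checking the logs can be scripted. This is a sketch: the `colabfold_logs` directory name comes from the description above, but the error patterns are a non-exhaustive guess at common failure signatures.

```shell
#!/bin/sh
# Scan colabfold job logs for likely failure signatures.
if grep -niE 'error|traceback|out of memory' colabfold_logs/* 2>/dev/null; then
    echo "Possible problems found - inspect the matching lines above"
else
    echo "No obvious errors found in colabfold_logs/"
fi
```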
N.B. There are known issues with alphafold when relaxing models using Amber on GPUs. If this fails, omit the `--use-gpu-relax` argument and run Amber on CPUs only; this stage of the process is not overly slow on CPUs.
Full colabfold usage information can be found by running `run_colabfold.sh -u`.
The GPU nodes which are capable of running colabfold were funded through a BBSRC ALERT bid for training machine learning models. Usage of these nodes will be monitored by DTS, and it may be necessary to impose limits on their usage if workloads are interfering with the primary function of these nodes. Groups with high demands for colabfold should consider contributing appropriate hardware to the cluster to support their requirements.
Some warnings are expected within the log files, and do not necessarily mean something has gone wrong.
- 2023-05-24 16:54:44.003090: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

  This warning relates to an optional package which does not work with the combination of library versions required for our setup; this is due to be resolved in a future software release. It does not affect the results, but may increase the required runtime.

- 2023-05-24 16:54:38,928 Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter CUDA Host
  2023-05-24 16:54:38,929 Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
  2023-05-24 16:54:38,929 Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.

  These warnings relate to alternative computational backends which may be used to carry out the prediction. The line `Available platform names are: Interpreter CUDA Host` indicates that the CUDA backend required for GPU acceleration has been found.
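Since that platform line is the easiest way to confirm GPU use, checking for it can also be scripted. This is a sketch; it assumes the job logs live in the `colabfold_logs` directory described earlier.

```shell
#!/bin/sh
# Confirm the CUDA backend was reported in the job logs.
if grep -h 'Available platform names' colabfold_logs/* 2>/dev/null | grep -q 'CUDA'; then
    echo "CUDA backend detected - the job used GPU acceleration"
else
    echo "CUDA backend not reported - check whether the job ran on CPU"
fi
```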