Skip to content

nicolepaterson/Automated_APPSquared_2_0

Repository files navigation

automated APPSquared

This version of the pipeline is intended to be run on a weekly schedule. The pipeline first pulls variant hashes with counts >10 from the global variant hash alignments tables, generates an amino acid fasta file from these variant hashes (if it fails to generate a sequence, aa_seq_fails.txt records this information in the local directory). The variant_hash_tracking file in this repo is used for tracking purposes as the name implies, and this file is pulled, updated, and pushed back to the repo after a run. The protein_modeling.isolate_name table is updated with the isolate name that corresponds to gaa.isolate_ranking=1 and updated in CDP for consistency in naming conventions. The amino acid fastas are run in AlphaFold2 using a snakemake workflow with variable control resource allocation in the scbs-vsdb-01 server. It then runs the APPSquared pipeline on these structures (https://pypi.org/project/appsquared/) archives pdbs and raw output in the GAT group working area and securely copies formatted data tables for upload to CDP.

appsquared_workflow

To install the conda environments:

conda env create --name appsquared --file=appsquared.yaml
conda env create --name glyc --file=glyc.yaml
conda env create --name getcontacts --file=getcontacts.yaml

To run the pipeline for surveillance purposes:

bash run_docker_AF.sh -o </path/to/output/dir> -s <start_date format: %Y-%m-%d>
    example: bash run_docker_AF.sh -o /home/nicole -s 2023-10-01

pipeline usage standalone (does not generate AlphaFold2 structures or upload to CDP):

bash BCP.sh -d </path/to/pdb/files> -n name_of_output_dir

table uploads, run from cdp-client-02:

./SIMPLE_auto do

Authors and acknowledgment

Nicholas Kovacs, Brian Mann, Kristine Lacek, Norman Hassell, Matthew Wersebe, Sam Shepard have all made meaningful contributions to this project in the form of contributing code that was either used directly as noted under the shebang in each file or modified for this purpose.

Google Deepmind's AlphaFold2 is deployed for generating high confidence structural predictions for influenza antigenic proteins. https://github.com/google-deepmind/alphafold https://www.schrodinger.com/products/glide

Schrodinger prepwizard is used to optimize structures and prepare for dockings. The Glide program is deployed for sialic acid dockings to predicted hemagglutinin structures. https://www.schrodinger.com/science-articles/protein-preparation-wizard

To calculate the intermolecular contacts, the Stanford Getcontacts project is deployed. https://getcontacts.github.io/

Baker Lab's Rosetta is used to calculate the stability of proteins run in the pipeline. https://new.rosettacommons.org/docs/latest/Home

Nicholas Kovacs (CDC) developed the glycosylation distance calculator.

Norman Hassell (CDC) wrote the SQL for generating the variants of interest.

Nicole Paterson (STI/CDC) developed the automated pipeline.

License

open source

Project status

Beta

Automated_APPSquared_2_0

Automated version of APPSquared Pipeline plus updates

About

Automated version of APPSquared Pipeline plus updates

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published