automated APPSquared

This version of the pipeline is intended to run on a weekly schedule. The pipeline first pulls variant hashes with counts >10 from the global variant hash alignments tables and generates an amino acid FASTA file from them; if it fails to generate a sequence for a variant hash, the failure is recorded in aa_seq_fails.txt in the local directory. The variant_hash_tracking file in this repo is used for tracking: it is pulled, updated, and pushed back to the repo after each run. The protein_modeling.isolate_name table is updated with the isolate name that corresponds to gaa.isolate_ranking=1, and the same update is made in CDP to keep naming conventions consistent. The amino acid FASTA files are run through AlphaFold2 using a Snakemake workflow with variable-controlled resource allocation on the scbs-vsdb-01 server. The pipeline then runs APPSquared (https://pypi.org/project/appsquared/) on these structures, archives the PDB files and raw output in the GAT group working area, and securely copies formatted data tables for upload to CDP.
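The weekly schedule described above could be driven by a small wrapper like the one below. This is only a sketch, not part of the repo: the file name weekly_run.sh, the --run flag, the Monday 02:00 cron schedule, and the choice of "7 days ago" as the start date are all assumptions, and the -d option requires GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repo): compute a start date one week
# back and pass it to run_docker_AF.sh. Requires GNU date for the -d option.

# Print the date 7 days before today as %Y-%m-%d.
weekly_start_date() {
    date -d '7 days ago' +%Y-%m-%d
}

# Run the surveillance pipeline for the past week.
# $1: output directory (placeholder; choose your own path).
run_weekly() {
    bash run_docker_AF.sh -o "$1" -s "$(weekly_start_date)"
}

# Only launch the pipeline when explicitly asked to, so the functions
# above can also be sourced and reused without side effects.
if [ "${1:-}" = "--run" ]; then
    run_weekly "${2:?output directory required}"
fi

# Example crontab entry (assumed schedule: Mondays at 02:00):
#   0 2 * * 1 cd /path/to/repo && bash weekly_run.sh --run /path/to/output/dir
```

Keeping the date computation in its own function makes it easy to check by hand before the job is scheduled.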

To install the conda environments:

conda env create --name appsquared --file=appsquared.yaml
conda env create --name glyc --file=glyc.yaml
conda env create --name getcontacts --file=getcontacts.yaml
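If the install step is rerun, conda env create will fail for environments that already exist. A minimal sketch of an idempotent version follows; the function names are hypothetical, and it assumes the environment name is the first column of the conda env list output and that each YAML file is named after its environment, as in the commands above.

```shell
# Succeed if the environment name given as $1 appears in a `conda env list`
# listing read from stdin (the name is assumed to be the first column).
env_in_listing() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Create each environment from its YAML file unless it already exists.
create_missing_envs() {
    for env in appsquared glyc getcontacts; do
        if conda env list | env_in_listing "$env"; then
            echo "skipping $env (already exists)"
        else
            conda env create --name "$env" --file="$env.yaml"
        fi
    done
}

# Usage, from the repo root:
#   create_missing_envs
```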

To run the pipeline for surveillance purposes:

bash run_docker_AF.sh -o </path/to/output/dir> -s <start_date format: %Y-%m-%d>

    example: bash run_docker_AF.sh -o /home/nicole -s 2023-10-01
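Because the start date must follow the %Y-%m-%d format, it can help to check it before launching a long run. The helper below is a sketch under assumptions: the function name valid_start_date is hypothetical, run_docker_AF.sh may already perform its own validation, and the -d option requires GNU date.

```shell
# Succeed only if $1 is a real calendar date written as YYYY-MM-DD.
# Assumes GNU date (the -d option).
valid_start_date() {
    # Shape check first, so inputs like 10/01/2023 are rejected outright.
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) return 1 ;;
    esac
    # Round-trip through date; an impossible date such as 2023-02-30
    # makes date fail, so the comparison below is false.
    [ "$(date -u -d "$1" +%Y-%m-%d 2>/dev/null)" = "$1" ]
}

# Usage:
#   start=2023-10-01
#   valid_start_date "$start" && bash run_docker_AF.sh -o /home/nicole -s "$start"
```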

To run the pipeline standalone (does not generate AlphaFold2 structures or upload to CDP):

bash BCP.sh -d </path/to/pdb/files> -n <name_of_output_dir>
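Before a standalone run, it may be worth confirming that the input directory actually contains structures. The sketch below assumes the files carry a .pdb extension and sit directly inside the directory passed to -d; both function names are hypothetical and not part of the repo.

```shell
# Print the number of .pdb files directly inside directory $1.
count_pdbs() {
    find "$1" -maxdepth 1 -type f -name '*.pdb' | wc -l | tr -d ' '
}

# Run the standalone pipeline only when there is something to process.
# $1: directory of PDB files, $2: name of the output directory.
run_standalone() {
    if [ "$(count_pdbs "$1")" -eq 0 ]; then
        echo "no .pdb files found in $1" >&2
        return 1
    fi
    bash BCP.sh -d "$1" -n "$2"
}

# Usage:
#   run_standalone /path/to/pdb/files name_of_output_dir
```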

To upload the tables, run from cdp-client-02:

./SIMPLE_auto do

Authors and acknowledgment

Nicholas Kovacs, Brian Mann, Kristine Lacek, Norman Hassell, Matthew Wersebe, and Sam Shepard all contributed code to this project. Their code was either used directly, as noted under the shebang in each file, or modified for this purpose.

Google DeepMind's AlphaFold2 is deployed to generate high-confidence structural predictions for influenza antigenic proteins. https://github.com/google-deepmind/alphafold

Schrodinger prepwizard is used to optimize structures and prepare them for docking. The Glide program is deployed to dock sialic acid to predicted hemagglutinin structures. https://www.schrodinger.com/science-articles/protein-preparation-wizard https://www.schrodinger.com/products/glide

The Stanford GetContacts project is deployed to calculate intermolecular contacts. https://getcontacts.github.io/

Baker Lab's Rosetta is used to calculate the stability of proteins run in the pipeline. https://new.rosettacommons.org/docs/latest/Home

Nicholas Kovacs (CDC) developed the glycosylation distance calculator.

Norman Hassell (CDC) wrote the SQL for generating the variants of interest.

Nicole Paterson (STI/CDC) developed the automated pipeline.

License

open source

Project status

Beta

Automated_APPSquared_2_0

Automated version of APPSquared Pipeline plus updates

nicolepaterson/Automated_APPSquared_2_0
