Skip to content

Latest commit

 

History

History
352 lines (260 loc) · 15.9 KB

README.md

File metadata and controls

352 lines (260 loc) · 15.9 KB

Neoantigen Pipeline Helper Scripts

These scripts assist in setting up files ofr manualling reviewing the results for Neoantigen Vaccine Desing results generate from the Washington University Immuno Pipeline.

Creating Case Final Report on compute 1

Before Immunogenomics Tumor Board Review

A written case final report will be created which includes a Genomics Review Report document. This document includes a section of a basic data QC review and a table summarizing values that pass/fail the FDA quality thresholds.

Basic data QC

Pull the basic data qc from various files. This script will output a file final_results/qc_file.txt and also print the summary to to screen.

mkdir $WORKING_BASE/../manual_review
cd $WORKING_BASE/../manual_review

bsub -Is -q oncology-interactive -G $GROUP -a "docker(griffithlab/neoang_scripts)" /bin/bash
python3 /opt/scripts/get_neoantigen_qc.py -WB $WORKING_BASE -f final_results --yaml $WORKING_BASE/yamls/$CLOUD_YAML

FDA Quality Thresholds

This script will output a file final_results/fda_quality_thresholds_report.tsv and also print the summary to to screen.

python3 /opt/scripts/get_FDA_thresholds.py -WB  $WORKING_BASE -f final_results

HLA Comparison

This script will output a file manual_review/hla_comparison.tsv and also print the summary to to screen.

python3 /opt/scripts/hla_comparison.py -WB $WORKING_BASE
exit

After Immunogenomics Tumor Board Review

After the Immunogenomics Tumor Board Review, both a .tsv and .xlsx file are downloaded from pVACview whihc contains the canidates marked as Accept, Review, Reject, and Pending. These files should be kept in a fould named itb-review-files.

Generate Protein Fasta

cd $WORKING_BASE
mkdir ../generate_protein_fasta
cd ../generate_protein_fasta
mkdir candidates
mkdir all

#generate a protein fasta file using the final annotated/evaluated neoantigen candidates TSV as input
#this will filter down to only those candidates under consideration and use the top transcript

# check the file to find Tumor sample ID in the #CHROM header of VCF

zcat $WORKING_BASE/final_results/annotated.expression.vcf.gz | less
export TUMOR_ID="100-049-BG004667"

bsub -Is -q general-interactive -G $GROUP -a "docker(griffithlab/pvactools:4.0.1)" /bin/bash

pvacseq generate_protein_fasta \
  -p $WORKING_BASE/final_results/pVACseq/phase_vcf/phased.vcf.gz \
  --pass-only --mutant-only -d 150 \
  -s $TUMOR_ID \
  --aggregate-report-evaluation {Accept,Review} \
  --input-tsv ../itb-review-files/*.tsv  \
  $WORKING_BASE/final_results/annotated.expression.vcf.gz \
  25 \
  $WORKING_BASE/../generate_protein_fasta/candidates/annotated_filtered.vcf-pass-51mer.fa

pvacseq generate_protein_fasta \
  -p $WORKING_BASE/final_results/pVACseq/phase_vcf/phased.vcf.gz \
  --pass-only --mutant-only -d 150 \
  -s $TUMOR_ID  \
  $WORKING_BASE/final_results/annotated.expression.vcf.gz \
  25  \
  $WORKING_BASE/../generate_protein_fasta/all/annotated_filtered.vcf-pass-51mer.fa

exit 

To generate files needed for manual review, save the pVAC results from the Immunogenomics Tumor Board Review meeting as $SAMPLE.revd.Annotated.Neoantigen_Candidates.xlsx (Note: if the file is not saved under this exact name the below command will need to be modified).

bsub -Is -q oncology-interactive -G $GROUP -a "docker(griffithlab/neoang_scripts)" /bin/bash
cd $WORKING_BASE

export GCS_CASE_NAME="100-049-BG004667"

python3 /opt/scripts/generate_reviews_files.py -a ../itb-review-files/*.xlsx -c ../generate_protein_fasta/candidates/annotated_filtered.vcf-pass-51mer.fa.manufacturability.tsv -classI final_results/pVACseq/mhc_i/*.all_epitopes.aggregated.tsv -classII final_results/pVACseq/mhc_ii/*.all_epitopes.aggregated.tsv -samp $GCS_CASE_NAME -o ../manual_review/

python3 /opt/scripts/color_peptides51mer.py -p ../manual_review/*Peptides_51-mer.xlsx -samp $GCS_CASE_NAME -o ../manual_review/

Creating Case Final Report locally

Before Immunogenomics Tumor Board Review

A written case final report will be created which includes a Genomics Review Report document. This document includes a section of a basic data QC review and a table summarizing values that pass/fail the FDA quality thresholds.

Basic data QC

Pull the basic data qc from various files. This script will output a file final_results/qc_file.txt and also print the summary to to screen.

mkdir $WORKING_BASE/../manual_review
cd $WORKING_BASE/../manual_review

docker pull griffithlab/neoang_scripts:latest
docker run -it -v $HOME/:$HOME/ -v $HOME/.config/gcloud:/root/.config/gcloud --env $WORKING_BASE griffithlab/neoang_scripts:latest /bin/bash

cd $WORKING_BASE

python3 /opt/scripts/get_neoantigen_qc.py -WB $WORKING_BASE -f final_results --yaml $WORKING_BASE/yamls/$CLOUD_YAML

FDA Quality Thresholds

This script will output a file final_results/fda_quality_thresholds_report.tsv and also print the summary to to screen.

python3 /opt/scripts/get_FDA_thresholds.py -WB  $WORKING_BASE -f final_results
exit

After Immunogenomics Tumor Board Review

After the Immunogenomics Tumor Board Review, both a .tsv and .xlsx file are downloaded from pVACview whihc contains the canidates marked as Accept, Review, Reject, and Pending. These files should be kept in a fould named itb-review-files.

Generate Protein Fasta

cd $WORKING_BASE
mkdir ../generate_protein_fasta
cd ../generate_protein_fasta
mkdir candidates
mkdir all

#generate a protein fasta file using the final annotated/evaluated neoantigen candidates TSV as input
#this will filter down to only those candidates under consideration and use the top transcript

# check the file to find Tumor sample ID in the #CHROM header of VCF

gzcat $WORKING_BASE/final_results/annotated.expression.vcf.gz | less
export TUMOR_SAMPLE_ID="100-049-BG004667"

docker pull griffithlab/pvactools:4.0.5
docker run -it -v $HOME/:$HOME/ --env $WORKING_BASE  --env SAMPLE_ID griffithlab/pvactools:4.0.5 /bin/bash

cd $WORKING_BASE

pvacseq generate_protein_fasta \
  -p $WORKING_BASE/final_results/pVACseq/phase_vcf/phased.vcf.gz \
  --pass-only --mutant-only -d 150 \
  -s $TUMOR_SAMPLE_ID \
  --aggregate-report-evaluation {Accept,Review} \
  --input-tsv ../itb-review-files/*.tsv  \
  $WORKING_BASE/final_results/annotated.expression.vcf.gz \
  25 \
  $WORKING_BASE/../generate_protein_fasta/candidates/annotated_filtered.vcf-pass-51mer.fa

pvacseq generate_protein_fasta \
  -p $WORKING_BASE/final_results/pVACseq/phase_vcf/phased.vcf.gz \
  --pass-only --mutant-only -d 150 \
  -s $TUMOR_SAMPLE_ID  \
  $WORKING_BASE/final_results/annotated.expression.vcf.gz \
  25  \
  $WORKING_BASE/../generate_protein_fasta/all/annotated_filtered.vcf-pass-51mer.fa

exit 

To generate files needed for manual review, save the pVAC results from the Immunogenomics Tumor Board Review meeting as $SAMPLE.revd.Annotated.Neoantigen_Candidates.xlsx (Note: if the file is not saved under this exact name the below command will need to be modified).

cd $WORKING_BASE/../manual_review

docker pull griffithlab/neoang_scripts:latest
docker run -it -v $HOME/:$HOME/ -v $HOME/.config/gcloud:/root/.config/gcloud --env $WORKING_BASE griffithlab/neoang_scripts:latest /bin/bash

export SAMPLE="TWJF-10146-0029"

python3 /opt/scripts/setup_review.py -WB $WORKING_BASE -a ../itb-review-files/*.xlsx -c $WORKING_BASE/../generate_protein_fasta/candidates/annotated_filtered.vcf-pass-51mer.fa.manufacturability.tsv -samp $SAMPLE  -classI $WORKING_BASE/final_results/pVACseq/mhc_i/*.all_epitopes.aggregated.tsv -classII $WORKING_BASE/final_results/pVACseq/mhc_ii/*.all_epitopes.aggregated.tsv 

Open colored_peptides51mer.html and copy the table into an excel spreadsheet. The formatting should remain. Utilizing the Annotated.Neoantigen_Candidates and colored Peptides_51-mer for manual review.

Description of Scripts

Get Basic QC

python3  /opt/scripts/get_neoantigen_qc.py --help
usage: get_neoantigen_qc.py [-h] [-WB WB] [-f FIN_RESULTS] [--n_dna N_DNA] [--t_dna T_DNA] [--t_rna T_RNA] [--concordance CONCORDANCE] [--contam_n CONTAM_N] [--contam_t CONTAM_T]
                            [--rna_metrics RNA_METRICS] [--strand_check STRAND_CHECK] --yaml YAML [--fin_variants FIN_VARIANTS]

Get the stats for the basic data QC review in the neoantigen final report.

optional arguments:
  -h, --help            show this help message and exit
  -WB WB                the path to the gcp_immuno folder of the trial you wish to tun script on, defined as WORKING_BASE in envs.txt
  -f FIN_RESULTS, --fin_results FIN_RESULTS
                        Name of the final results folder in gcp immuno
  --n_dna N_DNA         file path for aligned normal dna FDA report table
  --t_dna T_DNA         file path for aligned tumor dna FDA report table
  --t_rna T_RNA         file path for aligned tumor rna FDA report table
  --concordance CONCORDANCE
                        file path for Somalier results for sample tumor/normal sample relatedness
  --contam_n CONTAM_N   file path for VerifyBamID results for contamination the normal sample
  --contam_t CONTAM_T   file path for VerifyBamID results for contamination the tumor sample
  --rna_metrics RNA_METRICS
  --strand_check STRAND_CHECK
  --yaml YAML
  --fin_variants FIN_VARIANTS

GET FDA metrics

python3  /opt/scripts/get_FDA_thresholds.py --help
usage: get_FDA_thresholds.py [-h] [-WB WB] [-f FIN_RESULTS] [--n_dna N_DNA] [--t_dna T_DNA] [--t_rna T_RNA] [--una_n_dna UNA_N_DNA] [--una_t_dna UNA_T_DNA] [--una_t_rna UNA_T_RNA]
                             [--somalier SOMALIER] [--contam_n CONTAM_N] [--contam_t CONTAM_T]

Get FDA qc stats from various files and determine if they pass or fail.

optional arguments:
  -h, --help            show this help message and exit
  -WB WB                the path to the gcp_immuno folder of the trial you wish to tun script on, defined as WORKING_BASE in envs.txt
  -f FIN_RESULTS, --fin_results FIN_RESULTS
                        Name of the final results folder in gcp immuno
  --n_dna N_DNA         file path for aligned normal dna FDA report table
  --t_dna T_DNA         file path for aligned tumor dna FDA report table
  --t_rna T_RNA         file path for aligned tumor rna FDA report table
  --una_n_dna UNA_N_DNA
                        file path for unaligned normal dna FDA report table
  --una_t_dna UNA_T_DNA
                        file path for unaligned tumor dna FDA report table
  --una_t_rna UNA_T_RNA
                        file path for unaligned tumor rna FDA report table
  --somalier SOMALIER   file path for Somalier results for sample tumor/normal sample relatedness (concordance.somalier.pairs.tsv)
  --contam_n CONTAM_N   file path for VerifyBamID results for contamination the normal sample
  --contam_t CONTAM_T   file path for VerifyBamID results for contamination the tumor dna sample

HLA Comparison

python3  /opt/scripts/hla_comparison.py --help
usage: hla_comparison.py [-h] [-WB WB] [-f FIN_RESULTS] [--optitype_n OPTITYPE_N] [--optitype_t OPTITYPE_T] [--phlat_n PHLAT_N] [--phlat_t PHLAT_T] [--clinical CLINICAL] [--o O]

Compare HLA alleles called by phlat, opitype, and clincal data if available.

optional arguments:
  -h, --help            show this help message and exit
  -WB WB                The path to the gcp_immuno folder of the trial you wish to run the script on, defined as WORKING_BASE in envs.txt
  -f FIN_RESULTS, --fin_results FIN_RESULTS
                        Name of the final results folder in gcp immuno
  --optitype_n OPTITYPE_N
                        File path for optitype normal calls
  --optitype_t OPTITYPE_T
                        File path for optitype tumor calls
  --phlat_n PHLAT_N     File path for phlat normal calls
  --phlat_t PHLAT_T     File path for phlat tumor calls
  --clinical CLINICAL   File path for the clinical_calls.txt
  --o O                 Output folder

Setup Review

The set up review script runs two other scripts: generate_reviews_files.py and color_peptides51mer.py. The first sets up the Annotated.Neoantige.Canidates spreadsheet and the Peptides 51mer spreadsheet. The second script colors the Peptides 51mer sequences and outputs an html table whihc can be copied into a Microsoft spreadsheet.

python3  /opt/scripts/setup_review.py --help
usage: setup_review.py [-h] [-WB WB] [-samp SAMP] [-a A] [-c C] -classI CLASSI -classII CLASSII

Sets up manuel review files

optional arguments:
  -h, --help        show this help message and exit
  -WB WB            the path to the gcp_immuno folder of the trial you wish to tun script on, defined as WORKING_BASE in envs.txt
  -samp SAMP        Name of the sample
  -a A              Path to ITB Reviewed Candidates
  -c C              Path to annotated_filtered.vcf-pass-51mer.fa.manufacturability.tsv
  -classI CLASSI    Path to classI all_epitopes.aggregated.tsv
  -classII CLASSII  Path to classII all_epitopes.aggregated.tsv

Generate Review Files

python3  /opt/scripts/generate_reviews_files.py --help
usage: generate_reviews_files.py [-h] [-a A] [-c C] [-samp SAMP] [-WB WB] [-f FIN_RESULTS]

Create the file needed for the neoantigen manuel review

optional arguments:
  -h, --help            show this help message and exit
  -a A                  The path to the ITB Reviewed Candidates
  -c C                  The path to annotated_filtered.vcf-pass-51mer.fa.manufacturability.tsv from the generate_protein_fasta script
  -samp SAMP            The name of the sample
  -WB WB                the path to the gcp_immuno folder of the trial you wish to tun script on, defined as WORKING_BASE in envs.txt
  -f FIN_RESULTS, --fin_results FIN_RESULTS
                        Name of the final results folder in gcp immuno

Color Peptides 51mer

python3  /opt/scripts/color_peptides51mer.py --help
usage: color_peptides51mer.py [-h] -p P -classI CLASSI -classII CLASSII [-WB WB] [-samp SAMP]

Color the 51mer peptide

optional arguments:
  -h, --help        show this help message and exit
  -p P              The path to the Peptides 51 mer
  -classI CLASSI    The path to the classI all_epitopes.aggregated.tsv used in pVACseq
  -classII CLASSII  The path to the classII all_epitopes.aggregated.tsv used in pVACseq
  -WB WB            the path to the gcp_immuno folder of the trial you wish to tun script on, defined as WORKING_BASE in envs.txt
  -samp SAMP        Name of the sample

Bold Class II

Bold Class II is not utilized in the current workflow of setting up the manual review. However, it is included as an example of how adding stylization (in this case bold) to certain characters within individual cells of spreadsheets can be accomplished using BeautifulSoup. If you wanted to only bold certain characters (or do any stylzing such as coloring), you can insert a style tag directly into the HTML. However, if you wanted to do formatting inside formatting such as in these review files where there needs to some characters which are red, bold, underlined, or any combination if thise mentioned, BeuaitfulSoup cannot accomplish this.

python3 /opt/scripts/bold_classII.py --help
usage: bold_classII.py [-h] -p P -classI CLASSI -classII CLASSII -o O

Bold the class II pepetides

optional arguments:
  -h, --help        show this help message and exit
  -p P              The path to the Peptides 51 mer
  -classI CLASSI    The path to the classI all_epitopes.aggregated.tsv used in pVACseq
  -classII CLASSII  The path to the classII all_epitopes.aggregated.tsv used in pVACseq
  -o O              Output location

Notes for Building and Testing

Make sure to build the docker on compute1, otherwise the docker will not work on compute1. The command I use looks like this:

Steps for building and testing a docker:

  1. Build to testing and test
  2. Build to a new version (for record-keeping)
  3. Build to latest

I know this seems tedious but you will mess something up and your docker will be broken.

bsub -G compute-oncology -q general-interactive -Is -a 'docker_build(griffithlab/neoang_scripts:testing)' -- --tag griffithlab/neoang_scripts:testing .