Skip to content

Commit

Permalink
Merge pull request #766 from MoeRichert-USDA/update-deepvariant
Browse files Browse the repository at this point in the history
update deepvariant
  • Loading branch information
MoeRichert-USDA authored Nov 20, 2024
2 parents a19eda9 + de34370 commit f68eaf2
Showing 1 changed file with 88 additions and 0 deletions.
88 changes: 88 additions & 0 deletions sn_collections/_workshops/2024-ai-user-forum/21-deepvariant.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,91 @@ Prerequisites:
---

DeepVariant is a DNA sequence variant caller that uses a convolutional neural network (CNN) to call genotypes relative to a reference genome assembly. In this workshop, we will discuss a workflow for calling variants from whole-genome data for multiple individuals. <!--excerpt--> This workflow involves trimming and filtering raw reads, mapping them to a reference assembly, calling variants for each individual, merging the variants of all individuals into a single variant call format file (.vcf), and filtering the resulting variant file. We will guide participants through this pipeline step by step, providing generalized commands for each phase of the process, as well as strategies for optimizing cluster usage and reducing compute time. The final product will be a .vcf containing variants for all individuals which can be used for downstream analyses, along with a solid understanding for performing variant detection using DeepVariant.


## Tutorial Setup Instructions

Steps to prepare for the tutorial session:

1. **Login to Atlas Open OnDemand** at [https://atlas-ood.hpc.msstate.edu/](https://atlas-ood.hpc.msstate.edu/). For more information on login procedures for web-based SCINet access, see the [SCINet access user guide](https://scinet.usda.gov/guides/access/web-based-login).

1. **Open a command-line session** by clicking on "Clusters" -> "Atlas Shell Access" on the top menu. This will open a new tab with a command-line session on Atlas' login node.

1. **Request resources on a compute node** by running the following command:

{: .copy-code }
```
srun --reservation=forum -A scinet_workshop1 -t 03:00:00 -n 1 --mem 8G --pty bash
```
1. **Create a workshop working directory** and copy the workshop materials into it by running the following commands. Note: you do not have to edit the commands with your username as it will be determined by the $USER variable.
{: .copy-code }
```
mkdir -p /90daydata/shared/$USER/deepvariant
cd /90daydata/shared/$USER/deepvariant
cp /project/ai_forum/deepvariant/Sample_Data/assembly.fasta .
mkdir PE_directory
cp -r /project/ai_forum/deepvariant/Sample_Data/samplename_R*.fastq.gz .
# Create a directory for trimmed reads
mkdir Trimmed
# Create a directory for mapped reads
mkdir Mapped
# Create a directory for variants
mkdir Variants
# Create a directory for intermediate files
mkdir Int
# Activate your conda environment
module load miniconda3
module load samtools
module load apptainer
source activate /project/ai_forum/deepvariant/Software/condaenvs/deepvariant
# Prepare for a fun time
# Step 1: trimming
# Trim adapter artifacts from your reads
trim_galore --paired \
--basename samplename \
--output_dir Trimmed \
--cores 24 \
PE_directory/samplename_R1.fastq.gz \
PE_directory/samplename_R2.fastq.gz
# Step 2: mapping
# Index your reference assembly
bwa-mem2 index assembly.fasta
# Index reference assembly using Samtools (for later)
samtools faidx assembly.fasta >assembly.fasta.fai
# For each of your trimmed and paired reads:
bwa-mem2 mem –t 48 assembly.fasta \
Trimmed/samplename_val_1.fq.gz \
Trimmed/samplename_val_2.fq.gz | \
samblaster | \
samtools sort -@ 48 –o Mapped/samplename.bam
# Step 3: call variants
apptainer exec Software/sifs/deepvariant_1.6.0.sif \
python3 Software/deepvariant/scripts/run_deepvariant.py \
--num_shards=48 \
--model_type WGS \
--output_vcf Variants/samplename.vcf.gz \
--output_gvcf Variants/samplename.g.vcf.gz \
--ref assembly.fasta \
--reads Mapped/samplename.bam \
--sample_name samplename \
--intermediate_results_dir Int/samplename_int \
--make_examples_extra_args "normalize_reads=true” \\
--dry_run
```
1. **Stop the interactive job** on the compute node by running the command exit.

0 comments on commit f68eaf2

Please sign in to comment.