updating readme, fixing genotypes, adding annovar
adigenova committed May 21, 2024
1 parent 960bdb1 commit 3c0e53f
Showing 4 changed files with 83 additions and 33 deletions.
86 changes: 58 additions & 28 deletions README.md
@@ -1,43 +1,73 @@
# somatic_point_mutations
call somatic point mutations from tumor/normal pairs
# Somatic Point Mutations

## nextflow
```
nextflow run digenoma-lab/somatic_point_mutations -r v1.0 --tn test.csv -params-file strelka-params.yml -profile kutral
```
This repository provides a Nextflow pipeline for calling somatic point mutations from tumor/normal pairs using Whole Genome Sequencing (WGS) or Exome data.

## Getting Started

### Tumor/Normal file
### Running the Pipeline

CSV file indicating the paths to CRAM or BAM files, including index and alternative manta_indel VCFs files.
To run the pipeline, use the following command:

```sh
nextflow run digenoma-lab/somatic_point_mutations -r v1.1 --tn test.csv -params-file strelka-params.yml -profile kutral
```

### Input File: Tumor/Normal Pairs

Prepare a CSV file indicating the paths to CRAM or BAM files, including index and optional manta_indel VCF files. The CSV file should follow this format:

```csv
sampleId,normal,normal_index,tumor,tumor_index,manta_indel,manta_indel_index
A,A.cram,A.cram.crai,AT.cram,AT.cram.crai,,
B,B.cram,B.cram.crai,BT.cram,BT.cram.crai,,
C,C.cram,C.cram.crai,CT.cram,CT.cram.crai,,
D,D.cram,D.cram.crai,DT.cram,DT.cram.crai,DT.manta.vcf.gz,DT.manta.vcf.gz.tbi
```
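
Before launching a run, it can help to sanity-check the sample sheet. The following is a minimal sketch and not part of the pipeline: the function name `check_tn_csv` is hypothetical, and it assumes only the seven-column header shown above, with the two `manta_indel` columns allowed to be empty.

```sh
# Hypothetical helper, not part of the pipeline: verify the sample sheet
# header and that every row has seven fields with the first five filled in.
check_tn_csv() {
    awk -F',' '
        NR == 1 {
            if ($0 != "sampleId,normal,normal_index,tumor,tumor_index,manta_indel,manta_indel_index") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        $0 == "" { next }    # ignore blank lines
        NF != 7 {
            print "line " NR ": expected 7 fields, found " NF
            bad = 1
            next
        }
        {
            # sampleId, normal, normal_index, tumor, tumor_index must be set
            for (i = 1; i <= 5; i++)
                if ($i == "") { print "line " NR ": field " i " is empty"; bad = 1 }
        }
        END { exit bad }
    ' "$1"
}
```

It could be called as `check_tn_csv test.csv` before `nextflow run`; a non-zero exit status means at least one problem was reported.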

### Available Options
## Pipeline Options

The `somatic_point_mutations` pipeline has several required and optional arguments.

### Required Arguments

- `--tn`: CSV file with tumor/normal pairs.
- `--fasta`: Reference genome file in FASTA format.
- `--fai`: Reference genome index file in FAI format.

### Optional Arguments

- `--outdir`: Directory for Nextflow results. Default: `./results`.
- `--exome`: Set if the data is Exome rather than WGS. Default: `false`.
- `--target_bed`: Target regions for Strelka in BED format for hg38. Default: `/somatic_point_mutations/auxfiles/hg38.bed.gz`.
- `--target_bed_index`: Index for target BED regions. Default: `/somatic_point_mutations/auxfiles/hg38.bed.gz.tbi`.

### Annovar Options

- `--annovar_bin`: Path to the `table_annovar.pl` executable. Default: `/annovar/annovar/table_annovar.pl`.
- `--annovar_bd`: Path to Annovar database for hg38. Default: `/databases/annovar/hg38`.
- `--annovar_protocol`: Databases included in Annovar analysis. Default: `ensGene,clinvar_20220320,revel,dbnsfp42c,gnomad30_genome,avsnp150,icgc28`.
- `--annovar_operation`: Operations according to Annovar selected databases. Default: `g,f,f,f,f,f,f`.
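
`table_annovar.pl` expects one operation per protocol entry, so the two comma-separated lists must have the same length. A minimal sketch of a pre-flight check follows; the helper names `count_fields` and `check_annovar_lists` are hypothetical and not part of the pipeline.

```sh
# Hypothetical pre-flight check, not part of the pipeline.
count_fields() {
    # number of comma-separated entries in a string
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

check_annovar_lists() {
    n_protocol=$(count_fields "$1")
    n_operation=$(count_fields "$2")
    if [ "$n_protocol" -ne "$n_operation" ]; then
        echo "protocol has $n_protocol entries but operation has $n_operation" >&2
        return 1
    fi
}

# the defaults listed above: seven databases, seven operations
check_annovar_lists \
    "ensGene,clinvar_20220320,revel,dbnsfp42c,gnomad30_genome,avsnp150,icgc28" \
    "g,f,f,f,f,f,f" && echo "lists match"
```

Running the same check on custom `--annovar_protocol` and `--annovar_operation` values catches a length mismatch before the Strelka step has run.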

## Example Usage

```sh
nextflow run digenoma-lab/somatic_point_mutations -r v1.1 \
--tn test.csv \
--fasta /path/to/reference.fasta \
--fai /path/to/reference.fasta.fai \
--outdir ./results \
--exome true \
--target_bed /path/to/target.bed.gz \
--target_bed_index /path/to/target.bed.gz.tbi \
--annovar_bin /path/to/annovar/table_annovar.pl \
--annovar_bd /path/to/annovar/hg38 \
--annovar_protocol ensGene,clinvar_20220320,revel,dbnsfp42c,gnomad30_genome,avsnp150,icgc28 \
--annovar_operation g,f,f,f,f,f,f \
-profile kutral
```
```
somatic_point_mutations.nf: somatic point mutation caller pipeline.
Required arguments:
--tn tumor normal pairs
[default: false]
--fasta reference file in fasta format
[default: false]
--fai reference index file in fai format
[default: false]
Optional arguments:
--outdir The NextFlow result directory.
[default: ./results]
--exome Data is Exome rather than WGS.
[default: false]
--target_bed target region for strelka in bed format hg38
[default: /somatic_point_mutations/auxfiles/hg38.bed.gz]
--target_bed_index bed index for target regions
[default: /somatic_point_mutations/auxfiles/hg38.bed.gz.tbi]
```

## Support

If you encounter any issues or have questions, please open an issue on the [GitHub repository](https://github.com/digenoma-lab/somatic_point_mutations).


17 changes: 17 additions & 0 deletions auxfiles/fixGT.sh
@@ -0,0 +1,17 @@
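#!/bin/bash
# Add a GT field to Strelka somatic VCFs, which do not report genotypes.

for vcf in "$@"
do

out=${vcf/vcf.gz/vcf}

# insert the GT header line just before the first ##FORMAT line
first_format_num=$(zgrep -n -m 1 '##FORMAT' "$vcf" | cut -d : -f 1)
zcat "$vcf" | sed "$first_format_num"'i##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">' > "$out"
# prepend GT to the FORMAT column
sed -ri 's|(DP:)|GT:\1|' "$out"
# SNVs (FORMAT ends in TU): NORMAL gets 0/0, TUMOR gets 0/1
sed -ri 's|(:TU\t)|\10/0:|g' "$out"
sed -ri 's|(:TU\t[^\t]*\t)|\10/1:|g' "$out"
# indels (FORMAT ends in BCN50): same fixed genotypes
sed -ri 's|(:BCN50\t)|\10/0:|g' "$out"
sed -ri 's|(:BCN50\t[^\t]*\t)|\10/1:|g' "$out"
# quote the path so file names with spaces survive
gzip -f "$out"

done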
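
The effect of these substitutions is easiest to see on a single record. The sketch below applies the same three SNV expressions to one made-up Strelka-style line (all values are invented for illustration). The wrapper name `add_gt` is hypothetical, and it uses `sed -E` with a literal tab instead of `sed -ri` with `\t`, so it does not depend on GNU extensions. Note that the genotypes are fixed strings (`0/0` for NORMAL, `0/1` for TUMOR), not values computed from the read counts.

```sh
# Illustration only: the same substitutions fixGT.sh applies to SNV records,
# run on one invented record read from standard input.
tab=$(printf '\t')

add_gt() {
    # 1) prepend GT to the FORMAT column
    # 2) prefix the NORMAL sample column with 0/0
    # 3) prefix the TUMOR sample column with 0/1
    sed -E \
        -e 's|(DP:)|GT:\1|' \
        -e "s|(:TU${tab})|\10/0:|g" \
        -e "s|(:TU${tab}[^${tab}]*${tab})|\10/1:|g"
}

record=$(printf 'chr1\t100\t.\tA\tG\t.\tPASS\tSOMATIC\tDP:FDP:SDP:SUBDP:AU:CU:GU:TU\t30:0:0:0:30,30:0,0:0,0:0,0\t40:0:0:0:30,30:0,0:10,10:0,0')

# show only the FORMAT, NORMAL and TUMOR columns after the rewrite
printf '%s\n' "$record" | add_gt | cut -f 9-11
```

The FORMAT column now starts with `GT:`, the NORMAL column with `0/0:` and the TUMOR column with `0/1:`; the indel expressions do the same for records whose FORMAT ends in `BCN50`.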
12 changes: 7 additions & 5 deletions main.nf
@@ -30,7 +30,7 @@ def help_function() {
| --annovar_protocol databases included in annovar analysis
| [default: ${params.annovar_protocol}]
| --annovar_operation operation according to annovar selected databases
| [default: ${params.annovar_operations}]
| [default: ${params.annovar_operation}]
| """.stripMargin()
// Print the help with the stripped margin and exit
println(help)
@@ -94,6 +94,8 @@ process STRELKA_SOMATIC {
mv strelka/results/variants/somatic.indels.vcf.gz.tbi ${prefix}.somatic_indels.vcf.gz.tbi
mv strelka/results/variants/somatic.snvs.vcf.gz ${prefix}.somatic_snvs.vcf.gz
mv strelka/results/variants/somatic.snvs.vcf.gz.tbi ${prefix}.somatic_snvs.vcf.gz.tbi
# add genotype (GT) information to the Strelka VCFs; fixGT.sh uses bash-only substitutions
bash ${baseDir}/auxfiles/fixGT.sh *.vcf.gz
cat <<-END_VERSIONS > versions.yml
"${task.process}":
@@ -143,15 +145,15 @@ process ANNOVAR{

script:
"""
${params.annovar_bin}/table_annovar.pl ${variants} \\
${params.annovar_bd}/hg38 -out ${meta}_annovar --thread $task.cpus \\
${params.annovar_bin} ${variants} \\
${params.annovar_bd} -out ${meta}_annovar --thread $task.cpus \\
-nastring . -vcfinput --buildver hg38 --codingarg -includesnp --remove --onetranscript \\
-protocol ${params.annovar_protocol} -operation ${params.annovar_operation}
"""
stub:
"""
echo ${params.annovar_bin}/table_annovar.pl ${variants} \\
${params.annovar_bd}/hg38 -out ${meta}_annovar --thread $task.cpus \\
echo ${params.annovar_bin} ${variants} \\
${params.annovar_bd} -out ${meta}_annovar --thread $task.cpus \\
-nastring . -vcfinput --buildver hg38 --codingarg -includesnp --remove --onetranscript \\
-protocol ${params.annovar_protocol} -operation ${params.annovar_operation}
touch ${meta}_annovar_multianno.vcf ${meta}_annovar_multianno.txt
1 change: 1 addition & 0 deletions nextflow.config
@@ -53,6 +53,7 @@ shell = ['/bin/bash', '-euo', 'pipefail']
// default run resource parameters
process {

errorStrategy="retry"
withName: 'STRELKA_SOMATIC' {
cpus = 8
memory = 20.GB