Commit
updating readme, fixing genotypes, adding annovar
Showing 4 changed files with 83 additions and 33 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,43 +1,73 @@ | ||
# somatic_point_mutations | ||
call somatic point mutations from tumor/normal pairs | ||
# Somatic Point Mutations | ||
|
||
## nextflow | ||
``` | ||
nextflow run digenoma-lab/somatic_point_mutations -r v1.0 --tn test.csv -params-file strelka-params.yml -profile kutral | ||
``` | ||
This repository provides a Nextflow pipeline for calling somatic point mutations from tumor/normal pairs using Whole Genome Sequencing (WGS) or Exome data. | ||
|
||
## Getting Started | ||
|
||
### Tumor/Normal file | ||
### Running the Pipeline | ||
|
||
CSV file indicating the paths to CRAM or BAM files, including index and alternative manta_indel VCFs files. | ||
To run the pipeline, use the following command: | ||
|
||
```sh | ||
nextflow run digenoma-lab/somatic_point_mutations -r v1.1 --tn test.csv -params-file strelka-params.yml -profile kutral | ||
``` | ||
|
||
### Input File: Tumor/Normal Pairs | ||
|
||
Prepare a CSV file listing the paths to the CRAM or BAM files and their indexes, plus optional manta_indel VCF files and their indexes. The CSV file should follow this format: | ||
|
||
```csv | ||
sampleId,normal,normal_index,tumor,tumor_index,manta_indel,manta_indel_index | ||
A,A.cram,A.cram.crai,AT.cram,AT.cram.crai,, | ||
B,B.cram,B.cram.crai,BT.cram,BT.cram.crai,, | ||
C,C.cram,C.cram.crai,CT.cram,CT.cram.crai,, | ||
D,D.cram,D.cram.crai,DT.cram,DT.cram.crai,DT.manta.vcf.gz,DT.manta.vcf.gz.tbi | ||
``` | ||
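
Before launching a run, it can help to sanity-check the sample sheet. The function below is a minimal sketch and not part of the pipeline: `check_samplesheet` is a hypothetical helper name, and it assumes that the first five columns are mandatory while the two `manta_indel` columns may be left empty, as in the example rows above.

```sh
# Hypothetical helper (not shipped with the pipeline): validate a tumor/normal CSV.
check_samplesheet() {
    local csv="$1"
    local expected="sampleId,normal,normal_index,tumor,tumor_index,manta_indel,manta_indel_index"
    local header

    # Compare the header with the column names shown in the README example.
    header=$(head -n 1 "$csv" | tr -d '\r')
    if [ "$header" != "$expected" ]; then
        echo "unexpected header: $header" >&2
        return 1
    fi

    # Each non-blank data row must have 7 fields; fields 1-5 are assumed required.
    awk -F',' 'BEGIN { bad = 0 }
    NR > 1 && NF > 0 {
        if (NF != 7) {
            printf "line %d: expected 7 fields, found %d\n", NR, NF
            bad = 1
            next
        }
        for (i = 1; i <= 5; i++) {
            if ($i == "") {
                printf "line %d: field %d is empty\n", NR, i
                bad = 1
            }
        }
    }
    END { exit bad }' "$csv" >&2
}
```

Calling `check_samplesheet test.csv` returns a non-zero status and prints the offending line numbers to standard error when a row is malformed. It does not check that the listed files exist.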
|
||
### Available Options | ||
## Pipeline Options | ||
|
||
The `somatic_point_mutations` pipeline has several required and optional arguments. | ||
|
||
### Required Arguments | ||
|
||
- `--tn`: CSV file with tumor/normal pairs. | ||
- `--fasta`: Reference genome file in FASTA format. | ||
- `--fai`: Reference genome index file in FAI format. | ||
|
||
### Optional Arguments | ||
|
||
- `--outdir`: Directory for Nextflow results. Default: `./results`. | ||
- `--exome`: Set if the data is Exome rather than WGS. Default: `false`. | ||
- `--target_bed`: Target regions for Strelka in BED format for hg38. Default: `/somatic_point_mutations/auxfiles/hg38.bed.gz`. | ||
- `--target_bed_index`: Index for target BED regions. Default: `/somatic_point_mutations/auxfiles/hg38.bed.gz.tbi`. | ||
|
||
### Annovar Options | ||
|
||
- `--annovar_bin`: Path to the `table_annovar.pl` executable. Default: `/annovar/annovar/table_annovar.pl`. | ||
- `--annovar_bd`: Path to Annovar database for hg38. Default: `/databases/annovar/hg38`. | ||
- `--annovar_protocol`: Databases included in Annovar analysis. Default: `ensGene,clinvar_20220320,revel,dbnsfp42c,gnomad30_genome,avsnp150,icgc28`. | ||
- `--annovar_operation`: Operations according to Annovar selected databases. Default: `g,f,f,f,f,f,f`. | ||
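
`table_annovar.pl` expects one operation per protocol entry, so the two comma-separated lists need the same length. The snippet below is a small sketch of a pre-flight check; `count_fields` and `check_annovar_lists` are hypothetical helper names and are not part of the pipeline.

```sh
# Hypothetical helpers: confirm the protocol and operation lists line up.
count_fields() {
    # Print the number of comma-separated entries in the first argument.
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

check_annovar_lists() {
    local protocol="$1"
    local operation="$2"
    # Succeeds only when both lists have the same number of entries.
    [ "$(count_fields "$protocol")" -eq "$(count_fields "$operation")" ]
}

# The defaults listed above contain seven entries each.
if check_annovar_lists \
    "ensGene,clinvar_20220320,revel,dbnsfp42c,gnomad30_genome,avsnp150,icgc28" \
    "g,f,f,f,f,f,f"; then
    echo "protocol and operation lists match"
fi
```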
|
||
## Example Usage | ||
|
||
```sh | ||
nextflow run digenoma-lab/somatic_point_mutations -r v1.1 \ | ||
--tn test.csv \ | ||
--fasta /path/to/reference.fasta \ | ||
--fai /path/to/reference.fasta.fai \ | ||
--outdir ./results \ | ||
--exome true \ | ||
--target_bed /path/to/target.bed.gz \ | ||
--target_bed_index /path/to/target.bed.gz.tbi \ | ||
--annovar_bin /path/to/annovar/table_annovar.pl \ | ||
--annovar_bd /path/to/annovar/hg38 \ | ||
--annovar_protocol ensGene,clinvar_20220320,revel,dbnsfp42c,gnomad30_genome,avsnp150,icgc28 \ | ||
--annovar_operation g,f,f,f,f,f,f \ | ||
-profile kutral | ||
``` | ||
somatic_point_mutations.nf: somatic point mutation caller pipeline. | ||
Required arguments: | ||
--tn tumor normal pairs | ||
[default: false] | ||
--fasta reference file in fasta format | ||
[default: false] | ||
--fai reference index file in fai format | ||
[default: false] | ||
Optional arguments: | ||
--outdir The NextFlow result directory. | ||
[default: ./results] | ||
--exome Data is Exome rather than WGS. | ||
[default: false] | ||
--target_bed target region for strelka in bed format hg38 | ||
[default: /somatic_point_mutations/auxfiles/hg38.bed.gz] | ||
--target_bed_index bed index for target regions | ||
[default: /somatic_point_mutations/auxfiles/hg38.bed.gz.tbi] | ||
``` | ||
|
||
## Support | ||
|
||
If you encounter any issues or have questions, please open an issue on the [GitHub repository](https://github.com/digenoma-lab/somatic_point_mutations). | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
#!/bin/bash | ||
|
||
for vcf in "$@" | ||
do | ||
|
||
out=${vcf/vcf.gz/vcf} | ||
|
||
first_format_num=$(zgrep -n -m 1 '##FORMAT' "$vcf" | cut -d : -f 1) | ||
zcat "$vcf" | sed "$first_format_num"'i##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">' > "$out" | ||
sed -ri 's|(DP:)|GT:\1|' "$out" | ||
sed -ri 's|(:TU\t)|\10/0:|g' "$out" | ||
sed -ri 's|(:TU\t[^\t]*\t)|\10/1:|g' "$out" | ||
sed -ri 's|(:BCN50\t)|\10/0:|g' "$out" | ||
sed -ri 's|(:BCN50\t[^\t]*\t)|\10/1:|g' "$out" | ||
gzip -f "$out" | ||
|
||
done |
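
To see what the substitutions in the script above do to a single record, the sketch below applies the same five `sed` expressions to one made-up SNV line. The record is illustrative and not taken from real data, and `add_placeholder_gt` is a hypothetical wrapper name. Like the script, it assumes GNU `sed` (for `-r` and `\t`) and a FORMAT column that ends in `TU` (SNVs) or `BCN50` (indels), followed by the normal sample and then the tumor sample.

```sh
# Illustrative wrapper: the same substitutions as the script, applied to a stream.
add_placeholder_gt() {
    sed -r \
        -e 's|(DP:)|GT:\1|' \
        -e 's|(:TU\t)|\10/0:|g' \
        -e 's|(:TU\t[^\t]*\t)|\10/1:|g' \
        -e 's|(:BCN50\t)|\10/0:|g' \
        -e 's|(:BCN50\t[^\t]*\t)|\10/1:|g'
}

# Made-up SNV record: columns 9-11 are FORMAT, the normal sample, the tumor sample.
printf 'chr1\t100\t.\tA\tT\t.\tPASS\tSOMATIC\tDP:FDP:SDP:SUBDP:AU:CU:GU:TU\t30:0:0:0:30,30:0,0:0,0:0,0\t40:0:0:0:20,20:0,0:0,0:20,20\n' \
    | add_placeholder_gt \
    | cut -f 9-11
```

The FORMAT column gains a leading `GT:`, the normal sample a leading `0/0:`, and the tumor sample a leading `0/1:`. These genotypes are fixed placeholders and are not derived from the read counts.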