-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add config and help for lofreq call * add script to call * add test data * add test to lofreq call * fix test error * update test lofreq call * Add config to indelqual comp * add indelqual script * WIP add indelqual test * fix typo * add namespace * update CHANGELOG * fix typo * rename output to out * Define defaults more explicitly * fefactor metadata to latest update * replace lofreq test data * remove comment about stdout since output file is required --------- Co-authored-by: Robrecht Cannoodt <[email protected]>
- Loading branch information
1 parent
ee70c2c
commit 5a1601f
Showing
17 changed files
with
623 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,251 @@ | ||
functionality: | ||
name: lofreq_call | ||
namespace: lofreq | ||
description: | | ||
Call variants from a BAM file. | ||
LoFreq* (i.e. LoFreq version 2) is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing (e.g. mapping or base/indel alignment uncertainty), which are usually ignored by other methods or only used for filtering. | ||
LoFreq* can run on almost any type of aligned sequencing data (e.g. Illumina, IonTorrent or Pacbio) since no machine- or sequencing-technology dependent thresholds are used. It automatically adapts to changes in coverage and sequencing quality and can therefore be applied to a variety of data-sets e.g. viral/quasispecies, bacterial, metagenomics or somatic data. | ||
LoFreq* is very sensitive; most notably, it is able to predict variants below the average base-call quality (i.e. sequencing error rate). Each variant call is assigned a p-value which allows for rigorous false positive control. Even though it uses no approximations or heuristics, it is very efficient due to several runtime optimizations and also provides a (pseudo-)parallel implementation. LoFreq* is generic and fast enough to be applied to high-coverage data and large genomes. On a single processor it takes a minute to analyze Dengue genome sequencing data with nearly 4000X coverage, roughly one hour to call SNVs on a 600X coverage E.coli genome and also roughly an hour to run on a 100X coverage human exome dataset. | ||
info: | ||
keywords: [ "variant calling", "low frequancy variant calling", "lofreq", "lofreq/call"] | ||
links: | ||
homepage: https://csb5.github.io/lofreq/ | ||
documentation: https://csb5.github.io/lofreq/commands/ | ||
reference: | ||
doi: 10.1093/nar/gks918 | ||
license: "MIT" | ||
requirements: | ||
commands: [ lofreq ] | ||
argument_groups: | ||
- name: Inputs | ||
arguments: | ||
- name: --input | ||
type: file | ||
description: | | ||
Input BAM file. | ||
required: true | ||
example: "normal.bam" | ||
- name: --input_bai | ||
type: file | ||
description: | | ||
Index file for the input BAM file. | ||
required: true | ||
example: "normal.bai" | ||
- name: --ref | ||
alternatives: -f | ||
type: file | ||
description: | | ||
Indexed reference fasta file (gzip supported). Default: none. | ||
required: true | ||
example: "reference.fasta" | ||
- name: Outputs | ||
arguments: | ||
- name: --out | ||
alternatives: -o | ||
type: file | ||
description: | | ||
Vcf output file. Default: stdout. | ||
required: true | ||
direction: output | ||
example: "output.vcf" | ||
- name: Arguments | ||
arguments: | ||
- name: --region | ||
alternatives: -r | ||
type: string | ||
description: | | ||
Limit calls to this region (chrom:start-end). Default: none. | ||
required: false | ||
example: "chr1:1000-2000" | ||
- name: --bed | ||
alternatives: -l | ||
type: file | ||
description: | | ||
List of positions (chr pos) or regions (BED). Default: none. | ||
required: false | ||
example: "regions.bed" | ||
- name: --min_bq | ||
alternatives: -q | ||
type: integer | ||
description: | | ||
Skip any base with baseQ smaller than INT. Default: 6. | ||
required: false | ||
example: 6 | ||
- name: --min_alt_bq | ||
alternatives: -Q | ||
type: integer | ||
description: | | ||
Skip alternate bases with baseQ smaller than INT. Default: 6. | ||
required: false | ||
example: 6 | ||
- name: --def_alt_bq | ||
alternatives: -R | ||
type: integer | ||
description: | | ||
Overwrite baseQs of alternate bases (that passed bq filter) with this value (-1: use median ref-bq; 0: keep). Default: 0. | ||
required: false | ||
example: 0 | ||
- name: --min_jq | ||
alternatives: -j | ||
type: integer | ||
description: | | ||
Skip any base with joinedQ smaller than INT. Default: 0. | ||
example: 0 | ||
- name: --min_alt_jq | ||
alternatives: -J | ||
type: integer | ||
description: | | ||
Skip alternate bases with joinedQ smaller than INT. Default: 0. | ||
required: false | ||
example: 0 | ||
- name: --def_alt_jq | ||
alternatives: -K | ||
type: integer | ||
description: | | ||
Overwrite joinedQs of alternate bases (that passed jq filter) with this value (-1: use median ref-bq; 0: keep). Default: 0. | ||
required: false | ||
example: 0 | ||
- name: --no_baq | ||
alternatives: -B | ||
type: boolean_true | ||
description: | | ||
Disable use of base-alignment quality (BAQ). | ||
- name: --no_idaq | ||
alternatives: -A | ||
type: boolean_true | ||
description: | | ||
Don't use IDAQ values (NOT recommended under ANY circumstances other than debugging). | ||
- name: --del_baq | ||
alternatives: -D | ||
type: boolean_true | ||
description: | | ||
Delete pre-existing BAQ values, i.e. compute even if already present in BAM. | ||
- name: --no_ext_baq | ||
alternatives: -e | ||
type: boolean_true | ||
description: | | ||
Use 'normal' BAQ (samtools default) instead of extended BAQ (both computed on the fly if not already present in lb tag). | ||
- name: --min_mq | ||
alternatives: -m | ||
type: integer | ||
description: | | ||
Skip reads with mapping quality smaller than INT. Default: 0. | ||
required: false | ||
example: 0 | ||
- name: --max_mq | ||
alternatives: -M | ||
type: integer | ||
description: | | ||
Cap mapping quality at INT. Default: 255. | ||
required: false | ||
example: 255 | ||
- name: --no_mq | ||
alternatives: -N | ||
type: boolean_true | ||
description: | | ||
Don't merge mapping quality in LoFreq's model. | ||
- name: --call_indels | ||
type: boolean_true | ||
description: | | ||
Enable indel calls (note: preprocess your file to include indel alignment qualities!). | ||
- name: --only_indels | ||
type: boolean_true | ||
description: | | ||
Only call indels; no SNVs. | ||
- name: --src_qual | ||
alternatives: -s | ||
type: boolean_true | ||
description: | | ||
Enable computation of source quality. | ||
- name: --ign_vcf | ||
alternatives: -S | ||
type: file | ||
description: | | ||
Ignore variants in this vcf file for source quality computation. Multiple files can be given separated by commas. | ||
required: false | ||
example: "variants.vcf" | ||
- name: --def_nm_q | ||
alternatives: -T | ||
type: integer | ||
description: | | ||
If >= 0, then replace non-match base qualities with this default value. Default: -1. | ||
required: false | ||
example: -1 | ||
- name: --sig | ||
alternatives: -a | ||
type: double | ||
description: | | ||
P-Value cutoff / significance level. Default: 0.010000. | ||
required: false | ||
example: 0.01 | ||
- name: --bonf | ||
alternatives: -b | ||
type: string | ||
description: | | ||
Bonferroni factor. 'dynamic' (increase per actually performed test) or INT. Default: Dynamic. | ||
required: false | ||
example: "dynamic" | ||
- name: --min_cov | ||
alternatives: -C | ||
type: integer | ||
description: | | ||
Test only positions having at least this coverage. Default: 1. | ||
(note: without --no-default-filter default filters (incl. coverage) kick in after predictions are done). | ||
required: false | ||
example: 1 | ||
- name: --max_depth | ||
alternatives: -d | ||
type: integer | ||
description: | | ||
Cap coverage at this depth. Default: 1000000. | ||
required: false | ||
example: 1000000 | ||
- name: --illumina_13 | ||
type: boolean_true | ||
description: | | ||
Assume the quality is Illumina-1.3-1.7/ASCII+64 encoded. | ||
- name: --use_orphan | ||
type: boolean_true | ||
description: | | ||
Count anomalous read pairs (i.e. where mate is not aligned properly). | ||
- name: --plp_summary_only | ||
type: boolean_true | ||
description: | | ||
No variant calling. Just output pileup summary per column. | ||
- name: --no_default_filter | ||
type: boolean_true | ||
description: | | ||
Don't run default 'lofreq filter' automatically after calling variants. | ||
- name: --force_overwrite | ||
type: boolean_true | ||
description: | | ||
Overwrite any existing output. | ||
- name: --verbose | ||
type: boolean_true | ||
description: | | ||
Be verbose. | ||
- name: --debug | ||
type: boolean_true | ||
description: | | ||
Enable debugging. | ||
resources: | ||
- type: bash_script | ||
path: script.sh | ||
test_resources: | ||
- type: bash_script | ||
path: test.sh | ||
- type: file | ||
path: test_data | ||
platforms: | ||
- type: docker | ||
image: quay.io/biocontainers/lofreq:2.1.5--py38h794fc9e_10 | ||
setup: | ||
- type: docker | ||
run: | | ||
version=$(lofreq version | grep 'version' | sed 's/version: //') && \ | ||
echo "lofreq: $version" > /var/software_versions.txt | ||
- type: nextflow | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
lofreq call: call variants from BAM file | ||
|
||
Usage: lofreq call [options] in.bam | ||
|
||
Options: | ||
- Reference: | ||
-f | --ref FILE Indexed reference fasta file (gzip supported) [null] | ||
- Output: | ||
-o | --out FILE Vcf output file [- = stdout] | ||
- Regions: | ||
-r | --region STR Limit calls to this region (chrom:start-end) [null] | ||
-l | --bed FILE List of positions (chr pos) or regions (BED) [null] | ||
- Base-call quality: | ||
-q | --min-bq INT Skip any base with baseQ smaller than INT [6] | ||
-Q | --min-alt-bq INT Skip alternate bases with baseQ smaller than INT [6] | ||
-R | --def-alt-bq INT Overwrite baseQs of alternate bases (that passed bq filter) with this value (-1: use median ref-bq; 0: keep) [0] | ||
-j | --min-jq INT Skip any base with joinedQ smaller than INT [0] | ||
-J | --min-alt-jq INT Skip alternate bases with joinedQ smaller than INT [0] | ||
-K | --def-alt-jq INT Overwrite joinedQs of alternate bases (that passed jq filter) with this value (-1: use median ref-bq; 0: keep) [0] | ||
- Base-alignment (BAQ) and indel-aligment (IDAQ) qualities: | ||
-B | --no-baq Disable use of base-alignment quality (BAQ) | ||
-A | --no-idaq Don't use IDAQ values (NOT recommended under ANY circumstances other than debugging) | ||
-D | --del-baq Delete pre-existing BAQ values, i.e. compute even if already present in BAM | ||
-e | --no-ext-baq Use 'normal' BAQ (samtools default) instead of extended BAQ (both computed on the fly if not already present in lb tag) | ||
- Mapping quality: | ||
-m | --min-mq INT Skip reads with mapping quality smaller than INT [0] | ||
-M | --max-mq INT Cap mapping quality at INT [255] | ||
-N | --no-mq Don't merge mapping quality in LoFreq's model | ||
- Indels: | ||
--call-indels Enable indel calls (note: preprocess your file to include indel alignment qualities!) | ||
--only-indels Only call indels; no SNVs | ||
- Source quality: | ||
-s | --src-qual Enable computation of source quality | ||
-S | --ign-vcf FILE Ignore variants in this vcf file for source quality computation. Multiple files can be given separated by commas | ||
-T | --def-nm-q INT If >= 0, then replace non-match base qualities with this default value [-1] | ||
- P-values: | ||
-a | --sig P-Value cutoff / significance level [0.010000] | ||
-b | --bonf Bonferroni factor. 'dynamic' (increase per actually performed test) or INT ['dynamic'] | ||
- Misc.: | ||
-C | --min-cov INT Test only positions having at least this coverage [1] | ||
(note: without --no-default-filter default filters (incl. coverage) kick in after predictions are done) | ||
-d | --max-depth INT Cap coverage at this depth [1000000] | ||
--illumina-1.3 Assume the quality is Illumina-1.3-1.7/ASCII+64 encoded | ||
--use-orphan Count anomalous read pairs (i.e. where mate is not aligned properly) | ||
--plp-summary-only No variant calling. Just output pileup summary per column | ||
--no-default-filter Don't run default 'lofreq filter' automatically after calling variants | ||
--force-overwrite Overwrite any existing output | ||
--verbose Be verbose | ||
--debug Enable debugging |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
#!/bin/bash | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
# Unset all parameters that are set to "false" | ||
[[ "$par_no_baq" == "false" ]] && unset par_no_baq | ||
[[ "$par_no_idaq" == "false" ]] && unset par_no_idaq | ||
[[ "$par_del_baq" == "false" ]] && unset par_del_baq | ||
[[ "$par_no_ext_baq" == "false" ]] && unset par_no_ext_baq | ||
[[ "$par_no_mq" == "false" ]] && unset par_no_mq | ||
[[ "$par_call_indels" == "false" ]] && unset par_call_indels | ||
[[ "$par_only_indels" == "false" ]] && unset par_only_indels | ||
[[ "$par_src_qual" == "false" ]] && unset par_src_qual | ||
[[ "$par_illumina_13" == "false" ]] && unset par_illumina_13 | ||
[[ "$par_use_orphan" == "false" ]] && unset par_use_orphan | ||
[[ "$par_plp_summary_only" == "false" ]] && unset par_plp_summary_only | ||
[[ "$par_no_default_filter" == "false" ]] && unset par_no_default_filter | ||
[[ "$par_force_overwrite" == "false" ]] && unset par_force_overwrite | ||
[[ "$par_verbose" == "false" ]] && unset par_verbose | ||
[[ "$par_debug" == "false" ]] && unset par_debug | ||
|
||
# Run lofreq call | ||
lofreq call \ | ||
-f "$par_ref" \ | ||
-o "$par_out" \ | ||
${par_region:+-r "${par_region}"} \ | ||
${par_bed:+-l "${par_bed}"} \ | ||
${par_min_bq:+-q "${par_min_bq}"} \ | ||
${par_min_alt_bq:+-Q "${par_min_alt_bq}"} \ | ||
${par_def_alt_bq:+-R "${par_def_alt_bq}"} \ | ||
${par_min_jq:+-j "${par_min_jq}"} \ | ||
${par_alt_jq:+-K "${par_alt_jq}"} \ | ||
${par_no_baq:+-B} \ | ||
${par_no_idaq:+-A} \ | ||
${par_del_baq:+-D} \ | ||
${par_no_ext_baq:+-e} \ | ||
${par_min_mq:+-m "${par_min_mq}"} \ | ||
${par_max_mq:+-M "${par_max_mq}"} \ | ||
${par_no_mq:+-N} \ | ||
${par_call_indels:+--call-indels} \ | ||
${par_only_indels:+--only-indels} \ | ||
${par_src_qual:+-s} \ | ||
${par_ign_vcf:+-S "${par_ign_vcf}"} \ | ||
${par_def_nm_q:+-T "${par_def_nm_q}"} \ | ||
${par_sig:+-a "${par_sig}"} \ | ||
${par_bonf:+-b "${par_bonf}"} \ | ||
${par_min_cov:+-C "${par_min_cov}"} \ | ||
${par_max_depth:+-d "${par_max_depth}"} \ | ||
${par_illumina_13:+--illumina-1.3} \ | ||
${par_use_orphan:+--use-orphan} \ | ||
${par_plp_summary_only:+--plp-summary-only} \ | ||
${par_no_default_filter:+--no-default-filter} \ | ||
${par_force_overwrite:+--force-overwrite} \ | ||
${par_verbose:+--verbose} \ | ||
${par_debug:+--debug} \ | ||
"$par_input" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
#!/bin/bash | ||
|
||
set -e | ||
|
||
dir_in="${meta_resources_dir%/}/test_data" | ||
|
||
echo "> Run lofreq call" | ||
"$meta_executable" \ | ||
--input "$dir_in/a.bam" \ | ||
--input_bai "$dir_in/a.bai" \ | ||
--ref "$dir_in/genome.fasta" \ | ||
--out "output.vcf" \ | ||
|
||
echo ">> Checking output" | ||
[ ! -f "output.vcf" ] && echo "Output file output.vcf does not exist" && exit 1 | ||
|
||
echo ">> Check if output is empty" | ||
[ ! -s "output.vcf" ] && echo "Output file output.vcf is empty" && exit 1 | ||
|
||
echo "> Test successful" |
Binary file not shown.
Binary file not shown.
Oops, something went wrong.