-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* WIP * update arguments in config * add script code * Add test * update CHANGELOG * add requirements * Apply suggestions from code review Co-authored-by: Robrecht Cannoodt <[email protected]> * Update with review comments * update test_data * update outputs * update with latest review * Apply suggestions from code review --------- Co-authored-by: Robrecht Cannoodt <[email protected]>
- Loading branch information
1 parent
16d7fd3
commit f54af0a
Showing
8 changed files
with
359 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,159 @@ | ||
functionality: | ||
name: pear | ||
description: | | ||
PEAR is an ultrafast, memory-efficient and highly accurate pair-end read merger. It is fully parallelized and can run with as low as just a few kilobytes of memory. | ||
PEAR evaluates all possible paired-end read overlaps and without requiring the target fragment size as input. In addition, it implements a statistical test for minimizing false-positive results. Together with a highly optimized implementation, it can merge millions of paired end reads within a couple of minutes on a standard desktop computer. | ||
info: | ||
keywords: [ "pair-end", "read", "merge" ] | ||
homepage: https://cme.h-its.org/exelixis/web/software/pear | ||
repository: https://github.com/tseemann/PEAR | ||
documentation: https://cme.h-its.org/exelixis/web/software/pear/doc.html | ||
reference: doi:10.1093/bioinformatics/btt593 | ||
licence: "CC-BY-NC-SA-3.0" | ||
requirements: | ||
commands: [ pear , gzip ] | ||
argument_groups: | ||
- name: Inputs | ||
arguments: | ||
- name: --forward_fastq | ||
alternatives: -f | ||
type: file | ||
description: Forward paired-end FASTQ file | ||
required: true | ||
example: "forward.fastq" | ||
- name: --reverse_fastq | ||
alternatives: -r | ||
type: file | ||
description: Reverse paired-end FASTQ file | ||
required: true | ||
example: "reverse.fastq" | ||
- name: Outputs | ||
arguments: | ||
- name: --assembled | ||
type: file | ||
description: The output file containing assembled reads. Can be compressed with gzip. | ||
required: true | ||
direction: output | ||
- name: --unassembled_forward | ||
type: file | ||
description: The output file containing forward reads that could not be assembled. Can be compressed with gzip. | ||
required: true | ||
direction: output | ||
- name: --unassembled_reverse | ||
type: file | ||
description: The output file containing reverse reads that could not be assembled. Can be compressed with gzip. | ||
required: true | ||
direction: output | ||
- name: --discarded | ||
type: file | ||
description: The output file containing reads that were discarded due to too low quality or too many uncalled bases. Can be compressed with gzip. | ||
required: true | ||
direction: output | ||
- name: Arguments | ||
arguments: | ||
- name: --p_value | ||
alternatives: -p | ||
type: double | ||
description: | | ||
Specify a p-value for the statistical test. If the computed p-value of a possible assembly exceeds the specified p-value then paired-end read will not be assembled. Valid options are: 0.0001, 0.001, 0.01, 0.05 and 1.0. Setting 1.0 disables the test. | ||
example: 0.01 | ||
required: false | ||
- name: --min_overlap | ||
alternatives: -v | ||
type: integer | ||
description: | | ||
Specify the minimum overlap size. The minimum overlap may be set to 1 when the statistical test is used. However, further restricting the minimum overlap size to a proper value may reduce false-positive assembles. | ||
required: false | ||
example: 10 | ||
- name: --max_assembly_length | ||
alternatives: -m | ||
type: integer | ||
description: | | ||
Specify the maximum possible length of the assembled sequences. Setting this value to 0 disables the restriction and assembled sequences may be arbitrary long. | ||
required: false | ||
example: 0 | ||
- name: --min_assembly_length | ||
alternatives: -n | ||
type: integer | ||
description: | | ||
Specify the minimum possible length of the assembled sequences. Setting this value to 0 disables the restriction and assembled sequences may be arbitrary short. | ||
required: false | ||
example: 0 | ||
- name: --min_trim_length | ||
alternatives: -t | ||
type: integer | ||
description: | | ||
Specify the minimum length of reads after trimming the low quality part (see option -q) | ||
required: false | ||
example: 1 | ||
- name: --quality_threshold | ||
alternatives: -q | ||
type: integer | ||
description: | | ||
Specify the quality threshold for trimming the low quality part of a read. If the quality scores of two consecutive bases are strictly less than the specified threshold, the rest of the read will be trimmed. | ||
required: false | ||
example: 0 | ||
- name: --max_uncalled_base | ||
alternatives: -u | ||
type: double | ||
description: | | ||
Specify the maximal proportion of uncalled bases in a read. Setting this value to 0 will cause PEAR to discard all reads containing uncalled bases. The other extreme setting is 1 which causes PEAR to process all reads independent on the number of uncalled bases. | ||
example: 1.0 | ||
required: false | ||
- name: --test_method | ||
alternatives: -g | ||
type: integer | ||
description: | | ||
Specify the type of statistical test. Two options are available. 1: Given the minimum allowed overlap, test using the highest OES. Note that due to its discrete nature, this test usually yields a lower p-value for the assembled read than the cut- off (specified by -p). For example, setting the cut-off to 0.05 using this test, the assembled reads might have an actual p-value of 0.02. | ||
2. Use the acceptance probability (m.a.p). This test methods computes the same probability as test method 1. However, it assumes that the minimal overlap is the observed overlap with the highest OES, instead of the one specified by -v. Therefore, this is not a valid statistical test and the 'p-value' is in fact the maximal probability for accepting the assembly. Nevertheless, we observed in practice that for the case the actual overlap sizes are relatively small, test 2 can correctly assemble more reads with only slightly higher false-positive rate. | ||
required: false | ||
example: 1 | ||
- name: --emperical_freqs | ||
alternatives: -e | ||
type: boolean_true | ||
description: | | ||
Disable empirical base frequencies. | ||
- name: --score_method | ||
alternatives: -s | ||
type: integer | ||
description: | | ||
Specify the scoring method. 1. OES with +1 for match and -1 for mismatch. 2: Assembly score (AS). Use +1 for match and -1 for mismatch multiplied by base quality scores. 3: Ignore quality scores and use +1 for a match and -1 for a mismatch. | ||
required: false | ||
example: 2 | ||
- name: --phred_base | ||
alternatives: -b | ||
type: integer | ||
description: | | ||
Base PHRED quality score. | ||
required: false | ||
example: 33 | ||
- name: --cap | ||
alternatives: -c | ||
type: integer | ||
description: | | ||
Specify the upper bound for the resulting quality score. If set to zero, capping is disabled. | ||
required: false | ||
example: 40 | ||
- name: --nbase | ||
alternatives: -z | ||
type: boolean_true | ||
description: | | ||
When merging a base-pair that consists of two non-equal bases out of which none is degenerate, set the merged base to N and use the highest quality score of the two bases | ||
resources: | ||
- type: bash_script | ||
path: script.sh | ||
test_resources: | ||
- type: bash_script | ||
path: test.sh | ||
- type: file | ||
path: test_data | ||
platforms: | ||
- type: docker | ||
image: quay.io/biocontainers/pear:0.9.6--h9d449c0_10 | ||
setup: | ||
- type: docker | ||
run: | | ||
version=$(pear -h | grep 'PEAR v' | sed 's/PEAR v//' | sed 's/ .*//') && \ | ||
echo "pear: $version" > /var/software_versions.txt | ||
- type: nextflow |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
```bash | ||
pear -h | ||
``` | ||
|
||
____ _____ _ ____ | ||
| _ \| ____| / \ | _ \ | ||
| |_) | _| / _ \ | |_) | | ||
| __/| |___ / ___ \| _ < | ||
|_| |_____/_/ \_\_| \_\ | ||
PEAR v0.9.6 [January 15, 2015] - [+bzlib +zlib] | ||
|
||
Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR | ||
Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593 | ||
|
||
License: Creative Commons Licence | ||
Bug-reports and requests to: [email protected] and [email protected] | ||
|
||
|
||
Usage: pear <options> | ||
Standard (mandatory): | ||
-f, --forward-fastq <str> Forward paired-end FASTQ file. | ||
-r, --reverse-fastq <str> Reverse paired-end FASTQ file. | ||
-o, --output <str> Output filename. | ||
Optional: | ||
-p, --p-value <float> Specify a p-value for the statistical test. If the computed | ||
p-value of a possible assembly exceeds the specified p-value | ||
then paired-end read will not be assembled. Valid options | ||
are: 0.0001, 0.001, 0.01, 0.05 and 1.0. Setting 1.0 disables | ||
the test. (default: 0.01) | ||
-v, --min-overlap <int> Specify the minimum overlap size. The minimum overlap may be | ||
set to 1 when the statistical test is used. However, further | ||
restricting the minimum overlap size to a proper value may | ||
reduce false-positive assembles. (default: 10) | ||
-m, --max-assembly-length <int> Specify the maximum possible length of the assembled | ||
sequences. Setting this value to 0 disables the restriction | ||
and assembled sequences may be arbitrary long. (default: 0) | ||
-n, --min-assembly-length <int> Specify the minimum possible length of the assembled | ||
sequences. Setting this value to 0 disables the restriction | ||
and assembled sequences may be arbitrary short. (default: | ||
50) | ||
-t, --min-trim-length <int> Specify the minimum length of reads after trimming the low | ||
quality part (see option -q). (default: 1) | ||
-q, --quality-threshold <int> Specify the quality score threshold for trimming the low | ||
quality part of a read. If the quality scores of two | ||
consecutive bases are strictly less than the specified | ||
threshold, the rest of the read will be trimmed. (default: | ||
0) | ||
-u, --max-uncalled-base <float> Specify the maximal proportion of uncalled bases in a read. | ||
Setting this value to 0 will cause PEAR to discard all reads | ||
containing uncalled bases. The other extreme setting is 1 | ||
which causes PEAR to process all reads independent on the | ||
number of uncalled bases. (default: 1) | ||
-g, --test-method <int> Specify the type of statistical test. Two options are | ||
available. (default: 1) | ||
1: Given the minimum allowed overlap, test using the highest | ||
OES. Note that due to its discrete nature, this test usually | ||
yields a lower p-value for the assembled read than the cut- | ||
off (specified by -p). For example, setting the cut-off to | ||
0.05 using this test, the assembled reads might have an | ||
actual p-value of 0.02. | ||
|
||
2. Use the acceptance probability (m.a.p). This test methods | ||
computes the same probability as test method 1. However, it | ||
assumes that the minimal overlap is the observed overlap | ||
with the highest OES, instead of the one specified by -v. | ||
Therefore, this is not a valid statistical test and the | ||
'p-value' is in fact the maximal probability for accepting | ||
the assembly. Nevertheless, we observed in practice that for | ||
the case the actual overlap sizes are relatively small, test | ||
2 can correctly assemble more reads with only slightly | ||
higher false-positive rate. | ||
-e, --empirical-freqs Disable empirical base frequencies. (default: use empirical | ||
base frequencies) | ||
-s, --score-method <int> Specify the scoring method. (default: 2) | ||
1. OES with +1 for match and -1 for mismatch. | ||
2: Assembly score (AS). Use +1 for match and -1 for mismatch | ||
multiplied by base quality scores. | ||
3: Ignore quality scores and use +1 for a match and -1 for a | ||
mismatch. | ||
-b, --phred-base <int> Base PHRED quality score. (default: 33) | ||
-y, --memory <str> Specify the amount of memory to be used. The number may be | ||
followed by one of the letters K, M, or G denoting | ||
Kilobytes, Megabytes and Gigabytes, respectively. Bytes are | ||
assumed in case no letter is specified. | ||
-c, --cap <int> Specify the upper bound for the resulting quality score. If | ||
set to zero, capping is disabled. (default: 40) | ||
-j, --threads <int> Number of threads to use | ||
-z, --nbase When merging a base-pair that consists of two non-equal | ||
bases out of which none is degenerate, set the merged base | ||
to N and use the highest quality score of the two bases | ||
-h, --help This help screen. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
#!/bin/bash | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
[[ "$par_emperical_freqs" == "false" ]] && unset par_emperical_freqs | ||
[[ "$par_nbase" == "false" ]] && unset par_nbase | ||
|
||
if [[ "${par_forward_fastq##*.}" == "gz" ]]; then | ||
gunzip $par_forward_fastq | ||
par_forward_fastq=${par_forward_fastq%.*} | ||
fi | ||
if [[ "${par_reverse_fastq##*.}" == "gz" ]]; then | ||
gunzip $par_reverse_fastq | ||
par_reverse_fastq=${par_reverse_fastq%.*} | ||
fi | ||
|
||
output_dir=$(mktemp -d -p "$meta_temp_dir" "pear.XXXXXX") | ||
|
||
pear \ | ||
-f "$par_forward_fastq" \ | ||
-r "$par_reverse_fastq" \ | ||
-o "$output_dir" \ | ||
${par_p_value:+-p "${par_p_value}"} \ | ||
${par_min_overlap:+-v "${par_min_overlap}"} \ | ||
${par_max_assembly_length:+-m "${par_max_assembly_length}"} \ | ||
${par_min_assembly_length:+-n "${par_min_assembly_length}"} \ | ||
${par_min_trim_length:+-t "${par_min_trim_length}"} \ | ||
${par_quality_threshold:+-q "${par_quality_threshold}"} \ | ||
${par_max_uncalled_base:+-u "${par_max_uncalled_base}"} \ | ||
${par_test_method:+-g "${par_test_method}"} \ | ||
${par_score_method:+-s "${par_score_method}"} \ | ||
${par_phred_base:+-b "${par_phred_base}"} \ | ||
${meta_memory_mb:+--memory "${meta_memory_mb}M"} \ | ||
${par_cap:+-c "${par_cap}"} \ | ||
${meta_cpus:+-j "${meta_cpus}"} \ | ||
${par_emperical_freqs:+-e} \ | ||
${par_nbase:+-z} | ||
|
||
|
||
if [[ "${par_assembled##*.}" == "gz" ]]; then | ||
gzip -9 -c ${output_dir}.assembled.fastq > ${par_assembled} | ||
else | ||
mv ${output_dir}.assembled.fastq ${par_assembled} | ||
fi | ||
|
||
if [[ "${par_unassembled_forward##*.}" == "gz" ]]; then | ||
gzip -9 -c ${output_dir}.unassembled.forward.fastq > ${par_unassembled_forward} | ||
else | ||
mv ${output_dir}.unassembled.forward.fastq ${par_unassembled_forward} | ||
fi | ||
|
||
if [[ "${par_unassembled_reverse##*.}" == "gz" ]]; then | ||
gzip -9 -c ${output_dir}.unassembled.reverse.fastq > ${par_unassembled_reverse} | ||
else | ||
mv ${output_dir}.unassembled.reverse.fastq ${par_unassembled_reverse} | ||
fi | ||
|
||
if [[ "${par_discarded##*.}" == "gz" ]]; then | ||
gzip -9 -c ${output_dir}.discarded.fastq > ${par_discarded} | ||
else | ||
mv ${output_dir}.discarded.fastq ${par_discarded} | ||
fi |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
#!/bin/bash | ||
|
||
set -e | ||
|
||
dir_in="${meta_resources_dir%/}/test_data" | ||
|
||
echo "> Run PEAR" | ||
"$meta_executable" \ | ||
--forward_fastq "$dir_in/a.1.fastq" \ | ||
--reverse_fastq "$dir_in/a.2.fastq" \ | ||
--assembled "test.assembled.fastq.gz" \ | ||
--unassembled_forward "test.unassembled.forward.fastq.gz" \ | ||
--unassembled_reverse "test.unassembled.reverse.fastq.gz" \ | ||
--discarded "test.discarded.fastq.gz" \ | ||
--p_value 0.01 | ||
|
||
echo ">> Checking output" | ||
[ ! -f "test.assembled.fastq.gz" ] && echo "Output file test.assembled.fastq.gz does not exist" && exit 1 | ||
[ ! -f "test.unassembled.forward.fastq.gz" ] && echo "Output file test.unassembled.forward.fastq.gz does not exist" && exit 1 | ||
[ ! -f "test.unassembled.reverse.fastq.gz" ] && echo "Output file test.unassembled.reverse.fastq.gz does not exist" && exit 1 | ||
[ ! -f "test.discarded.fastq.gz" ] && echo "Output file ftest.discarded.fastq.gz does not exist" && exit 1 | ||
|
||
echo "> Test successful" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
@1 | ||
ACGGCAT | ||
+ | ||
!!!!!!! |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
@1 | ||
ACGGCAT | ||
+ | ||
!!!!!!! |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# pear test data | ||
|
||
# Test data was obtained from https://github.com/snakemake/snakemake-wrappers/tree/master/bio/fastp/test | ||
|
||
if [ ! -d /tmp/snakemake-wrappers ]; then | ||
git clone --depth 1 --single-branch --branch master https://github.com/snakemake/snakemake-wrappers /tmp/snakemake-wrappers | ||
fi | ||
|
||
cp -r /tmp/snakemake-wrappers/bio/fastp/test/reads/pe/* src/pear/test_data | ||
|