-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Initial Commit * Script file * strand option tests * -bed option test * distance option test * all test implemented * Update CHANGELOG.md * Update config.vsh.yaml * adding more links * exit on error * suggested changes * working on suggested changes --------- Co-authored-by: Jakub Majercik <[email protected]>
- Loading branch information
1 parent
c4ea23a
commit 2d0a990
Showing
6 changed files
with
503 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
name: bedtools_merge | ||
namespace: bedtools | ||
description: | | ||
Merges overlapping BED/GFF/VCF entries into a single interval. | ||
links: | ||
documentation: https://bedtools.readthedocs.io/en/latest/content/tools/merge.html | ||
repository: https://github.com/arq5x/bedtools2 | ||
homepage: https://bedtools.readthedocs.io/en/latest/# | ||
issue_tracker: https://github.com/arq5x/bedtools2/issues | ||
references: | ||
doi: 10.1093/bioinformatics/btq033 | ||
license: MIT | ||
requirements: | ||
commands: [bedtools] | ||
authors: | ||
- __merge__: /src/_authors/theodoro_gasperin.yaml | ||
roles: [ author, maintainer ] | ||
|
||
argument_groups: | ||
- name: Inputs | ||
arguments: | ||
- name: --input | ||
alternatives: -i | ||
type: file | ||
description: Input file (BED/GFF/VCF) to be merged. | ||
required: true | ||
|
||
- name: Outputs | ||
arguments: | ||
- name: --output | ||
type: file | ||
direction: output | ||
description: Output merged file BED to be written. | ||
required: true | ||
|
||
- name: Options | ||
arguments: | ||
- name: --strand | ||
alternatives: -s | ||
type: boolean_true | ||
description: | | ||
Force strandedness. That is, only merge features | ||
that are on the same strand. | ||
- By default, merging is done without respect to strand. | ||
- name: --specific_strand | ||
alternatives: -S | ||
type: string | ||
choices: ["+", "-"] | ||
description: | | ||
Force merge for one specific strand only. | ||
Follow with + or - to force merge from only | ||
the forward or reverse strand, respectively. | ||
- By default, merging is done without respect to strand. | ||
- name: --distance | ||
alternatives: -d | ||
type: integer | ||
description: | | ||
Maximum distance between features allowed for features | ||
to be merged. | ||
- Def. 0. That is, overlapping & book-ended features are merged. | ||
- (INTEGER) | ||
- Note: negative values enforce the number of b.p. required for overlap. | ||
- name: --columns | ||
alternatives: -c | ||
type: integer | ||
description: | | ||
Specify columns from the B file to map onto intervals in A. | ||
Default: 5. | ||
Multiple columns can be specified in a comma-delimited list. | ||
- name: --operation | ||
alternatives: -o | ||
type: string | ||
description: | | ||
Specify the operation that should be applied to -c. | ||
Valid operations: | ||
sum, min, max, absmin, absmax, | ||
mean, median, mode, antimode | ||
stdev, sstdev | ||
collapse (i.e., print a delimited list (duplicates allowed)), | ||
distinct (i.e., print a delimited list (NO duplicates allowed)), | ||
distinct_sort_num (as distinct, sorted numerically, ascending), | ||
distinct_sort_num_desc (as distinct, sorted numerically, desscending), | ||
distinct_only (delimited list of only unique values), | ||
count | ||
count_distinct (i.e., a count of the unique values in the column), | ||
first (i.e., just the first value in the column), | ||
last (i.e., just the last value in the column), | ||
Default: sum | ||
Multiple operations can be specified in a comma-delimited list. | ||
If there is only column, but multiple operations, all operations will be | ||
applied on that column. Likewise, if there is only one operation, but | ||
multiple columns, that operation will be applied to all columns. | ||
Otherwise, the number of columns must match the the number of operations, | ||
and will be applied in respective order. | ||
E.g., "-c 5,4,6 -o sum,mean,count" will give the sum of column 5, | ||
the mean of column 4, and the count of column 6. | ||
The order of output columns will match the ordering given in the command. | ||
- name: --delimiter | ||
alternatives: -delim | ||
type: string | ||
description: | | ||
Specify a custom delimiter for the collapse operations. | ||
example: "|" | ||
default: "," | ||
|
||
- name: --precision | ||
alternatives: -prec | ||
type: integer | ||
description: | | ||
Sets the decimal precision for output (Default: 5). | ||
- name: --bed | ||
type: boolean_true | ||
description: | | ||
If using BAM input, write output as BED. | ||
- name: --header | ||
type: boolean_true | ||
description: | | ||
Print the header from the A file prior to results. | ||
- name: --no_buffer | ||
alternatives: -nobuf | ||
type: boolean_true | ||
description: | | ||
Disable buffered output. Using this option will cause each line | ||
of output to be printed as it is generated, rather than saved | ||
in a buffer. This will make printing large output files | ||
noticeably slower, but can be useful in conjunction with | ||
other software tools and scripts that need to process one | ||
line of bedtools output at a time. | ||
resources: | ||
- type: bash_script | ||
path: script.sh | ||
|
||
test_resources: | ||
- type: bash_script | ||
path: test.sh | ||
- path: test_data | ||
|
||
engines: | ||
- type: docker | ||
image: debian:stable-slim | ||
setup: | ||
- type: apt | ||
packages: [bedtools, procps] | ||
- type: docker | ||
run: | | ||
echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt | ||
runners: | ||
- type: executable | ||
- type: nextflow |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
```bash | ||
bedtools merge | ||
``` | ||
|
||
Tool: bedtools merge (aka mergeBed) | ||
Version: v2.30.0 | ||
Summary: Merges overlapping BED/GFF/VCF entries into a single interval. | ||
|
||
Usage: bedtools merge [OPTIONS] -i <bed/gff/vcf> | ||
|
||
Options: | ||
-s Force strandedness. That is, only merge features | ||
that are on the same strand. | ||
- By default, merging is done without respect to strand. | ||
|
||
-S Force merge for one specific strand only. | ||
Follow with + or - to force merge from only | ||
the forward or reverse strand, respectively. | ||
- By default, merging is done without respect to strand. | ||
|
||
-d Maximum distance between features allowed for features | ||
to be merged. | ||
- Def. 0. That is, overlapping & book-ended features are merged. | ||
- (INTEGER) | ||
- Note: negative values enforce the number of b.p. required for overlap. | ||
|
||
-c Specify columns from the B file to map onto intervals in A. | ||
Default: 5. | ||
Multiple columns can be specified in a comma-delimited list. | ||
|
||
-o Specify the operation that should be applied to -c. | ||
Valid operations: | ||
sum, min, max, absmin, absmax, | ||
mean, median, mode, antimode | ||
stdev, sstdev | ||
collapse (i.e., print a delimited list (duplicates allowed)), | ||
distinct (i.e., print a delimited list (NO duplicates allowed)), | ||
distinct_sort_num (as distinct, sorted numerically, ascending), | ||
distinct_sort_num_desc (as distinct, sorted numerically, desscending), | ||
distinct_only (delimited list of only unique values), | ||
count | ||
count_distinct (i.e., a count of the unique values in the column), | ||
first (i.e., just the first value in the column), | ||
last (i.e., just the last value in the column), | ||
Default: sum | ||
Multiple operations can be specified in a comma-delimited list. | ||
|
||
If there is only column, but multiple operations, all operations will be | ||
applied on that column. Likewise, if there is only one operation, but | ||
multiple columns, that operation will be applied to all columns. | ||
Otherwise, the number of columns must match the the number of operations, | ||
and will be applied in respective order. | ||
E.g., "-c 5,4,6 -o sum,mean,count" will give the sum of column 5, | ||
the mean of column 4, and the count of column 6. | ||
The order of output columns will match the ordering given in the command. | ||
|
||
|
||
-delim Specify a custom delimiter for the collapse operations. | ||
- Example: -delim "|" | ||
- Default: ",". | ||
|
||
-prec Sets the decimal precision for output (Default: 5) | ||
|
||
-bed If using BAM input, write output as BED. | ||
|
||
-header Print the header from the A file prior to results. | ||
|
||
-nobuf Disable buffered output. Using this option will cause each line | ||
of output to be printed as it is generated, rather than saved | ||
in a buffer. This will make printing large output files | ||
noticeably slower, but can be useful in conjunction with | ||
other software tools and scripts that need to process one | ||
line of bedtools output at a time. | ||
|
||
-iobuf Specify amount of memory to use for input buffer. | ||
Takes an integer argument. Optional suffixes K/M/G supported. | ||
Note: currently has no effect with compressed files. | ||
|
||
Notes: | ||
(1) The input file (-i) file must be sorted by chrom, then start. | ||
|
||
|
||
|
||
|
||
***** ERROR: No input file given. Exiting. ***** |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
#!/bin/bash | ||
|
||
## VIASH START | ||
## VIASH END | ||
|
||
# Exit on error | ||
set -eo pipefail | ||
|
||
# Unset parameters | ||
unset_if_false=( | ||
par_strand | ||
par_bed | ||
par_header | ||
par_no_buffer | ||
) | ||
|
||
for par in ${unset_if_false[@]}; do | ||
test_val="${!par}" | ||
[[ "$test_val" == "false" ]] && unset $par | ||
done | ||
|
||
# Execute bedtools merge with the provided arguments | ||
bedtools merge \ | ||
${par_strand:+-s} \ | ||
${par_specific_strand:+-S "$par_specific_strand"} \ | ||
${par_bed:+-bed} \ | ||
${par_header:+-header} \ | ||
${par_no_buffer:+-nobuf} \ | ||
${par_distance:+-d "$par_distance"} \ | ||
${par_columns:+-c "$par_columns"} \ | ||
${par_operation:+-o "$par_operation"} \ | ||
${par_delimiter:+-delim "$par_delimiter"} \ | ||
${par_precision:+-prec "$par_precision"} \ | ||
-i "$par_input" \ | ||
> "$par_output" |
Oops, something went wrong.