Merge branch 'main' into bcftools_sort
jakubmajercik authored Sep 2, 2024
2 parents b553f79 + f3e87e5 commit 508821d
Showing 18 changed files with 1,396 additions and 0 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -29,8 +29,12 @@
* `bedtools`:
  - `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
  - `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
  - `bedtools/bedtools_groupby`: Summarizes a dataset column based upon common column groupings. Akin to the SQL "group by" command (PR #123).
  - `bedtools/bedtools_merge`: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).
  - `bedtools/bedtools_bamtofastq`: Converts BAM alignments to FASTQ files (PR #101).
  - `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
  - `bedtools/bedtools_bed12tobed6`: Converts BED12 files to BED6 files (PR #140).
  - `bedtools/bedtools_links`: Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a (bed/gff/vcf) file (PR #137).

* `qualimap/qualimap_rnaseq`: RNA-seq QC analysis using qualimap (PR #74).

67 changes: 67 additions & 0 deletions src/bedtools/bedtools_bed12tobed6/config.vsh.yaml
@@ -0,0 +1,67 @@
name: bedtools_bed12tobed6
namespace: bedtools
description: |
  Converts BED features in BED12 (a.k.a. “blocked” BED features such as genes) to discrete BED6 features.
  For example, in the case of a gene with six exons, bed12ToBed6 would create six separate BED6 features (i.e., one for each exon).
keywords: [Converts, BED12, BED6]
links:
  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/bed12tobed6.html
  repository: https://github.com/arq5x/bedtools2
  homepage: https://bedtools.readthedocs.io/en/latest/#
  issue_tracker: https://github.com/arq5x/bedtools2/issues
references:
  doi: 10.1093/bioinformatics/btq033
license: MIT
requirements:
  commands: [bedtools]
authors:
  - __merge__: /src/_authors/theodoro_gasperin.yaml
    roles: [ author, maintainer ]

argument_groups:

  - name: Inputs
    arguments:
      - name: --input
        alternatives: -i
        type: file
        description: Input BED12 file.
        required: true

  - name: Outputs
    arguments:
      - name: --output
        alternatives: -o
        type: file
        direction: output
        description: Output BED6 file to be written.

  - name: Options
    arguments:
      - name: --n_score
        alternatives: -n
        type: boolean_true
        description: |
          Force the score to be the (1-based) block number from the BED12.
resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh

engines:
  - type: docker
    image: debian:stable-slim
    setup:
      - type: apt
        packages: [bedtools, procps]
      - type: docker
        run: |
          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
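The BED12-to-BED6 split described in this config comes down to simple per-block arithmetic: each block `i` becomes one BED6 record starting at `chromStart + blockStarts[i]` and ending `blockSizes[i]` bases later. The sketch below is a hedged illustration only, not the bedtools code: `split_bed12` and its `--number` switch (a stand-in for `-n`) are hypothetical names, and it assumes tab-delimited, well-formed BED12 input (columns 10 to 12 = blockCount, blockSizes, blockStarts) with no validation.

```bash
#!/usr/bin/env bash
# Illustrative sketch, NOT the bedtools implementation: reproduces the
# block-splitting arithmetic of `bedtools bed12tobed6` with awk.
# Assumes tab-delimited, well-formed BED12 input; nothing is validated.
set -euo pipefail

# split_bed12 [--number]  (hypothetical helper)
#   Reads BED12 on stdin and writes one BED6 record per block:
#     start = chromStart ($2) + blockStarts[i]
#     end   = start + blockSizes[i]
#   With --number, the score column is replaced by the 1-based block index,
#   counted left to right in file order.
split_bed12() {
  local number=0
  if [[ "${1:-}" == "--number" ]]; then
    number=1
  fi
  awk -v number="$number" 'BEGIN { FS = OFS = "\t" }
  {
    split($11, sizes, ",")        # blockSizes; a trailing comma is harmless
    split($12, starts, ",")       # blockStarts, relative to chromStart
    for (i = 1; i <= $10; i++) {  # $10 = blockCount
      s = $2 + starts[i]
      score = (number == 1) ? i : $5
      print $1, s, s + sizes[i], $4, score, $6
    }
  }'
}

# Example: the uc002yiv.1 transcript (4 blocks) written as a BED12 line.
printf 'chr21\t10079666\t10120808\tuc002yiv.1\t0\t-\t10081686\t10120608\t0\t4\t528,91,101,215,\t0,1930,39750,40927,\n' \
  | split_bed12
```

For that record the sketch emits four BED6 lines, one per block, the first spanning 10079666 to 10080194 (10079666 + 528) and the last ending at the feature end, 10120808.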
13 changes: 13 additions & 0 deletions src/bedtools/bedtools_bed12tobed6/help.txt
@@ -0,0 +1,13 @@
```
bedtools bed12tobed6 -h
```

Tool: bedtools bed12tobed6 (aka bed12ToBed6)
Version: v2.30.0
Summary: Splits BED12 features into discrete BED6 features.

Usage: bedtools bed12tobed6 [OPTIONS] -i <bed12>

Options:
-n Force the score to be the (1-based) block number from the BED12.

15 changes: 15 additions & 0 deletions src/bedtools/bedtools_bed12tobed6/script.sh
@@ -0,0 +1,15 @@
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

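# viash passes boolean_true arguments as "true"/"false"; unset the variable
# when it is "false" so that ${par_n_score:+-n} below expands to nothing.
[[ "$par_n_score" == "false" ]] && unset par_n_score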

# Execute bedtools bed12tobed6 conversion
bedtools bed12tobed6 \
  ${par_n_score:+-n} \
  -i "$par_input" \
  > "$par_output"
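script.sh relies on a common shell idiom for optional boolean flags: turn the "false" case into "unset/empty", then let `${var:+word}` emit the flag only when the variable is non-empty. A minimal sketch of that pattern follows; `build_args` is a hypothetical helper that just prints the arguments one per line, and the assumption (implied by the `== "false"` check in script.sh) is that viash exports `boolean_true` arguments as the strings `true`/`false`.

```bash
#!/usr/bin/env bash
# Minimal sketch of the flag-forwarding idiom used in script.sh.
# build_args is hypothetical: it prints, one per line, the arguments that
# would be handed to `bedtools bed12tobed6`.
set -euo pipefail

build_args() {
  local n_score="$1" input="$2"
  # Same effect as script.sh's `unset par_n_score`: treat "false" as empty.
  if [[ "$n_score" == "false" ]]; then
    n_score=""
  fi
  # ${n_score:+-n} expands to "-n" only when n_score is non-empty. It is left
  # unquoted on purpose: when empty it disappears instead of becoming "".
  local args=(${n_score:+-n} -i "$input")
  printf '%s\n' "${args[@]}"
}

build_args true input.bed12   # prints -n, -i, input.bed12 (one per line)
build_args false input.bed12  # prints -i, input.bed12
```

Quoting `"${par_n_score:+-n}"` would instead pass an empty-string argument to bedtools when the flag is off, which is why the expansion in script.sh is deliberately unquoted.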
85 changes: 85 additions & 0 deletions src/bedtools/bedtools_bed12tobed6/test.sh
@@ -0,0 +1,85 @@
#!/bin/bash

# exit on error
set -eo pipefail

#############################################
# helper functions
assert_file_exists() {
  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
}
assert_file_not_empty() {
  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
}
assert_file_contains() {
  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
}
assert_identical_content() {
  diff -a "$2" "$1" \
    || { echo "Files are not identical!" && exit 1; }
}
#############################################

# Create directories for tests
echo "Creating Test Data..."
TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT

# Create example BED12 file
cat <<EOF > "$TMPDIR/example.bed12"
chr21 10079666 10120808 uc002yiv.1 0 - 10081686 1 0 1 2 0 6 0 8 0 4 528,91,101,215, 0,1930,39750,40927,
chr21 10080031 10081687 uc002yiw.1 0 - 10080031 1 0 0 8 0 0 3 1 0 2 200,91, 0,1565,
chr21 10081660 10120796 uc002yix.2 0 - 10081660 1 0 0 8 1 6 6 0 0 3 27,101,223, 0,37756,38913,
EOF

# Expected output bed6 file
cat <<EOF > "$TMPDIR/expected.bed6"
chr21 10079666 10120808 uc002yiv.1 0 -
chr21 10080031 10081687 uc002yiw.1 0 -
chr21 10081660 10120796 uc002yix.2 0 -
EOF
# Expected output bed6 file with -n option
cat <<EOF > "$TMPDIR/expected_n.bed6"
chr21 10079666 10120808 uc002yiv.1 1 -
chr21 10080031 10081687 uc002yiw.1 1 -
chr21 10081660 10120796 uc002yix.2 1 -
EOF

# Test 1: Default conversion BED12 to BED6
mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null

echo "> Run bedtools_bed12tobed6 on BED12 file"
"$meta_executable" \
  --input "../example.bed12" \
  --output "output.bed6"

# checks
assert_file_exists "output.bed6"
assert_file_not_empty "output.bed6"
assert_identical_content "output.bed6" "../expected.bed6"
echo "- test1 succeeded -"

popd > /dev/null

# Test 2: Conversion BED12 to BED6 with -n option
mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null

echo "> Run bedtools_bed12tobed6 on BED12 file with -n option"
"$meta_executable" \
  --input "../example.bed12" \
  --output "output.bed6" \
  --n_score

# checks
assert_file_exists "output.bed6"
assert_file_not_empty "output.bed6"
assert_identical_content "output.bed6" "../expected_n.bed6"
echo "- test2 succeeded -"

popd > /dev/null

echo "---- All tests succeeded! ----"
exit 0
155 changes: 155 additions & 0 deletions src/bedtools/bedtools_groupby/config.vsh.yaml
@@ -0,0 +1,155 @@
name: bedtools_groupby
namespace: bedtools
description: |
  Summarizes a dataset column based upon common column groupings.
  Akin to the SQL "group by" command.
keywords: [groupby, BED]
links:
  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/groupby.html
  repository: https://github.com/arq5x/bedtools2
  homepage: https://bedtools.readthedocs.io/en/latest/#
  issue_tracker: https://github.com/arq5x/bedtools2/issues
references:
  doi: 10.1093/bioinformatics/btq033
license: MIT
requirements:
  commands: [bedtools]
authors:
  - __merge__: /src/_authors/theodoro_gasperin.yaml
    roles: [ author, maintainer ]

argument_groups:
  - name: Inputs
    arguments:
      - name: --input
        alternatives: -i
        type: file
        direction: input
        description: |
          The input BED file to be used.
        required: true
        example: input_a.bed

  - name: Outputs
    arguments:
      - name: --output
        type: file
        direction: output
        description: |
          The output groupby BED file.
        required: true
        example: output.bed

  - name: Options
    arguments:
      - name: --groupby
        alternatives: [-g, -grp]
        type: string
        description: |
          Specify the columns (1-based) for the grouping.
          The columns must be comma separated.
          bedtools itself defaults to 1,2,3, but this argument is required here.
        required: true

      - name: --column
        alternatives: [-c, -opCols]
        type: string
        description: |
          Specify the column(s) (1-based) that should be summarized.
          Multiple columns must be comma separated (e.g. "5,4,6").
        required: true

      - name: --operation
        alternatives: [-o, -ops]
        type: string
        description: |
          Specify the operation(s) that should be applied to the summarized column(s) (--column).
          Valid operations:
            sum, count, count_distinct, min, max,
            mean, median, mode, antimode,
            stdev, sstdev (sample standard dev.),
            collapse (i.e., print a comma separated list (duplicates allowed)),
            distinct (i.e., print a comma separated list (NO duplicates allowed)),
            distinct_sort_num (as distinct, but sorted numerically, ascending),
            distinct_sort_num_desc (as distinct, but sorted numerically, descending),
            concat (i.e., merge values into a single, non-delimited string),
            freqdesc (i.e., print desc. list of values:freq),
            freqasc (i.e., print asc. list of values:freq),
            first (i.e., print first value),
            last (i.e., print last value)
          Default value: sum
          If there is only one column, but multiple operations, all operations will be
          applied on that column. Likewise, if there is only one operation, but
          multiple columns, that operation will be applied to all columns.
          Otherwise, the number of columns must match the number of operations,
          and the operations will be applied in respective order.
          E.g., "-c 5,4,6 -o sum,mean,count" will give the sum of column 5,
          the mean of column 4, and the count of column 6.
          The order of output columns will match the ordering given in the command.
      - name: --full
        type: boolean_true
        description: |
          Print all columns from the input file. The first line in the group is used.
          Default: print only grouped columns.
      - name: --inheader
        type: boolean_true
        description: |
          Input file has a header line - the first line will be ignored.
      - name: --outheader
        type: boolean_true
        description: |
          Print header line in the output, detailing the column names.
          If the input file has headers (-inheader), the output file
          will use the input's column names.
          If the input file has no headers, the output file
          will use "col_1", "col_2", etc. as the column names.
      - name: --header
        type: boolean_true
        description: Same as '-inheader -outheader'.

      - name: --ignorecase
        type: boolean_true
        description: |
          Group values regardless of upper/lower case.
      - name: --precision
        alternatives: -prec
        type: integer
        description: |
          Sets the decimal precision for output.
        default: 5

      - name: --delimiter
        alternatives: -delim
        type: string
        description: |
          Specify a custom delimiter for the collapse operations.
        example: "|"
        default: ","

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh

engines:
  - type: docker
    image: debian:stable-slim
    setup:
      - type: apt
        packages: [bedtools, procps]
      - type: docker
        run: |
          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
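To make the groupby semantics above concrete, here is a rough awk approximation of one case, `-g 1,2,3 -c COL -o sum`. It is not bedtools: `groupby_sum` is a hypothetical helper, only the `sum` operation is shown, and the grouping columns are fixed to 1-3. As the bedtools documentation describes for groupby, it assumes the input is already sorted by the grouping columns, because only consecutive lines sharing a key are merged.

```bash
#!/usr/bin/env bash
# Illustrative sketch, NOT bedtools itself: approximates
#   bedtools groupby -g 1,2,3 -c COL -o sum
# Only consecutive lines with the same key (columns 1-3) are merged, so the
# input should already be sorted by the grouping columns.
set -euo pipefail

groupby_sum() {  # hypothetical helper
  local col="${1:-5}"   # 1-based column to sum (5 is an arbitrary default here)
  awk -v col="$col" 'BEGIN { FS = OFS = "\t" }
  {
    key = $1 OFS $2 OFS $3
    if (NR > 1 && key != prev) {  # key changed: emit the finished group
      print prev, total
      total = 0
    }
    prev = key
    total += $col
  }
  END { if (NR > 0) print prev, total }'
}

# Prints two tab-separated lines: "chr1 10 20 3" and "chr1 30 40 5".
printf 'chr1\t10\t20\ta\t1\nchr1\t10\t20\tb\t2\nchr1\t30\t40\tc\t5\n' | groupby_sum 5
```

Supporting the other operations (mean, collapse, distinct, ...) would mean keeping the group's values in an array instead of a running total, which is the main reason a dedicated tool is preferable to ad-hoc awk here.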