Bedtools links #137

Merged
merged 7 commits into from
Sep 2, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -31,6 +31,7 @@
- `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
- `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
- `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
- `bedtools/bedtools_links`: Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a (bed/gff/vcf) file (PR #137).

* `qualimap/qualimap_rnaseq`: RNA-seq QC analysis using qualimap (PR #74).

91 changes: 91 additions & 0 deletions src/bedtools/bedtools_links/config.vsh.yaml
@@ -0,0 +1,91 @@
name: bedtools_links
namespace: bedtools
description: |
  Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a file.
  This is useful when one wants to manually inspect a large set of annotations or features.
keywords: [Links, BED, GFF, VCF]
links:
  documentation: https://bedtools.readthedocs.io/en/latest/content/tools/links.html
  repository: https://github.com/arq5x/bedtools2
  homepage: https://bedtools.readthedocs.io/en/latest/#
  issue_tracker: https://github.com/arq5x/bedtools2/issues
references:
  doi: 10.1093/bioinformatics/btq033
license: MIT
requirements:
  commands: [bedtools]
authors:
  - __merge__: /src/_authors/theodoro_gasperin.yaml
    roles: [ author, maintainer ]

argument_groups:
  - name: Inputs
    arguments:
      - name: --input
        alternatives: -i
        type: file
        description: Input file (bed/gff/vcf).
        required: true

  - name: Outputs
    arguments:
      - name: --output
        alternatives: -o
        type: file
        direction: output
        description: Output HTML file to be written.

  - name: Options
    description: |
      By default, the links created will point to the human (hg18) UCSC browser.
      If you have a local mirror, you can override this behavior by supplying
      the --base_url, --organism, and --database options.

      For example, if the URL of your local mirror for mouse MM9 is
      http://mymirror.myuniversity.edu, then you would use the following:
        --base_url http://mymirror.myuniversity.edu
        --organism mouse
        --database mm9
    arguments:
      - name: --base_url
        alternatives: -base
        type: string
        description: |
          The “basename” for the UCSC browser.
        default: http://genome.ucsc.edu

      - name: --organism
        alternatives: -org
        type: string
        description: |
          The organism (e.g. mouse, human).
        default: human

      - name: --database
        alternatives: -db
        type: string
        description: |
          The genome build.
        default: hg18

resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh

engines:
  - type: docker
    image: debian:stable-slim
    setup:
      - type: apt
        packages: [bedtools, procps]
      - type: docker
        run: |
          echo "bedtools: \"$(bedtools --version | sed -n 's/^bedtools //p')\"" > /var/software_versions.txt

runners:
  - type: executable
  - type: nextflow
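
For reviewers who want to try the mirror example from the Options description, the following is a minimal sketch of the argument list the component would receive. It assumes the component is launched with `viash run` from the repository root and that a `genes.bed` file exists in the working directory; both the file names and the launch method are illustrative assumptions, not part of this PR. The sketch only prints the command so it can be inspected before running.

```shell
#!/bin/bash
# Path taken from this diff; assumes the current directory is the repository root.
config="src/bedtools/bedtools_links/config.vsh.yaml"

# Arguments for the mouse mm9 mirror example from the config description.
# genes.bed and genes.html are hypothetical file names.
args=(
  --input genes.bed
  --output genes.html
  --base_url "http://mymirror.myuniversity.edu"
  --organism mouse
  --database mm9
)

# Print the command instead of executing it, so nothing runs unexpectedly.
printf 'viash run %s --' "$config"
printf ' %q' "${args[@]}"
printf '\n'
```

Omitting the last three options should fall back to the defaults declared in the config (`http://genome.ucsc.edu`, `human`, `hg18`).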
25 changes: 25 additions & 0 deletions src/bedtools/bedtools_links/help.txt
@@ -0,0 +1,25 @@
```
bedtools links -h
```

Tool: bedtools links (aka linksBed)
Version: v2.30.0
Summary: Creates HTML links to an UCSC Genome Browser from a feature file.

Usage: bedtools links [OPTIONS] -i <bed/gff/vcf> > out.html

Options:
-base The browser basename. Default: http://genome.ucsc.edu
-org The organism. Default: human
-db The build. Default: hg18

Example:
By default, the links created will point to human (hg18) UCSC browser.
If you have a local mirror, you can override this behavior by supplying
the -base, -org, and -db options.

For example, if the URL of your local mirror for mouse MM9 is called:
http://mymirror.myuniversity.edu, then you would use the following:
-base http://mymirror.myuniversity.edu
-org mouse
-db mm9
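
To make the effect of `-base`, `-org`, and `-db` concrete, here is a small sketch of how the three values could combine into a browser link. It assumes the common UCSC `hgTracks` URL layout (`/cgi-bin/hgTracks?org=...&db=...&position=chrom:start-end`); whether bedtools emits exactly this string, or adjusts the 0-based BED start coordinate, is not shown in this PR and is treated as an assumption. The function name `ucsc_link` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: builds a UCSC-style link from the three options
# plus one interval. Coordinates are passed through unchanged.
ucsc_link() {
  local base="$1" org="$2" db="$3" chrom="$4" start="$5" end="$6"
  printf '%s/cgi-bin/hgTracks?org=%s&db=%s&position=%s:%s-%s\n' \
    "$base" "$org" "$db" "$chrom" "$start" "$end"
}

# Defaults listed in the help text.
default_link=$(ucsc_link "http://genome.ucsc.edu" human hg18 chr21 9928613 10012791)

# Local mirror example from the help text.
mirror_link=$(ucsc_link "http://mymirror.myuniversity.edu" mouse mm9 chr21 9928613 10012791)

echo "$default_link"
echo "$mirror_link"
```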
14 changes: 14 additions & 0 deletions src/bedtools/bedtools_links/script.sh
@@ -0,0 +1,14 @@
#!/bin/bash

## VIASH START
## VIASH END

set -eo pipefail

# Execute bedtools links
bedtools links \
  ${par_base_url:+-base "$par_base_url"} \
  ${par_organism:+-org "$par_organism"} \
  ${par_database:+-db "$par_database"} \
  -i "$par_input" \
  > "$par_output"
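
The `${var:+word}` expansions in script.sh pass a flag only when the corresponding parameter is set and non-empty. The sketch below demonstrates that behavior in isolation; `count_args` and `show_args` are hypothetical helpers, and the parameter values are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting the resulting argument list.
count_args() { echo "$#"; }
show_args() { printf '[%s]\n' "$@"; }

unset par_base_url          # unset: the -base flag is dropped entirely
par_organism="house mouse"  # set: expands to two words, -org and the value
par_database=""             # empty: treated like unset because of the colon

n=$(count_args \
  ${par_base_url:+-base "$par_base_url"} \
  ${par_organism:+-org "$par_organism"} \
  ${par_database:+-db "$par_database"})
echo "argument count: $n"

show_args \
  ${par_base_url:+-base "$par_base_url"} \
  ${par_organism:+-org "$par_organism"} \
  ${par_database:+-db "$par_database"}
```

Because the inner `"$par_organism"` is quoted, a value containing spaces stays a single argument even though the outer expansion is unquoted.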
98 changes: 98 additions & 0 deletions src/bedtools/bedtools_links/test.sh
@@ -0,0 +1,98 @@
#!/bin/bash

# exit on error
set -eo pipefail

#############################################
# helper functions
assert_file_exists() {
  [ -f "$1" ] || { echo "File '$1' does not exist" && exit 1; }
}
assert_file_not_empty() {
  [ -s "$1" ] || { echo "File '$1' is empty but shouldn't be" && exit 1; }
}
assert_file_contains() {
  grep -q "$2" "$1" || { echo "File '$1' does not contain '$2'" && exit 1; }
}
assert_identical_content() {
  diff -a "$2" "$1" \
    || { echo "Files are not identical!" && exit 1; }
}
#############################################

# Create directories for tests
echo "Creating Test Data..."
TMPDIR=$(mktemp -d "$meta_temp_dir/XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -r "$TMPDIR"
}
trap clean_up EXIT

# Create test data (BED columns must be tab-separated)
cat <<EOF > "$TMPDIR/genes.bed"
chr21	9928613	10012791	uc002yip.1	0	-
chr21	9928613	10012791	uc002yiq.1	0	-
chr21	9928613	10012791	uc002yir.1	0	-
chr21	9928613	10012791	uc010gkv.1	0	-
chr21	9928613	10061300	uc002yis.1	0	-
chr21	10042683	10120796	uc002yit.1	0	-
chr21	10042683	10120808	uc002yiu.1	0	-
chr21	10079666	10120808	uc002yiv.1	0	-
chr21	10080031	10081687	uc002yiw.1	0	-
chr21	10081660	10120796	uc002yix.2	0	-
EOF

# Test 1: Default Use
mkdir "$TMPDIR/test1" && pushd "$TMPDIR/test1" > /dev/null

echo "> Run bedtools_links on BED file"
"$meta_executable" \
  --input "../genes.bed" \
  --output "genes.html"

# checks
assert_file_exists "genes.html"
assert_file_not_empty "genes.html"
assert_file_contains "genes.html" "uc002yip.1"
echo "- test1 succeeded -"

popd > /dev/null

# Test 2: Base URL
mkdir "$TMPDIR/test2" && pushd "$TMPDIR/test2" > /dev/null

echo "> Run bedtools_links with base option"
"$meta_executable" \
  --input "../genes.bed" \
  --output "genes.html" \
  --base_url "http://genome.ucsc.edu"

# checks
assert_file_exists "genes.html"
assert_file_not_empty "genes.html"
assert_file_contains "genes.html" "uc002yip.1"
echo "- test2 succeeded -"

popd > /dev/null

# Test 3: Organism and Genome Database Build
mkdir "$TMPDIR/test3" && pushd "$TMPDIR/test3" > /dev/null

echo "> Run bedtools_links with organism option and genome database build"
"$meta_executable" \
  --input "../genes.bed" \
  --output "genes.html" \
  --base_url "http://genome.ucsc.edu" \
  --organism "mouse" \
  --database "mm9"

# checks
assert_file_exists "genes.html"
assert_file_not_empty "genes.html"
assert_file_contains "genes.html" "uc002yip.1"
echo "- test3 succeeded -"

popd > /dev/null

echo "---- All tests succeeded! ----"
exit 0
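
Tests 2 and 3 assert only that the feature name `uc002yip.1` appears in the output, which would hold regardless of the options passed. A stricter check might also look for the organism and database in the generated links. The sketch below illustrates the idea on a stand-in file; it assumes the links carry `org=` and `db=` query parameters, which has not been verified against real bedtools output here, and the sample HTML line and the `assert_file_not_contains` helper are invented for illustration.

```shell
#!/bin/bash
set -eo pipefail

assert_file_contains() {
  grep -q -- "$2" "$1" || { echo "File '$1' does not contain '$2'"; exit 1; }
}
# Hypothetical counterpart to assert_file_contains.
assert_file_not_contains() {
  if grep -q -- "$2" "$1"; then
    echo "File '$1' contains '$2' but shouldn't"
    exit 1
  fi
}

# Stand-in for genes.html; the line below is made up, not real tool output.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<a href=http://genome.ucsc.edu/cgi-bin/hgTracks?org=mouse&db=mm9&position=chr21:1-100>uc002yip.1</a>
EOF

# Checks that would distinguish test 3 from the default run.
assert_file_contains "$sample" "org=mouse"
assert_file_contains "$sample" "db=mm9"
assert_file_not_contains "$sample" "db=hg18"
echo "option checks passed"
```

If the actual link format differs, the patterns passed to the assertions would need to be adjusted after inspecting a real `genes.html`.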