forked from viash-hub/biobox
Commit
bd_rhapsody_make_reference: Create a reference for the BD Rhapsody pipeline (viash-hub#75)

* `bd_rhapsody/bd_rhapsody_make_reference`: Create a reference for the BD Rhapsody pipeline
* add missing metadata
* remove unicode
* trigger
* process comments
* add authors
* Apply suggestions from code review

Co-authored-by: Dorien <[email protected]>

---------

Co-authored-by: Dorien <[email protected]>
Showing 11 changed files with 660 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
name: Robrecht Cannoodt | ||
info: | ||
links: | ||
email: [email protected] | ||
github: rcannood | ||
orcid: "0000-0003-3641-729X" | ||
linkedin: robrechtcannoodt | ||
organizations: | ||
- name: Data Intuitive | ||
href: https://www.data-intuitive.com | ||
role: Data Science Engineer | ||
- name: Open Problems | ||
href: https://openproblems.bio | ||
role: Core Member |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
name: Weiwei Schultz | ||
info: | ||
organizations: | ||
- name: Janssen R&D US | ||
role: Associate Director Data Sciences |
143 changes: 143 additions & 0 deletions
src/bd_rhapsody/bd_rhapsody_make_reference/config.vsh.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
name: bd_rhapsody_make_reference | ||
namespace: bd_rhapsody | ||
description: | | ||
The Reference Files Generator creates an archive containing Genome Index | ||
and Transcriptome annotation files needed for the BD Rhapsody Sequencing | ||
Analysis Pipeline. The app takes as input one or more FASTA and GTF files | ||
and produces a compressed archive in the form of a tar.gz file. The | ||
archive contains: | ||
- STAR index | ||
- Filtered GTF file | ||
keywords: [genome, reference, index, align] | ||
links: | ||
repository: https://bitbucket.org/CRSwDev/cwl/src/master/v2.2.1/Extra_Utilities/ | ||
documentation: https://bd-rhapsody-bioinfo-docs.genomics.bd.com/resources/extra_utilities.html#make-rhapsody-reference | ||
license: Unknown | ||
authors: | ||
- __merge__: /src/_authors/robrecht_cannoodt.yaml | ||
roles: [ author, maintainer ] | ||
- __merge__: /src/_authors/weiwei_schultz.yaml | ||
roles: [ contributor ] | ||
|
||
argument_groups: | ||
- name: Inputs | ||
arguments: | ||
- type: file | ||
name: --genome_fasta | ||
required: true | ||
description: Reference genome file in FASTA or FASTA.GZ format. The BD Rhapsody Sequencing Analysis Pipeline uses GRCh38 for Human and GRCm39 for Mouse. | ||
example: genome_sequence.fa.gz | ||
multiple: true | ||
info: | ||
config_key: Genome_fasta | ||
- type: file | ||
name: --gtf | ||
required: true | ||
description: | | ||
File path to the transcript annotation files in GTF or GTF.GZ format. The Sequencing Analysis Pipeline requires the 'gene_name' or | |
'gene_id' attribute to be set on each gene and exon feature. Gene and exon feature lines must have the same attribute, and exons | ||
must have a corresponding gene with the same value. For TCR/BCR assays, the TCR or BCR gene segments must have the 'gene_type' or | ||
'gene_biotype' attribute set, and the value should begin with 'TR' or 'IG', respectively. | ||
example: transcriptome_annotation.gtf.gz | ||
multiple: true | ||
info: | ||
config_key: Gtf | ||
- type: file | ||
name: --extra_sequences | ||
description: | | ||
File path to additional sequences in FASTA format to use when building the STAR index (e.g. transgenes or CRISPR guide barcodes). | |
GTF lines for these sequences will be automatically generated and combined with the main GTF. | ||
required: false | ||
multiple: true | ||
info: | ||
config_key: Extra_sequences | ||
- name: Outputs | ||
arguments: | ||
- type: file | ||
name: --reference_archive | ||
direction: output | ||
required: true | ||
description: | | ||
A compressed archive containing the Reference Genome Index and annotation GTF files. This archive is meant to be used as an | |
input in the BD Rhapsody Sequencing Analysis Pipeline. | ||
example: star_index.tar.gz | ||
- name: Arguments | ||
arguments: | ||
- type: string | ||
name: --mitochondrial_contigs | ||
description: | | ||
Names of the Mitochondrial contigs in the provided Reference Genome. Fragments originating from contigs other than these are | ||
identified as 'nuclear fragments' in the ATACseq analysis pipeline. | ||
required: false | ||
multiple: true | ||
default: [chrM, chrMT, M, MT] | ||
info: | ||
config_key: Mitochondrial_Contigs | |
- type: boolean_true | ||
name: --filtering_off | ||
description: | | ||
By default the input Transcript Annotation files are filtered based on the gene_type/gene_biotype attribute. Only features | ||
having the following attribute values are kept: | ||
- protein_coding | ||
- lncRNA (lincRNA and antisense for Gencode < v31/M22/Ensembl97) | ||
- IG_LV_gene | ||
- IG_V_gene | ||
- IG_V_pseudogene | ||
- IG_D_gene | ||
- IG_J_gene | ||
- IG_J_pseudogene | ||
- IG_C_gene | ||
- IG_C_pseudogene | ||
- TR_V_gene | ||
- TR_V_pseudogene | ||
- TR_D_gene | ||
- TR_J_gene | ||
- TR_J_pseudogene | ||
- TR_C_gene | ||
If you have already pre-filtered the input annotation files and/or wish to turn off the filtering, set this flag. | |
info: | ||
config_key: Filtering_off | ||
- type: boolean_true | ||
name: --wta_only_index | ||
description: Build a WTA only index, otherwise builds a WTA + ATAC index. | ||
info: | ||
config_key: WTA_Only | |
- type: string | ||
name: --extra_star_params | ||
description: Additional parameters to pass to STAR when building the genome index. Specify them exactly as you would on the command line. | |
example: --limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11 | ||
required: false | ||
info: | ||
config_key: Extra_STAR_params | ||
|
||
resources: | ||
- type: python_script | ||
path: script.py | ||
- path: make_rhap_reference_2.2.1_nodocker.cwl | ||
|
||
test_resources: | ||
- type: bash_script | ||
path: test.sh | ||
- path: test_data | ||
|
||
requirements: | ||
commands: [ "cwl-runner" ] | ||
|
||
engines: | ||
- type: docker | ||
image: bdgenomics/rhapsody:2.2.1 | ||
setup: | ||
- type: apt | ||
packages: [procps] | ||
- type: python | ||
packages: [cwlref-runner, cwl-runner] | ||
- type: docker | ||
run: | | ||
echo "bdgenomics/rhapsody: 2.2.1" > /var/software_versions.txt | ||
runners: | ||
- type: executable | ||
- type: nextflow |
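The `info.config_key` fields above suggest that each Viash argument is renamed to a CWL input before the tool is run. The `script.py` that does this is not shown here, so the following is only a minimal sketch of one way such a mapping could work. The function name `build_job_order`, the `CONFIG_KEYS` table (a subset of the arguments) and the shape of the `par` dictionary are assumptions for illustration, not the component's actual code.

```python
import json

# Hypothetical subset of the argument-to-input mapping: keys are the Viash
# argument names, values are the `config_key` entries from the config.
CONFIG_KEYS = {
    "genome_fasta": "Genome_fasta",
    "gtf": "Gtf",
    "extra_sequences": "Extra_sequences",
    "filtering_off": "Filtering_off",
    "extra_star_params": "Extra_STAR_params",
}
# Arguments declared with `type: file`; CWL expects File objects for these.
FILE_ARGS = {"genome_fasta", "gtf", "extra_sequences"}


def build_job_order(par):
    """Translate a dict of parsed arguments into a CWL job order dict."""
    job = {}
    for arg, key in CONFIG_KEYS.items():
        value = par.get(arg)
        # Skip unset optional arguments and flags that were not passed.
        if value is None or value is False:
            continue
        if arg in FILE_ARGS:
            # Each path becomes a CWL File object.
            value = [{"class": "File", "path": path} for path in value]
        job[key] = value
    return job


if __name__ == "__main__":
    example = {
        "genome_fasta": ["genome_sequence.fa.gz"],
        "gtf": ["transcriptome_annotation.gtf.gz"],
        "filtering_off": True,
    }
    print(json.dumps(build_job_order(example), indent=2))
```

Keeping the mapping in a single table means a renamed CWL input only needs to be changed in one place. CWL input names are case-sensitive, so each `config_key` has to match the tool's input name exactly.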
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
```bash | ||
cwl-runner src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl --help | ||
``` | ||
|
||
usage: src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl | ||
[-h] [--Archive_prefix ARCHIVE_PREFIX] | ||
[--Extra_STAR_params EXTRA_STAR_PARAMS] | ||
[--Extra_sequences EXTRA_SEQUENCES] [--Filtering_off] --Genome_fasta | ||
GENOME_FASTA --Gtf GTF [--Maximum_threads MAXIMUM_THREADS] | ||
[--Mitochondrial_Contigs MITOCHONDRIAL_CONTIGS] [--WTA_Only] | ||
[job_order] | ||
|
||
The Reference Files Generator creates an archive containing Genome Index and | ||
Transcriptome annotation files needed for the BD Rhapsody™ Sequencing | |
Analysis Pipeline. The app takes as input one or more FASTA and GTF files and | ||
produces a compressed archive in the form of a tar.gz file. The archive | ||
contains:\n - STAR index\n - Filtered GTF file | ||
|
||
positional arguments: | ||
job_order Job input json file | ||
|
||
options: | ||
-h, --help show this help message and exit | ||
--Archive_prefix ARCHIVE_PREFIX | ||
A prefix for naming the compressed archive file | ||
containing the Reference genome index and annotation | ||
files. The default value is constructed based on the | ||
input Reference files. | ||
--Extra_STAR_params EXTRA_STAR_PARAMS | ||
Additional parameters to pass to STAR when building | ||
the genome index. Specify exactly like how you would | ||
on the command line. Example: --limitGenomeGenerateRAM | ||
48000 --genomeSAindexNbases 11 | ||
--Extra_sequences EXTRA_SEQUENCES | ||
Additional sequences in FASTA format to use when | ||
building the STAR index. (E.g. phiX genome) | ||
--Filtering_off By default the input Transcript Annotation files are | ||
filtered based on the gene_type/gene_biotype | ||
attribute. Only features having the following | ||
attribute values are kept: - protein_coding - | |
lncRNA (lincRNA and antisense for Gencode < | ||
v31/M22/Ensembl97) - IG_LV_gene - IG_V_gene - | ||
IG_V_pseudogene - IG_D_gene - IG_J_gene - | ||
IG_J_pseudogene - IG_C_gene - IG_C_pseudogene - | ||
TR_V_gene - TR_V_pseudogene - TR_D_gene - TR_J_gene - | ||
TR_J_pseudogene - TR_C_gene If you have already pre- | ||
filtered the input Annotation files and/or wish to | ||
turn-off the filtering, please set this option to | ||
True. | ||
--Genome_fasta GENOME_FASTA | ||
Reference genome file in FASTA format. The BD | ||
Rhapsody™ Sequencing Analysis Pipeline uses GRCh38 | |
for Human and GRCm39 for Mouse. | |
--Gtf GTF Transcript annotation files in GTF format. The BD | |
Rhapsody™ Sequencing Analysis Pipeline uses Gencode | |
v42 for Human and M31 for Mouse. | ||
--Maximum_threads MAXIMUM_THREADS | ||
The maximum number of threads to use in the pipeline. | ||
By default, all available cores are used. | ||
--Mitochondrial_Contigs MITOCHONDRIAL_CONTIGS | ||
Names of the Mitochondrial contigs in the provided | ||
Reference Genome. Fragments originating from contigs | ||
other than these are identified as 'nuclear fragments' | ||
in the ATACseq analysis pipeline. | ||
--WTA_Only Build a WTA only index, otherwise builds a WTA + ATAC | ||
index. |
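The usage text above shows that the tool accepts a positional `job_order` JSON file in place of individual flags. As a hedged sketch of how a wrapper might use that, the snippet below writes a job order to disk and assembles the `cwl-runner` command. The helper name `prepare_run` and the file name `job.json` are illustrative assumptions. `--outdir` is the standard cwltool option for choosing the output directory.

```python
import json
import os


def prepare_run(job_order, cwl_path, work_dir):
    """Write the job order to work_dir and return the cwl-runner command."""
    job_path = os.path.join(work_dir, "job.json")
    with open(job_path, "w") as handle:
        json.dump(job_order, handle, indent=2)
    # Runner options go before the tool; the job file goes after it.
    return ["cwl-runner", "--outdir", work_dir, cwl_path, job_path]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list avoids shell quoting problems with values such as `--limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11`.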
115 changes: 115 additions & 0 deletions
src/bd_rhapsody/bd_rhapsody_make_reference/make_rhap_reference_2.2.1_nodocker.cwl
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
requirements: | ||
InlineJavascriptRequirement: {} | ||
class: CommandLineTool | ||
label: Reference Files Generator for BD Rhapsody™ Sequencing Analysis Pipeline | |
cwlVersion: v1.2 | |
doc: >- | |
The Reference Files Generator creates an archive containing Genome Index and Transcriptome annotation files needed for the BD Rhapsody™ Sequencing Analysis Pipeline. The app takes as input one or more FASTA and GTF files and produces a compressed archive in the form of a tar.gz file. The archive contains:\n - STAR index\n - Filtered GTF file | |
|
||
|
||
baseCommand: run_reference_generator.sh | ||
inputs: | ||
Genome_fasta: | ||
type: File[] | ||
label: Reference Genome | ||
doc: |- | ||
Reference genome file in FASTA format. The BD Rhapsody™ Sequencing Analysis Pipeline uses GRCh38 for Human and GRCm39 for Mouse. | |
inputBinding: | ||
prefix: --reference-genome | ||
shellQuote: false | ||
Gtf: | ||
type: File[] | ||
label: Transcript Annotations | ||
doc: |- | ||
Transcript annotation files in GTF format. The BD Rhapsody™ Sequencing Analysis Pipeline uses Gencode v42 for Human and M31 for Mouse. | |
inputBinding: | ||
prefix: --gtf | ||
shellQuote: false | ||
Extra_sequences: | ||
type: File[]? | ||
label: Extra Sequences | ||
doc: |- | ||
Additional sequences in FASTA format to use when building the STAR index. (E.g. phiX genome) | ||
inputBinding: | ||
prefix: --extra-sequences | ||
shellQuote: false | ||
Mitochondrial_Contigs: | ||
type: string[]? | ||
default: ["chrM", "chrMT", "M", "MT"] | ||
label: Mitochondrial Contig Names | ||
doc: |- | ||
Names of the Mitochondrial contigs in the provided Reference Genome. Fragments originating from contigs other than these are identified as 'nuclear fragments' in the ATACseq analysis pipeline. | ||
inputBinding: | ||
prefix: --mitochondrial-contigs | ||
shellQuote: false | ||
Filtering_off: | ||
type: boolean? | ||
label: Turn off filtering | ||
doc: |- | ||
By default the input Transcript Annotation files are filtered based on the gene_type/gene_biotype attribute. Only features having the following attribute values are kept: | |
- protein_coding | ||
- lncRNA (lincRNA and antisense for Gencode < v31/M22/Ensembl97) | ||
- IG_LV_gene | ||
- IG_V_gene | ||
- IG_V_pseudogene | ||
- IG_D_gene | ||
- IG_J_gene | ||
- IG_J_pseudogene | ||
- IG_C_gene | ||
- IG_C_pseudogene | ||
- TR_V_gene | ||
- TR_V_pseudogene | ||
- TR_D_gene | ||
- TR_J_gene | ||
- TR_J_pseudogene | ||
- TR_C_gene | ||
If you have already pre-filtered the input Annotation files and/or wish to turn-off the filtering, please set this option to True. | ||
inputBinding: | ||
prefix: --filtering-off | ||
shellQuote: false | ||
WTA_Only: | ||
type: boolean? | ||
label: WTA only index | ||
doc: Build a WTA only index, otherwise builds a WTA + ATAC index. | ||
inputBinding: | ||
prefix: --wta-only-index | ||
shellQuote: false | ||
Archive_prefix: | ||
type: string? | ||
label: Archive Prefix | ||
doc: |- | ||
A prefix for naming the compressed archive file containing the Reference genome index and annotation files. The default value is constructed based on the input Reference files. | ||
inputBinding: | ||
prefix: --archive-prefix | ||
shellQuote: false | ||
Extra_STAR_params: | ||
type: string? | ||
label: Extra STAR Params | ||
doc: |- | ||
Additional parameters to pass to STAR when building the genome index. Specify exactly like how you would on the command line. | ||
Example: | ||
--limitGenomeGenerateRAM 48000 --genomeSAindexNbases 11 | ||
inputBinding: | ||
prefix: --extra-star-params | ||
shellQuote: true | ||
|
||
Maximum_threads: | ||
type: int? | ||
label: Maximum Number of Threads | ||
doc: |- | ||
The maximum number of threads to use in the pipeline. By default, all available cores are used. | ||
inputBinding: | ||
prefix: --maximum-threads | ||
shellQuote: false | ||
|
||
outputs: | ||
|
||
Archive: | ||
type: File | ||
doc: |- | ||
A compressed archive containing the Reference Genome Index and annotation GTF files. This archive is meant to be used as an input in the BD Rhapsody™ Sequencing Analysis Pipeline. | |
id: Reference_Archive | ||
label: Reference Files Archive | ||
outputBinding: | ||
glob: '*.tar.gz' | ||
|
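The tool's only output is collected with `glob: '*.tar.gz'`, and its name depends on `Archive_prefix` or on the input file names. A wrapper that exposes a fixed `--reference_archive` path therefore has to find the archive after the run. The sketch below shows one way to do that. The function name `collect_archive` and the rule that exactly one match is required are assumptions for illustration, not behaviour confirmed by this commit.

```python
import glob
import os
import shutil


def collect_archive(out_dir, destination):
    """Move the single *.tar.gz found in out_dir to destination."""
    matches = sorted(glob.glob(os.path.join(out_dir, "*.tar.gz")))
    # Zero or several matches are ambiguous, so fail instead of guessing.
    if len(matches) != 1:
        raise RuntimeError(
            f"expected exactly one *.tar.gz in {out_dir}, found {len(matches)}"
        )
    shutil.move(matches[0], destination)
    return destination
```

Raising an error on an unexpected number of matches makes a failed or partial run visible right away, where silently moving the wrong file would hide it.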