Merge branch 'main' into kallisto_quant
emmarousseau authored Sep 15, 2024
2 parents 81fdf59 + fe56ee7 commit 98fa267
Showing 81 changed files with 7,522 additions and 4 deletions.
27 changes: 26 additions & 1 deletion CHANGELOG.md
@@ -29,17 +29,32 @@
* `bedtools`:
  - `bedtools/bedtools_intersect`: Allows one to screen for overlaps between two sets of genomic features (PR #94).
  - `bedtools/bedtools_sort`: Sorts a feature file (bed/gff/vcf) by chromosome and other criteria (PR #98).
  - `bedtools/bedtools_genomecov`: Compute the coverage of a feature file (bed/gff/vcf/bam) among a genome (PR #128).
  - `bedtools/bedtools_groupby`: Summarizes a dataset column based upon common column groupings. Akin to the SQL "group by" command (PR #123).
  - `bedtools/bedtools_merge`: Merges overlapping BED/GFF/VCF entries into a single interval (PR #118).
  - `bedtools/bedtools_bamtofastq`: Convert BAM alignments to FASTQ files (PR #101).
  - `bedtools/bedtools_bedtobam`: Converts genomic feature records (bed/gff/vcf) to BAM format (PR #111).
  - `bedtools/bedtools_bed12tobed6`: Converts BED12 files to BED6 files (PR #140).
  - `bedtools/bedtools_links`: Creates an HTML file with links to an instance of the UCSC Genome Browser for all features / intervals in a (bed/gff/vcf) file (PR #137).

* `qualimap/qualimap_rnaseq`: RNA-seq QC analysis using qualimap (PR #74).

* `rsem/rsem_prepare_reference`: Prepare transcript references for RSEM (PR #89).

* `bcftools`:
  - `bcftools/bcftools_concat`: Concatenate or combine VCF/BCF files (PR #145).
  - `bcftools/bcftools_norm`: Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows (PR #144).
  - `bcftools/bcftools_annotate`: Add or remove annotations from a VCF/BCF file (PR #143).
  - `bcftools/bcftools_stats`: Parses VCF or BCF and produces a txt stats file which can be plotted using plot-vcfstats (PR #142).
  - `bcftools/bcftools_sort`: Sorts BCF/VCF files by position and other criteria (PR #141).

* `fastqc`: High throughput sequence quality control analysis tool (PR #92).

* `kallisto`:
  - `kallisto/kallisto_quant`: Quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences
    using high-throughput sequencing reads (PR #152).


## MINOR CHANGES

* `busco` components: update BUSCO to `5.7.1` (PR #72).
@@ -127,14 +142,24 @@
  - `samtools/samtools_fastq`: Converts a SAM/BAM/CRAM file to FASTA (PR #53).

* `umi_tools`:
  -`umi_tools/umi_tools_extract`: Flexible removal of UMI sequences from fastq reads (PR #71).
  - `umi_tools/umi_tools_extract`: Flexible removal of UMI sequences from fastq reads (PR #71).
  - `umi_tools/umi_tools_prepareforrsem`: Fix paired-end reads in name sorted BAM file to prepare for RSEM (PR #148).

* `falco`: A C++ drop-in replacement of FastQC to assess the quality of sequence read data (PR #43).

* `bedtools`:
  - `bedtools_getfasta`: extract sequences from a FASTA file for each of the
    intervals defined in a BED/GFF/VCF file (PR #59).

* `sortmerna`: Local sequence alignment tool for mapping, clustering, and filtering rRNA from metatranscriptomic
  data. (PR #146)

* `fq_subsample`: Sample a subset of records from single or paired FASTQ files (PR #147).

* `kallisto`:
  - `kallisto_index`: Create a kallisto index (PR #149).


## MINOR CHANGES

* Uniformize component metadata (PR #23).
250 changes: 250 additions & 0 deletions src/bcftools/bcftools_annotate/config.vsh.yaml
@@ -0,0 +1,250 @@
name: bcftools_annotate
namespace: bcftools
description: |
  Add or remove annotations from a VCF/BCF file.
keywords: [Annotate, VCF, BCF]
links:
  homepage: https://samtools.github.io/bcftools/
  documentation: https://samtools.github.io/bcftools/bcftools.html#annotate
  repository: https://github.com/samtools/bcftools
  issue_tracker: https://github.com/samtools/bcftools/issues
references:
  doi: https://doi.org/10.1093/gigascience/giab008
license: MIT/Expat, GNU
requirements:
  commands: [bcftools]
authors:
  - __merge__: /src/_authors/theodoro_gasperin.yaml
    roles: [author]

argument_groups:
  - name: Inputs
    arguments:
      - name: --input
        alternatives: -i
        type: file
        multiple: true
        description: Input VCF/BCF file.
        required: true

  - name: Outputs
    arguments:
      - name: --output
        alternatives: -o
        direction: output
        type: file
        description: Output annotated file.
        required: true

  - name: Options
    description: |
      For examples on how to use bcftools annotate see http://samtools.github.io/bcftools/howtos/annotate.html.
      For more details on the options see https://samtools.github.io/bcftools/bcftools.html#annotate.
    arguments:

      - name: --annotations
        alternatives: --a
        type: file
        description: |
          VCF file or tabix-indexed FILE with annotations: CHR\tPOS[\tVALUE]+ .
      - name: --columns
        alternatives: --c
        type: string
        description: |
          List of columns in the annotation file, e.g. CHROM,POS,REF,ALT,-,INFO/TAG.
          See man page for details.
      - name: --columns_file
        alternatives: --C
        type: file
        description: |
          Read -c columns from FILE, one name per row, with optional --merge_logic TYPE: NAME[ TYPE].
      - name: --exclude
        alternatives: --e
        type: string
        description: |
          Exclude sites for which the expression is true.
          See https://samtools.github.io/bcftools/bcftools.html#expressions for details.
        example: 'QUAL >= 30 && DP >= 10'

      - name: --force
        type: boolean_true
        description: |
          Continue even when parsing errors, such as undefined tags, are encountered.
          Note this can be an unsafe operation and can result in corrupted BCF files.
          If this option is used, make sure to sanity check the result thoroughly.
      - name: --header_line
        alternatives: --H
        type: string
        description: |
          Header line which should be appended to the VCF header, can be given multiple times.
      - name: --header_lines
        alternatives: --h
        type: file
        description: |
          File with header lines to append to the VCF header.
          For example:
          ##INFO=<ID=NUMERIC_TAG,Number=1,Type=Integer,Description="Example header line">
          ##INFO=<ID=STRING_TAG,Number=1,Type=String,Description="Yet another header line">
      - name: --set_id
        alternatives: --I
        type: string
        description: |
          Set ID column using a `bcftools query`-like expression, see man page for details.
      - name: --include
        type: string
        description: |
          Select sites for which the expression is true.
          See https://samtools.github.io/bcftools/bcftools.html#expressions for details.
        example: 'QUAL >= 30 && DP >= 10'

      - name: --keep_sites
        alternatives: --k
        type: boolean_true
        description: |
          Leave --include/--exclude sites unchanged instead of discarding them.
      - name: --merge_logic
        alternatives: --l
        type: string
        choices:
        description: |
          When multiple regions overlap a single record, this option defines how to treat multiple annotation values.
          See man page for more details.
      - name: --mark_sites
        alternatives: --m
        type: string
        description: |
          Annotate sites which are present ("+") or absent ("-") in the -a file with a new INFO/TAG flag.
      - name: --min_overlap
        type: string
        description: |
          Minimum overlap required as a fraction of the variant in the annotation -a file (ANN),
          in the target VCF file (:VCF), or both for reciprocal overlap (ANN:VCF).
          By default overlaps of arbitrary length are sufficient.
          The option can be used only with the tab-delimited annotation -a file and with BEG and END columns present.
      - name: --no_version
        type: boolean_true
        description: |
          Do not append version and command line information to the output VCF header.
      - name: --output_type
        alternatives: --O
        type: string
        choices: ['u', 'z', 'b', 'v']
        description: |
          Output type:
          u: uncompressed BCF
          z: compressed VCF
          b: compressed BCF
          v: uncompressed VCF
      - name: --pair_logic
        type: string
        choices: ['snps', 'indels', 'both', 'all', 'some', 'exact']
        description: |
          Controls how to match records from the annotation file to the target VCF.
          Effective only when -a is a VCF or BCF file.
          The option replaces the former unintuitive --collapse.
          See Common Options for more.
      - name: --regions
        alternatives: --r
        type: string
        description: |
          Restrict to comma-separated list of regions.
          The following formats are supported: chr|chr:pos|chr:beg-end|chr:beg-[,…].
        example: '20:1000000-2000000'

      - name: --regions_file
        alternatives: --R
        type: file
        description: |
          Restrict to regions listed in a file.
          Regions can be specified either on a VCF, BED, or tab-delimited file (the default).
          For more information check the manual.
      - name: --regions_overlap
        type: string
        choices: ['pos', 'record', 'variant', '0', '1', '2']
        description: |
          This option controls how overlapping records are determined:
          set to 'pos' or '0' if the VCF record has to have POS inside a region (this corresponds to the default behavior of -t/-T);
          set to 'record' or '1' if also overlapping records with POS outside a region should be included (this is the default behavior of -r/-R,
          and includes indels with POS at the end of a region, which are technically outside the region);
          or set to 'variant' or '2' to include only true overlapping variation (compare the full VCF representation "TA>T-" vs the true sequence variation "A>-").
      - name: --rename_annotations
        type: file
        description: |
          Rename annotations: TYPE/old\tnew, where TYPE is one of FILTER,INFO,FORMAT.
      - name: --rename_chromosomes
        type: file
        description: |
          Rename chromosomes according to the map in file, with "old_name new_name\n" pairs
          separated by whitespaces, each on a separate line.
      - name: --samples
        type: string
        description: |
          Subset of samples to annotate.
          See also https://samtools.github.io/bcftools/bcftools.html#common_options.
      - name: --samples_file
        type: file
        description: |
          Subset of samples to annotate in file format.
          See also https://samtools.github.io/bcftools/bcftools.html#common_options.
      - name: --single_overlaps
        type: boolean_true
        description: |
          Use this option to keep memory requirements low with very large annotation files.
          Note, however, that this comes at a cost, only single overlapping intervals are considered in this mode.
          This was the default mode until the commit af6f0c9 (Feb 24 2019).
      - name: --remove
        alternatives: --x
        type: string
        description: |
          List of annotations to remove.
          Use "FILTER" to remove all filters or "FILTER/SomeFilter" to remove a specific filter.
          Similarly, "INFO" can be used to remove all INFO tags and "FORMAT" to remove all FORMAT tags except GT.
          To remove all INFO tags except "FOO" and "BAR", use "^INFO/FOO,INFO/BAR" (and similarly for FORMAT and FILTER).
          "INFO" can be abbreviated to "INF" and "FORMAT" to "FMT".
resources:
  - type: bash_script
    path: script.sh

test_resources:
  - type: bash_script
    path: test.sh

engines:
  - type: docker
    image: debian:stable-slim
    setup:
      - type: apt
        packages: [bcftools, procps]
      - type: docker
        run: |
          echo "bcftools: \"$(bcftools --version | grep 'bcftools' | sed -n 's/^bcftools //p')\"" > /var/software_versions.txt
    test_setup:
      - type: apt
        packages: [tabix]

runners:
  - type: executable
  - type: nextflow
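
The component's `script.sh` is not part of this excerpt. As a rough sketch only, the underscore-named arguments declared above could be translated to the hyphenated bcftools flags from `help.txt` along the following lines. It assumes that Viash exposes each argument to a bash script as a `par_<name>` variable and that `boolean_true` flags arrive as the string `true`; the helper name `build_annotate_cmd` and all file names are hypothetical. Only a handful of options are covered, and the command is printed rather than executed.

```bash
#!/usr/bin/env bash

# Hypothetical helper: assemble a `bcftools annotate` command line from
# Viash-style par_* variables (assumed naming). It prints the command
# instead of running it, so bcftools does not need to be installed.
build_annotate_cmd() {
  local cmd=(bcftools annotate)

  # Value options are forwarded only when the variable is non-empty.
  if [ -n "${par_annotations:-}" ]; then cmd+=(--annotations "$par_annotations"); fi
  if [ -n "${par_columns:-}" ]; then cmd+=(--columns "$par_columns"); fi
  if [ -n "${par_columns_file:-}" ]; then cmd+=(--columns-file "$par_columns_file"); fi
  if [ -n "${par_header_lines:-}" ]; then cmd+=(--header-lines "$par_header_lines"); fi
  if [ -n "${par_rename_annotations:-}" ]; then cmd+=(--rename-annots "$par_rename_annotations"); fi
  if [ -n "${par_rename_chromosomes:-}" ]; then cmd+=(--rename-chrs "$par_rename_chromosomes"); fi
  if [ -n "${par_output_type:-}" ]; then cmd+=(--output-type "$par_output_type"); fi

  # Flags are assumed to arrive as the literal string "true".
  if [ "${par_keep_sites:-}" = "true" ]; then cmd+=(--keep-sites); fi
  if [ "${par_no_version:-}" = "true" ]; then cmd+=(--no-version); fi

  # Output file and positional input VCF come last.
  cmd+=(--output "$par_output" "$par_input")
  printf '%s\n' "${cmd[*]}"
}

# Made-up values, for illustration only.
par_input="input.vcf.gz"
par_output="annotated.vcf"
par_annotations="annots.tsv.gz"
par_columns="CHROM,POS,INFO/TAG"
par_rename_chromosomes="chr_map.txt"
par_no_version="true"

build_annotate_cmd
```

Note that two config names differ from the bcftools flags by more than the underscore: `--rename_annotations` corresponds to `--rename-annots` and `--rename_chromosomes` to `--rename-chrs`, so a plain underscore-to-hyphen substitution would not be enough for those.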

41 changes: 41 additions & 0 deletions src/bcftools/bcftools_annotate/help.txt
@@ -0,0 +1,41 @@
```
bcftools annotate -h
```

annotate: option requires an argument -- 'h'

About: Annotate and edit VCF/BCF files.
Usage: bcftools annotate [options] VCF

Options:
-a, --annotations FILE VCF file or tabix-indexed FILE with annotations: CHR\tPOS[\tVALUE]+
-c, --columns LIST List of columns in the annotation file, e.g. CHROM,POS,REF,ALT,-,INFO/TAG. See man page for details
-C, --columns-file FILE Read -c columns from FILE, one name per row, with optional --merge-logic TYPE: NAME[ TYPE]
-e, --exclude EXPR Exclude sites for which the expression is true (see man page for details)
--force Continue despite parsing error (at your own risk!)
-H, --header-line STR Header line which should be appended to the VCF header, can be given multiple times
-h, --header-lines FILE Lines which should be appended to the VCF header
-I, --set-id [+]FORMAT Set ID column using a `bcftools query`-like expression, see man page for details
-i, --include EXPR Select sites for which the expression is true (see man page for details)
-k, --keep-sites Leave -i/-e sites unchanged instead of discarding them
-l, --merge-logic TAG:TYPE Merge logic for multiple overlapping regions (see man page for details), EXPERIMENTAL
-m, --mark-sites [+-]TAG Add INFO/TAG flag to sites which are ("+") or are not ("-") listed in the -a file
--min-overlap ANN:VCF Required overlap as a fraction of variant in the -a file (ANN), the VCF (:VCF), or reciprocal (ANN:VCF)
--no-version Do not append version and command line to the header
-o, --output FILE Write output to a file [standard output]
-O, --output-type u|b|v|z[0-9] u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
--pair-logic STR Matching records by <snps|indels|both|all|some|exact>, see man page for details [some]
-r, --regions REGION Restrict to comma-separated list of regions
-R, --regions-file FILE Restrict to regions listed in FILE
--regions-overlap 0|1|2 Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
--rename-annots FILE Rename annotations: TYPE/old\tnew, where TYPE is one of FILTER,INFO,FORMAT
--rename-chrs FILE Rename sequences according to the mapping: old\tnew
-s, --samples [^]LIST Comma separated list of samples to annotate (or exclude with "^" prefix)
-S, --samples-file [^]FILE File of samples to annotate (or exclude with "^" prefix)
--single-overlaps Keep memory low by avoiding complexities arising from handling multiple overlapping intervals
-x, --remove LIST List of annotations (e.g. ID,INFO/DP,FORMAT/DP,FILTER) to remove (or keep with "^" prefix). See man page for details
--threads INT Number of extra output compression threads [0]

Examples:
http://samtools.github.io/bcftools/howtos/annotate.html
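
To make the `-a`, `-h` and `-c` options above more concrete, here is a small sketch that prepares a tab-delimited annotation file and a matching header file, then prints the commands one might run next. The header line is the one quoted in the `--header_lines` description; the annotation rows, the file names and the input `input.vcf.gz` are made up for illustration. The `bgzip`/`tabix` step reflects the requirement that a tab-delimited `-a` file be tabix-indexed, and the commands are only printed, not executed.

```bash
#!/usr/bin/env bash

workdir="$(mktemp -d)"

# Header line describing the new INFO tag (from the --header_lines example).
printf '%s\n' \
  '##INFO=<ID=NUMERIC_TAG,Number=1,Type=Integer,Description="Example header line">' \
  > "$workdir/annots.hdr"

# Made-up annotation rows: CHROM, POS, value for NUMERIC_TAG.
# printf reuses the format for each group of three arguments.
printf '%s\t%s\t%s\n' \
  20 1000100 7 \
  20 1000200 12 \
  > "$workdir/annots.tab"

# Commands that would typically follow (printed only).
cat <<EOF
bgzip $workdir/annots.tab
tabix -s1 -b2 -e2 $workdir/annots.tab.gz
bcftools annotate -a $workdir/annots.tab.gz -h $workdir/annots.hdr -c CHROM,POS,INFO/NUMERIC_TAG -o annotated.vcf input.vcf.gz
EOF
```

The `-c` list names the columns of the annotation file in order, so the third column is written to `INFO/NUMERIC_TAG` in the output.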
