v1.22.0
New:
- Adding commands for working with kmer sets using the KMC tool (#854):
  - new top-level Python file kmer_utils.py, providing the following functions (see the documentation for more information):
    - build_kmer_db: build a database of kmers occurring in given sequences
    - dump_kmer_counts: dump kmers and their counts from a kmer database to a text file
    - filter_reads: filter reads based on their kmer content
    - kmers_binary_op: perform a simple binary operation on kmer sets
    - kmers_set_counts: copy the kmer database, setting all kmer counts in the output to the given value
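To make the kmer-set vocabulary above concrete, here is a toy, in-memory Python sketch of building a kmer count table, combining two tables, and resetting counts. It is not how kmer_utils.py works: the real functions wrap KMC and operate on its on-disk databases. The helper names (count_kmers, binary_op, set_counts), the operation names, and the count-combining rules (smaller count for intersection, summed counts for union) are assumptions chosen for illustration. Kmers are counted in canonical form (the smaller of a kmer and its reverse complement), a common convention in kmer counters.

```python
"""Toy, in-memory illustration of kmer-set operations.

Hypothetical helpers for explanation only: kmer_utils.py wraps KMC, whose
on-disk databases and exact operation semantics are not modeled here.
"""
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count canonical kmers of length k, skipping kmers with non-ACGT bases."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if _BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def binary_op(op: str, a: Counter, b: Counter) -> Counter:
    """Combine two kmer tables; op names and count rules are illustrative."""
    if op == "intersect":
        return a & b  # kmers in both tables, keeping the smaller count
    if op == "union":
        return a + b  # kmers in either table, summing counts
    if op == "subtract":
        # kmers in a that are absent from b, keeping a's counts
        return Counter({km: c for km, c in a.items() if km not in b})
    raise ValueError(f"unknown operation: {op}")


def set_counts(table: Counter, value: int) -> Counter:
    """Copy a kmer table, setting every count to the given value."""
    return Counter(dict.fromkeys(table, value))


if __name__ == "__main__":
    a = count_kmers(["ACGTA"], 3)  # ACG and CGT collapse to canonical ACG
    b = count_kmers(["GTAN"], 3)   # TAN is skipped because of the N
    print(binary_op("intersect", a, b))
```

Using collections.Counter keeps the sketch short, since its & and + operators already give per-kmer minimums and sums; only subtraction needs an explicit comprehension.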
- add metagenomics.py::filter_bam_to_taxa (#883)
  - This function filters an input BAM file to include only reads that have been mapped to specified taxonomic IDs or scientific names. It requires a classification TSV file, as produced by tools such as Kraken, as well as the NCBI taxonomy database. The column numbers of the tax ID and read ID can be specified, allowing use beyond Kraken-format read classification files; however, the relationship between read IDs and tax IDs is assumed to be bijective.
  - add WDL for filter_bam_to_taxa
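The read-selection step described above can be sketched in plain Python. This is a hypothetical illustration, not the viral-ngs code: the helper names, the choice to include descendant taxa, and the toy taxonomy are assumptions. It handles numeric tax IDs only (no scientific-name lookup), takes 0-based column indices defaulting to the Kraken output layout (read ID in the second column, tax ID in the third), and stops at the set of read IDs; writing the filtered BAM itself is left out.

```python
"""Hypothetical sketch of taxon-based read selection (not the viral-ngs code).

Assumes numeric tax IDs, one classification row per read, and a
child -> parent map such as one parsed from NCBI taxonomy nodes.dmp.
"""
import csv
import io
from typing import Dict, Iterable, Set, TextIO


def with_descendants(targets: Iterable[int], parent: Dict[int, int]) -> Set[int]:
    """Return the target tax IDs plus every tax ID whose lineage reaches one of them."""
    wanted = set(targets)
    result = set(wanted)
    for taxid in parent:
        node = taxid
        visited: Set[int] = set()
        # Walk toward the root; the visited set stops at self-parented roots.
        while node not in visited:
            if node in wanted:
                result.add(taxid)
                break
            visited.add(node)
            if node not in parent:
                break
            node = parent[node]
    return result


def select_read_ids(classification: TextIO, taxids: Set[int],
                    read_id_col: int = 1, tax_id_col: int = 2) -> Set[str]:
    """Return IDs of reads whose tax ID (0-based column) is in taxids."""
    keep: Set[str] = set()
    for row in csv.reader(classification, delimiter="\t"):
        if len(row) <= max(read_id_col, tax_id_col):
            continue  # skip blank or truncated lines
        if int(row[tax_id_col]) in taxids:
            keep.add(row[read_id_col])
    return keep


if __name__ == "__main__":
    # Toy taxonomy: 1 is the self-parented root; 11 and 12 sit under 10.
    toy_parent = {1: 1, 10: 1, 11: 10, 12: 10, 20: 1, 21: 20}
    toy_tsv = io.StringIO(
        "C\tread1\t11\t150\t-\n"
        "U\tread2\t0\t150\t-\n"
        "C\tread3\t21\t150\t-\n"
        "C\tread4\t10\t150\t-\n"
    )
    wanted = with_descendants([10], toy_parent)
    print(sorted(select_read_ids(toy_tsv, wanted)))  # prints ['read1', 'read4']
```

Keeping the column indices as parameters mirrors the note above that non-Kraken classification files can be used, as long as each read appears in exactly one row.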
- assembly.py::assemble_spades now has an option, --minContigLen, so that SPAdes-based de novo assembly yields only contigs longer than a specified length (#889)
- assembly.py: added --alwaysSucceed option to SPAdes (#888)
- allow RunInfo.xml override in illumina_demux WDL task (#891)
- Added read_utils.py::read_names to extract read names from a sequence file
- Added run-pipe_local.sh wrapper script for invoking the Snakemake-based pipeline on a single compute instance (#897)
Changed:
- the Unmatched.bam file is now preserved in the illumina_demux WDL task (#887)
- increase memory headroom requested for UGER jobs by 10% (#892)
- (Broad only) change dotkit providing python-yaml (#890)
- use python3 in easy-deploy script if available (#894)
- Snakemake rules now specify their memory requirement via the mem_mb param, which is recognized by certain execution engines such as Kubernetes (#897)
Fixed:
- do not require chromosome names when checking whether a bam file is sorted (#898)
- add --no-same-owner to tar -x in WDL tasks (#880)
- safely build snpEff database (#881)
- allow ints in Snakemake remote protocols ("s3://"...) (#895)
- fix NCBI tbl parser for RefSeq accessions (#899)
Added/Upgraded: