Releases: broadinstitute/viral-ngs
v1.25.0
Upgrades:
- Picard 2.20.5 -> 2.21.1
- Cromwell 33.1 -> 47
- dxWDL 0.72 -> v1.33
- dx-docker replaced with native OS docker
- dx download replaced with dxda
- dx-toolkit v0.285.0 -> v0.288.0
- all dx_instance_types in WDL runtimes updated to v2 DNAnexus instance types
Bugfixes:
- krona failed when output file was on a different filesystem from TMPDIR
- krona attempted to repeatedly reinstall itself due to version documentation mismatch
v1.24.0
New:
- binned coverage plot option in reports.py align_and_plot via --binLargePlots (#957) (thanks to @lakras)
- metagenomics.py taxlevel_summary can now also aggregate reports in KrakenUniq format (#948)
- krakenuniq WDL task (and demux_plus, etc.) writes a viral summary tsv, aggregating kraken reports from all input samples (#948)
- add aggregate_spike_count function to reports.py to write a tsv table of spike-ins seen in all samples, aggregating separate spike-in reports (#955, #973, #981)
- an optional QC check has been added to taxon_filter.py filter_lastal_bam (#961)
  - raises a QCError() if the sample name (bam file basename) begins with any of several negative control prefixes ("neg", "water", "NTC") and lastal has identified reads to keep after filtering
- add WDL task to aggregate spikein reports (#965)
- add tool wrapper for BBMap (#969)
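The negative-control check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the filter_lastal_bam code: QCError is a stand-in class, check_negative_control is a hypothetical helper, the sample name is assumed to be the BAM basename with its extension removed, and prefix matching is assumed to be case-sensitive.

```python
import os


class QCError(RuntimeError):
    """Stand-in for the QCError raised by the real check (hypothetical definition)."""


# Negative-control prefixes named in the release note.
NEG_CONTROL_PREFIXES = ("neg", "water", "NTC")


def check_negative_control(bam_path, reads_kept, prefixes=NEG_CONTROL_PREFIXES):
    """Raise QCError if a negative-control sample still has reads after filtering.

    Hypothetical helper: the sample name is assumed to be the BAM basename
    without its extension, and prefixes are matched case-sensitively.
    """
    sample_name = os.path.splitext(os.path.basename(bam_path))[0]
    if sample_name.startswith(tuple(prefixes)) and reads_kept > 0:
        raise QCError(
            f"{sample_name}: negative control has {reads_kept} reads left after lastal filtering"
        )
```

Ordinary samples with reads, and negative controls with zero reads, pass silently; only a negative control that still has reads triggers the error.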
Changed:
- for more stringent demux, change to max_mismatches=0 from max_mismatches=1 (#960)
- malformed kraken reports now result in warnings (#958)
- default reference used in DNAnexus WDL pipeline for ERCC seqs updated from 32-seq file to 96-seq file (#959, #972)
- metagenomics.py taxlevel_summary call moved to a separate WDL task, called after kraken wherever kraken is called; this break-out allows aggregation of kraken reports created previously or elsewhere (#964)
- fasta ID is now sanitized for picard CreateSequenceDictionary calls to adhere to character set restrictions in the SAM/BAM RNAME spec; see: samtools/hts-specs#333 (#977)
- added spec for timeouts when running WDL on DNAnexus (#983; thanks @godotgildor)
- install conda packages to separate environment within Docker container (#980)
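To illustrate what sanitizing a FASTA ID for CreateSequenceDictionary can involve, here is a hypothetical helper, sanitize_fasta_id, that keeps only the first whitespace-delimited token of a header and replaces every character outside a conservative whitelist with an underscore. The whitelist is an assumption for this sketch: it also drops '*' and '=', which may be stricter than the RNAME rules from hts-specs#333 require, and it is not necessarily the character set viral-ngs uses.

```python
import re

# Conservative whitelist of reference-name characters (an assumption for this sketch).
# Inside the class, '^' after the first position and a trailing '-' are literals.
_NOT_ALLOWED = re.compile(r"[^0-9A-Za-z!#$%&+./:;?@^_|~-]")


def sanitize_fasta_id(header_line: str, replacement: str = "_") -> str:
    """Turn a FASTA header line into a reference name of whitelisted characters only."""
    seq_id = header_line.lstrip(">").split()[0]  # ID = first whitespace-delimited token
    return _NOT_ALLOWED.sub(replacement, seq_id)
```

Substituting rather than deleting characters keeps names the same length, which makes it easier to map sanitized names back to the originals, though two distinct IDs can still collapse to the same sanitized name.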
Fixed:
- quieted conda warnings related to use of -V (#947)
- genome feature table parser for reading tsv/Sequin-formatted files now handles feature qualifiers that consist of only a key, fixing an observed issue with ribosomal_slippage occurring without a value; qualifier-parsing regex also more robust (#949)
- KrakenUniq Krona report now correctly reports "unique kmers" rather than "genome coverage" (#950)
- in WDL KrakenUniq task, declared vars with defaults are now non-optional (#951)
- Fixed WDL assemble task assembler parameterization for the joint trinity-spades case (#952)
- The SampleSheet and tabfile readers are now tolerant of a byte-order mark (BOM) being present, as seen in output written by some editors (#954)
- In WDL for assembly scaffolding, contig alignment threshold changed from Int to Float (#975)
- specify USE_JDK_DEFLATER=true and USE_JDK_INFLATER=true for picard to prevent sporadic crashes until a bug in the Intel deflater is fixed (#977)
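A common way to tolerate a BOM in Python is the utf-8-sig codec, which strips a leading byte-order mark if one is present and otherwise decodes exactly like plain UTF-8. The reader below, read_tabfile, is a hypothetical sketch of that approach for tab-delimited input, not the viral-ngs SampleSheet or tabfile reader.

```python
import csv
import io


def read_tabfile(binary_stream):
    """Yield rows of a tab-delimited file, ignoring a leading UTF-8 BOM if present.

    Hypothetical helper: takes a binary file object (e.g. open(path, "rb")) and
    decodes it with 'utf-8-sig', so the first header field never carries '\\ufeff'.
    """
    text = io.TextIOWrapper(binary_stream, encoding="utf-8-sig", newline="")
    for row in csv.reader(text, delimiter="\t"):
        yield row
```

Without the BOM handling, the first column name would come back as "\ufeffsample" and lookups by header name would silently fail.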
Added/Upgraded:
- matplotlib 1.5.3 -> 2.2.4 (#948, #977)
- bedtools 2.27.1 -> 2.28.0 (#977)
- blast 2.6.0 -> 2.7.1 (#977)
- lxml 4.3.0 -> 4.3.3 (#948)
- switch from picard to picard-slim package (sans 'r') (#977)
- update picard 2.18.11 -> 2.20.5 (#977)
- krona 2.7 -> 2.7.1
- bump Docker viral-baseimage 0.1.14 -> 0.1.15 (#979)
- added bbmap 38.56
- change lz4 dependency from lz4-bin sourced from bioconda to lz4-c from conda-forge, 131 -> 1.9.1 (#977)
v1.23.0
New:
- scaffolder uses ambiguous alignments when no unambiguous ones exist (#904)
- add function to assist Illumina index correction (#917)
  - illumina.py::guess_barcodes identifies barcodes with outlier counts after demux, and suggests possible corrections
- Testing migrated from travis-ci.ORG -> travis-ci.COM (#910)
- the Snakemake and WDL pipelines now create a file of the top spikeins seen (#909)
- functionality reporting outlier_barcodes can now act on single-index runs (#932)
- the viral-ngs version is now emitted as a string output of WDL workflows (#928)
- The params --skipMarkDupes and --plotOnlyNonDuplicates are now exposed in the WDL task plot_coverage in tasks_reports.wdl (#925)
Changed:
- testing-related performance improvements (#915)
- changes related to the use of conda v4 (#922, #923)
- new stub CondaPackage for use with tests (#916)
- conda fixes related to changed bioconda guidelines (#906)
  - primarily with respect to package channel priorities
Fixed:
- changes related to conda on travis (#944)
- Corrections to Broad index sequences listed in illumina_indices.py (#941)
- update tasks_taxon_filter.wdl to respect tags_to_clear_space_separated param (#919)
- illumina.py: fixed a small bug where an exception was not raised for missing files (#926)
- memory spec corrections in UGER job submission script; JVM memory update for demux (#920)
- fix util.misc.available_cpu_count() (#912)
- add guardrails to barcode_helper for the case where observed barcodes are null (#946)
Added/Upgraded:
- update PyYAML to v5.1 to address CVE-2017-18342
- perl 5.22.0 -> 5.26 to support conda build 3 and current conda-forge pinnings (#922, #945)
- java-jdk==8.0.112 -> openjdk==8.0.112
- downgrade of blast 2.7.1 -> 2.6.0 due to upstream boost incompatibilities
- Within docker image (#906):
  - pysam 3.11 -> 3.12
  - biopython 1.68 -> 1.72
  - samtools 1.6 -> 1.9
  - pigz 2.3.4 -> 2.4
  - picard 2.9.0 -> 2.18.11
v1.22.1
Fixed:
- Fix issues related to Trinity on certain environments (#900)
- allow for flexibility in gatk v3 wrapper supplied by bioconda (#902)
- When determining available memory and cores, cgroup limits are now taken into account (#905)
- Prevent dx jobs launched from travis-ci from running ad infinitum. (#901)
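As a rough sketch of taking cgroup limits into account for the core count (memory is not covered here), the hypothetical functions below read the cgroup v2 cpu.max file, fall back to the cgroup v1 cpu.cfs_quota_us/cpu.cfs_period_us pair, and cap os.cpu_count() by the resulting quota. The file layout under /sys/fs/cgroup varies between hosts and container runtimes, so the paths are assumptions; this is not the viral-ngs util.misc implementation.

```python
import math
import os
from typing import Optional


def cgroup_cpu_limit(cgroup_root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the whole-CPU limit imposed by cgroups, or None if unlimited or unknown."""
    v2_file = os.path.join(cgroup_root, "cpu.max")  # cgroup v2: "<quota|max> <period>"
    if os.path.exists(v2_file):
        with open(v2_file) as f:
            quota, period = f.read().split()[:2]
        return None if quota == "max" else math.ceil(int(quota) / int(period))
    # cgroup v1: separate quota and period files; a quota of -1 means "no limit".
    quota_file = os.path.join(cgroup_root, "cpu", "cpu.cfs_quota_us")
    period_file = os.path.join(cgroup_root, "cpu", "cpu.cfs_period_us")
    if os.path.exists(quota_file) and os.path.exists(period_file):
        with open(quota_file) as fq, open(period_file) as fp:
            quota, period = int(fq.read()), int(fp.read())
        if quota > 0:
            return math.ceil(quota / period)
    return None


def effective_cpu_count(cgroup_root: str = "/sys/fs/cgroup") -> int:
    """Host CPU count, capped by any cgroup CPU quota (hypothetical helper)."""
    host = os.cpu_count() or 1
    limit = cgroup_cpu_limit(cgroup_root)
    return min(host, limit) if limit else host
```

Fractional quotas (for example 1.5 CPUs) are rounded up here so a job never sees zero cores; rounding down is an equally defensible choice if oversubscription is the bigger concern.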
Added/Upgraded:
- viral-baseimage 0.1.13 -> 0.1.14 (#903)
v1.22.0
New:
- Adding commands for working with kmer sets using the KMC tool. (#854)
  - new top-level python file: kmer_utils.py providing the following functions (see the documentation for more information):
    - build_kmer_db: Build a database of kmers occurring in given sequences
    - dump_kmer_counts: Dump kmers and their counts from kmer database to a text file
    - filter_reads: Filter reads based on their kmer content
    - kmers_binary_op: Perform a simple binary operation on kmer sets
    - kmers_set_counts: Copy the kmer database, setting all kmer counts in the output to the given value
- add metagenomics.py::filter_bam_to_taxa (#883)
  - This function filters an input bam file to include only reads that have been mapped to specified taxonomic IDs or scientific names. This requires a classification TSV file, as produced by tools such as Kraken, as well as the NCBI taxonomy database. The column numbers of the tax ID and read ID can be specified, allowing use beyond kraken-format read classification files; however, the relationship is assumed to be bijective.
- add WDL for filter_bam_to_taxa
- assembly.py::assemble_spades now has an option, --minContigLen, so spades-based de novo assembly yields only contigs longer than a specified length (#889)
- assembly.py - added --alwaysSucceed option to SPAdes (#888)
- allow RunInfo.xml override in illumina_demux WDL task (#891)
- Added read_utils.py::read_names to extract read names from a sequence file
- Added run-pipe_local.sh wrapper script for invoking the Snakemake-based pipeline on a single compute instance (#897)
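The read-selection step behind filter_bam_to_taxa can be sketched with the standard library alone. This is an illustration, not the viral-ngs implementation: it matches numeric tax IDs only (no scientific-name lookup), expands each requested taxon to its descendants using the parent links in the NCBI taxonomy nodes.dmp file, and assumes kraken-style classification lines in which the read ID and tax ID are 0-based columns 1 and 2. Writing the filtered BAM (for example with pysam) is omitted, and all function names are hypothetical.

```python
from collections import defaultdict


def load_children(nodes_dmp_lines):
    """Map each tax ID to its direct children, from NCBI taxonomy nodes.dmp lines."""
    children = defaultdict(set)
    for line in nodes_dmp_lines:
        fields = line.split("\t|\t")  # nodes.dmp fields are separated by "\t|\t"
        tax_id, parent_id = int(fields[0]), int(fields[1])
        if tax_id != parent_id:  # the root (tax ID 1) lists itself as its own parent
            children[parent_id].add(tax_id)
    return children


def descendants(tax_ids, children):
    """Return the given tax IDs plus every tax ID beneath them in the tree."""
    keep, stack = set(), list(tax_ids)
    while stack:
        t = stack.pop()
        if t not in keep:
            keep.add(t)
            stack.extend(children.get(t, ()))
    return keep


def reads_to_keep(classification_lines, tax_ids, children, read_col=1, taxid_col=2):
    """Read IDs whose assigned tax ID falls under any requested taxon.

    Assumes one classification line per read (the bijective relationship
    mentioned in the release note).
    """
    wanted = descendants(tax_ids, children)
    keep = set()
    for line in classification_lines:
        fields = line.rstrip("\n").split("\t")
        if int(fields[taxid_col]) in wanted:
            keep.add(fields[read_col])
    return keep
```

The column arguments mirror the release note's point that tax ID and read ID columns are configurable, so the same selection works for classifiers other than kraken.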
Changed:
- the Unmatched.bam file is now preserved in the illumina_demux WDL task (#887)
- increase memory headroom requested for UGER jobs by 10% (#892)
- (Broad only) change dotkit providing python-yaml (#890)
- use python3 in easy-deploy script if available (#894)
- Snakemake rules now specify their memory requirement via the mem_mb param, which is recognized by certain execution engines such as kubernetes (#897)
Fixed:
- do not require chromosome names when checking whether a bam file is sorted (#898)
- add --no-same-owner to tar -x in WDL tasks (#880)
- safely build snpEff database (#881)
- allow ints in Snakemake remote protocols ("s3://"...) (#895)
- fix ncbi tbl parser for refseq accessions (#899)
Added/Upgraded:
v1.21.2
v1.21.1
New:
Changed:
- in WDL workflows, default demultiplexing parameters to support novaseq [#868]
- in illumina.py::illumina_demux: max_mismatches changed 0 -> 1, and minimum_base_quality changed 25 -> 10
  - This supports the four Q scores written by the NovaSeq
Upgraded:
v1.21.0
New:
- added a new utility function to merge a group of separate tarballs into one single tarball: file_utils.py::merge_tarballs() [#853]
  - useful for consolidating sequencing runs that have been uploaded in chunks
  - data can be piped in and/or out
  - tarball content can be extracted to disk during the repack
- added WDL workflow, isnvs_merge_to_vcf.wdl, to perform multiple alignment on assembled sequences, call iSNVs, and produce a VCF with variants seen across all samples relative to a reference [#864]
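Because Python's tarfile module has streaming modes ('r|*' reads with compression auto-detected, 'w|' writes an uncompressed stream), merging tarballs while piping data in and out can be sketched in a few lines. The function below is hypothetical and much simpler than file_utils.py::merge_tarballs(): it writes an uncompressed stream, copies regular-file contents plus header-only entries such as directories and symlinks, keeps duplicate member names as they arrive, and does not extract to disk.

```python
import tarfile


def merge_tarballs(out_fileobj, in_fileobjs):
    """Stream the members of several tarballs into one uncompressed tar stream.

    Hypothetical sketch: every argument is a binary file object, so pipes such
    as sys.stdin.buffer / sys.stdout.buffer work as well as regular files.
    """
    with tarfile.open(fileobj=out_fileobj, mode="w|") as out_tar:
        for in_fileobj in in_fileobjs:
            with tarfile.open(fileobj=in_fileobj, mode="r|*") as in_tar:
                for member in in_tar:
                    # In streaming mode a member's data must be consumed before moving on,
                    # so regular files are copied immediately; other entries carry no data.
                    data = in_tar.extractfile(member) if member.isfile() else None
                    out_tar.addfile(member, data)
```

Passing sys.stdout.buffer as out_fileobj writes the merged archive to standard output, which is what makes the piped-output case work without temporary files.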
Changed:
- WDL task files renamed to have tasks_ prefix [#865]
- dxWDL 0.72: revert WDLs to unbound task inputs [#857, #860]
Fixed:
- WDL dx-launcher: consolidate_run_tarballs is now executed as a separate top-level job to allow uploading of the output [#858]
- Broad UGER: allow run-pipe.sh to be called when a conda env is active [#851]
- remove build_lastal_db as a Snakemake local rule [#849]
- testing: fixed an issue with the handling of tempdirs [#866]
Added/upgraded:
- mafft 7.221 -> 7.402 [#863]
- gatk 3.6 -> 3.8 [#863, #867]
- bwa 0.7.15 -> 0.7.17 [#863]
- blast 2.6.0 -> 2.7.1 [#863]
- trimmomatic 0.36 -> 0.38 [#863]
- bedtools 2.26.0 -> 2.27.1 [#863]
- biopython 1.70 -> 1.72 [#863]
- snakemake 4.1.0 -> 5.2.0 [#852]
- pytest 3.0.5 -> 3.6.3 [#856]
- update Dockerfile viral-baseimage 0.1.11 -> 0.1.12 [#861]
- dxWDL 0.69 -> 0.72 (soon to be merged into dx-toolkit) [#857]
v1.20.1
v1.20.0
New:
- WDL workflow added to read_utils.wdl::downsample() [#819, #821, #823]
- WDL workflows added for iSNV calling/v-phaser [#828]:
  - assemble_denovo_with_deplete_and_isnv_calling.wdl
  - assemble_denovo_with_isnv_calling.wdl
  - isnvs_one_sample.wdl
- WDL workflow to create a DNAnexus applet to launch demultiplexing [#838]
- demultiplexing on DNAnexus occurs on instances scaled to input run type/size, up to NovaSeq size
- sequencing_center can be passed as an input to the DNAnexus applet, allowing it to be set for demultiplexing at time of upload [#844]
Changed:
- optimizations related to bwa-based depletion (output is piped between several steps to avoid disk writes) [#791]
- ncbi.py::tbl_transfer() now uses a rewritten feature table parser that is more tolerant of possible edge cases present in feature tables [#826]
- sampleNamesFile no longer output by interhost.wdl::multi_align_mafft_ref() [#808]
- -P removed from snakemake UGER qsub command [#817]
- intrahost.py::merge_to_vcf now tries to guess sample names to use in creating VCF file, based on v-phaser output [#828]
- WDL workflows altered for compatibility with dxWDL 0.69
- illumina.py::illumina_demux updated to allow NovaSeq-format dates [#831]
- demux.wdl::illumina_demux now allows a custom RunInfo.xml to be passed in [#831]
- demux.wdl::illumina_demux now allows thread count to be passed to demux [#834]
- various small documentation updates [#835]
- gzip replaced with pigz in several external process calls to improve performance [#842]
- in depletion, post-bwa filter changed from -f0x4 (include unmapped) to -F0x2 (exclude mapped proper pairs) [#791]
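Assuming both filters are samtools view flag options (-f keeps reads with all of the given bits set, -F drops reads with any of the given bits set), the old and new depletion filters reduce to the predicates below on the SAM FLAG bits 0x2 (properly paired) and 0x4 (unmapped). The function names are hypothetical. One practical difference: a mapped read whose mate is unmapped (e.g. FLAG 73) fails the old filter but passes the new one, so both mates of such a pair are retained.

```python
# SAM FLAG bits, as defined in the SAM format specification.
FLAG_PROPER_PAIR = 0x2  # each segment properly aligned according to the aligner
FLAG_UNMAPPED = 0x4     # segment unmapped


def passes_old_filter(flag: int) -> bool:
    """Equivalent of `samtools view -f 0x4`: keep reads with the unmapped bit set."""
    return (flag & FLAG_UNMAPPED) == FLAG_UNMAPPED


def passes_new_filter(flag: int) -> bool:
    """Equivalent of `samtools view -F 0x2`: drop reads with the proper-pair bit set."""
    return (flag & FLAG_PROPER_PAIR) == 0
```

For instance, FLAG 77 (paired, unmapped, mate unmapped) passes both filters, while FLAG 99 (properly paired, mapped) passes neither.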
Fixed:
- bwa-depleted bam files are now reverted to remove headers related to human alignment that could cause issues for downstream tools [#791]
- snakemake config now correctly lists only hg19 in bwa_dbs_remove
- bugfixes to WDL workflow for demux.wdl::merge_and_reheader_bams() [#818, #824]
- fixed bug that caused incorrect encoding of ALT alleles in sample-specific GT columns in intrahost.py::merge_to_vcf [#828]
- fixed a non-deterministic/intermittent error in kraken call related to pipe closure [#840, #841]
Added/upgraded: