
v1.19.0

@tomkinsc released this 07 Dec 21:25
fe88a83

This release brings many changes: new WDL pipelines, a distribution of viral-ngs on DNAnexus that will be kept in sync with the latest version of viral-ngs, the ability to provide multiple references for scaffolding, and several critical bug fixes. With this release, the viral-ngs Docker image moves from Docker Hub to quay.io/broadinstitute/viral-ngs.

New:

  • WDL pipelines have been added, inspired by the previous DNAnexus implementation of viral-ngs. The WDL files currently reside in the pipes/WDL/ directory of viral-ngs. The pipelines can be executed locally or in Google Cloud via Cromwell (available on Bioconda), or via the public distribution on DNAnexus.
    • WDL workflows are tested locally on Travis via Cromwell
    • WDL workflows are compiled for DNAnexus via dxWDL, and tested on DNAnexus
  • A simple form of reference selection has been added to assembly.py::order_and_orient. Scaffolding is now performed against several references in parallel, and the reference that yields the most non-N bases is used for the scaffolded genome. Multiple FASTA files, each containing one reference genome, may now be given for the positional argument inReference. Alternatively, multiple references may be supplied in a single file by also giving the number of segments per reference with the --nGenomeSegments parameter. If multiple references are given, they must all contain the same number of segments, listed in the same order.
    • This has been included in the new WDL pipelines
  • New kraken execution strategy to process multiple inputs in one run
  • taxon_filter.py: deplete_bmtagger_bam and deplete_blastn_bam can now accept BLAST/bmtagger databases as .tar.gz, .tar.lz4, or .tar.bz2 bundles, and also as unindexed FASTA files (which are indexed on the fly)
  • New internal function util.file.extract_tarball, exposed on the CLI as read_utils.py::extract_tarball. It accepts input piped via stdin.
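The reference-selection rule for assembly.py::order_and_orient can be illustrated with a short Python sketch. It is not the viral-ngs implementation: it assumes scaffolding has already been run once per candidate reference, represents each result as a list of segment sequences, and uses hypothetical helper names (split_references, non_n_bases, choose_best_scaffold).

```python
from typing import Dict, List, Sequence


def split_references(records: Sequence[str], n_genome_segments: int) -> List[List[str]]:
    """Group the records of one multi-reference FASTA into per-genome lists.

    Assumes records are ordered reference by reference, each reference
    contributing exactly n_genome_segments consecutive segments.
    """
    if n_genome_segments < 1 or len(records) % n_genome_segments != 0:
        raise ValueError("record count must be a multiple of n_genome_segments")
    return [list(records[i:i + n_genome_segments])
            for i in range(0, len(records), n_genome_segments)]


def non_n_bases(segments: Sequence[str]) -> int:
    """Count bases that are not ambiguous N calls, summed over all segments."""
    return sum(len(seq) - seq.upper().count("N") for seq in segments)


def choose_best_scaffold(scaffolds: Dict[str, List[str]]) -> str:
    """Return the name of the reference whose scaffold has the most non-N bases.

    All scaffolds must have the same number of segments, mirroring the rule
    that multiple references share segment count and order.
    """
    if not scaffolds:
        raise ValueError("no scaffolds given")
    if len({len(segs) for segs in scaffolds.values()}) != 1:
        raise ValueError("all references must have the same number of segments")
    return max(scaffolds, key=lambda name: non_n_bases(scaffolds[name]))


if __name__ == "__main__":
    # Hypothetical two-segment scaffolds built against three references.
    scaffolds = {
        "refA": ["ACGTNNNN", "NNNNACGT"],
        "refB": ["ACGTACGT", "ACGTNNNN"],
        "refC": ["NNNNNNNN", "ACGTACGT"],
    }
    print(choose_best_scaffold(scaffolds))  # refB
```

Here refB wins with 12 non-N bases against 8 for the other two; on a tie, max() keeps the first reference encountered.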

Changed:

  • Extensive changes to how the viral-ngs Docker image is prepared and distributed:
    • Note: The Docker image is now available from quay.io/broadinstitute/viral-ngs, which is faster for staging than Docker Hub
    • the Docker image build process no longer relies on the easy-deploy-viral-ngs.sh script
  • The --threads argparse option is now shared by, and available across, viral-ngs commands
  • optimizations in illumina.py::illumina_demux
  • illumina.py::common_barcodes execution time has been reduced
  • in easy-deploy-viral-ngs.sh, some messages have been moved from stdout to stderr
  • taxon_filter.py: cleanup and optimization of blastn-based read depletion
  • various development-related changes including:
    • travis cleanup re: pip package installs, conditionals, build matrix
    • Docker deployment bugfixes
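One common way to share an option such as --threads across many argparse subcommands is a parent parser. The sketch below is an assumption about how this can be done, not the viral-ngs code; the program name, the subcommand names and the resolve_threads helper are hypothetical.

```python
import argparse
import os
from typing import Optional


def common_parser() -> argparse.ArgumentParser:
    """Hypothetical parent parser holding options shared by all subcommands."""
    parent = argparse.ArgumentParser(add_help=False)
    parent.add_argument(
        "--threads",
        type=int,
        default=None,
        help="number of threads to use (default: all available CPUs)",
    )
    return parent


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI whose subcommands all inherit --threads via parents=[...]."""
    parser = argparse.ArgumentParser(prog="example.py")
    subcommands = parser.add_subparsers(dest="command", required=True)
    shared = common_parser()
    # Hypothetical subcommand names; each one gets --threads from `shared`.
    subcommands.add_parser("demux", parents=[shared])
    subcommands.add_parser("deplete", parents=[shared])
    return parser


def resolve_threads(requested: Optional[int]) -> int:
    """Treat an unset or non-positive value as 'use every available CPU'."""
    if requested is None or requested < 1:
        return os.cpu_count() or 1
    return requested


if __name__ == "__main__":
    args = build_parser().parse_args(["demux", "--threads", "4"])
    print(args.command, resolve_threads(args.threads))  # demux 4
```

Defining the option once in a parent parser keeps its name, type and help text identical for every command, which is the point of making it a common option.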

Fixed:

  • Prevent reports.py::plot_coverage from removing the BAM file provided as input when it is already sorted and duplicate removal is not being performed. In such cases the input BAM is used directly and is now preserved.
  • diamond tests for accession taxonomy fixed: subprocess.PIPE replaced with named pipes to prevent deadlocks
  • taxon_filter.py::bmtagger_build_db default value for word_size is now 18, not 8
  • Fixed the use of FASTA databases for taxon_filter.py::deplete_bmtagger_bam and deplete_human
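The plot_coverage fix follows a general rule: a step should delete only the intermediate files it created itself. The sketch below illustrates that rule; prepare_bam, cleanup and the already_sorted/remove_dupes flags are hypothetical, and the actual sorting and duplicate removal are left as placeholders.

```python
import os
import tempfile
from typing import List


def prepare_bam(in_bam: str, already_sorted: bool, remove_dupes: bool,
                tmp_files: List[str]) -> str:
    """Return the BAM path to plot from, recording any temp files created.

    If neither sorting nor duplicate removal is needed, the input is used
    directly and is NOT added to tmp_files, so cleanup never deletes it.
    """
    if already_sorted and not remove_dupes:
        return in_bam
    fd, tmp_bam = tempfile.mkstemp(suffix=".bam")
    os.close(fd)
    # Placeholder: a real pipeline would sort and/or dedupe in_bam into tmp_bam here.
    tmp_files.append(tmp_bam)
    return tmp_bam


def cleanup(tmp_files: List[str]) -> None:
    """Delete only the files this step created itself."""
    for path in tmp_files:
        if os.path.exists(path):
            os.unlink(path)


if __name__ == "__main__":
    fd, in_bam = tempfile.mkstemp(suffix=".bam")
    os.close(fd)
    tmp: List[str] = []
    used = prepare_bam(in_bam, already_sorted=True, remove_dupes=False, tmp_files=tmp)
    cleanup(tmp)
    print(used == in_bam, os.path.exists(in_bam))  # True True
    os.unlink(in_bam)
```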

Added/Upgraded:

  • pysam 0.12.0.1 -> 0.13.0
  • samtools 1.5 -> 1.6
  • kraken 0.10.6_fork3 -> 1.0.0_fork3
  • lz4-bin 131 added as requirement
  • pigz 2.3.4 added as requirement
  • lbzip2 2.5 added as requirement