v1.18.0
New:
- The Snakemake pipeline can now source database files from S3, GS, or SFTP if given protocol-prefixed paths (s3://, gs://, sftp://) and if the system is preconfigured with credentials.
- The config.yaml file has been changed to include s3://* paths for pre-built databases, rather than Broad Institute-specific paths (and files listed are live and available for all!)
- Kraken is now enabled on OSX, though significant RAM is required to use it
- The reports.py::align_and_plot_coverage and read_utils.py::align_and_fix functions now expose an optional argument, --minScoreToFilter. When using bwa, this calculates an alignment score for each query by summing the scores across the query's alignments, and keeps only the queries whose score is at least the specified threshold.
- Sample sheets can now be specified in *.csv.gz format
- For debugging or more bespoke analysis, temp files can now be kept more easily by setting the VIRAL_NGS_TMP_DIRKEEP environment variable
- The cd-hit-dup tool has been added as an alternative to mvicuna for removing duplicate reads, via a new CLI function, read_utils.py::rmdup_cdhit_bam. Note that this is not currently used in the pipeline by default.
- The Gap2Seq tool has been added for filling gaps between contigs. It is exposed via the new CLI command assembly.py::gapfill_gap2seq. Note that this is not currently used in the pipeline by default.
- The SPAdes assembler has been added as an alternative to Trinity for de novo assembly. Note that this is not currently used in the pipeline by default.
- Expose the blastn --chunkSize option in taxon_filter.deplete_human.
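A minimal sketch of how protocol-prefixed database paths might be told apart from local ones, using only Python 3's urllib.parse. The helper names remote_scheme and split_remote are hypothetical, and the bucket and host names in the example are placeholders. This does not show how the pipeline itself fetches remote files or handles credentials.

```python
from urllib.parse import urlparse

# Hypothetical set of supported schemes, mirroring the prefixes listed above.
REMOTE_SCHEMES = {"s3", "gs", "sftp"}


def remote_scheme(path):
    """Return "s3", "gs", or "sftp" for a protocol-prefixed path,
    or None if the path should be treated as a local file."""
    scheme = urlparse(path).scheme.lower()
    return scheme if scheme in REMOTE_SCHEMES else None


def split_remote(path):
    """Split a remote path into (scheme, bucket_or_host, key)."""
    parsed = urlparse(path)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")
```

For example, remote_scheme("s3://my-bucket/databases/kraken") returns "s3", while remote_scheme("/data/databases/kraken") returns None, so the local path would be used as-is.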
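The --minScoreToFilter behavior described above works like this: sum each query's alignment scores, then keep only the queries whose total meets the threshold. It can be sketched in pure Python as follows. This is an illustration under assumptions, not the viral-ngs implementation. Alignments are modeled as hypothetical (query_name, score) pairs. In a real BAM, a per-alignment score could plausibly come from the AS tag that bwa writes, read for example with pysam's AlignedSegment.get_tag("AS").

```python
from collections import defaultdict


def queries_passing_score(alignments, min_score):
    """Sum the alignment scores of each query and return the set of query
    names whose total is at least min_score.

    alignments: iterable of (query_name, score) pairs, one per alignment.
    """
    totals = defaultdict(int)
    for query_name, score in alignments:
        totals[query_name] += score
    return {name for name, total in totals.items() if total >= min_score}


def filter_alignments(alignments, min_score):
    """Keep every alignment whose query passes the summed-score threshold.

    alignments must be a list (or other re-iterable sequence), since it is
    scanned twice: once to total the scores, once to filter.
    """
    keep = queries_passing_score(alignments, min_score)
    return [aln for aln in alignments if aln[0] in keep]
```

Because scores are summed per query, a read with two alignments scoring 30 and 25 passes a threshold of 50 (total 55), even though neither alignment reaches 50 on its own.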
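One common way to accept both plain and gzip-compressed sample sheets is to choose the opener based on the file extension. The sketch below assumes a comma-delimited sheet with a header row. The names open_sample_sheet and read_sample_sheet are hypothetical, not viral-ngs functions.

```python
import csv
import gzip


def open_sample_sheet(path):
    """Open a sample sheet for reading as text, transparently
    decompressing it when the name ends in .gz (e.g. samples.csv.gz)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def read_sample_sheet(path):
    """Return the rows of a comma-delimited sheet with a header row as dicts."""
    with open_sample_sheet(path) as fh:
        return [dict(row) for row in csv.DictReader(fh)]
```

Callers then treat samples.csv and samples.csv.gz identically. Passing newline="" in both branches follows the csv module's recommendation for files handed to a reader.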
Changed:
- Metagenomics rules in the Snakemake pipeline now break out Kraken files as separate targets
- Improvements to the speed of automated tests
- The source and binaries for mvicuna and v-phaser2 have been removed from this repository since they now reside in their own repositories
- viral-ngs is no longer tested against or distributed for Python 3.4, from this release forward. This should not impact users since the package is typically installed in an isolated conda environment with Python 3.5 or 2.7.
- The Snakemake rules and cluster-submitter have been updated to reflect changes to the UGER cluster system at the Broad Institute, which now requires that -l h_rt=hh:mm:ss be passed to set the maximum runtime for each job
- Performance improvements to lastal filtering
- lastal database is now built automatically if supplied pre-built
- The SPAdes wrapper is now more resilient to empty FASTQ inputs
- Reimplement samtools.filterByCigarString using pysam instead of samtools
- Kraken on OSX now exists on broad-viral: enable it in OSX git hooks and turn on all tests
- Remove lastal optional outputs from taxon_filter.deplete_human
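Grid Engine expresses a hard runtime limit as the resource request -l h_rt=hh:mm:ss. The sketch below assumes runtimes are tracked in whole minutes and shows how such a value might be formatted before being passed to qsub. The helpers h_rt and runtime_args are hypothetical, not the actual cluster-submitter code.

```python
def h_rt(minutes):
    """Format a runtime limit in whole minutes as hh:mm:ss for Grid Engine's
    h_rt resource. Hours are not capped at 24 (e.g. 1500 -> "25:00:00")."""
    minutes = int(minutes)
    if minutes < 0:
        raise ValueError("runtime must be non-negative")
    hours, mins = divmod(minutes, 60)
    return "{:02d}:{:02d}:00".format(hours, mins)


def runtime_args(minutes):
    """Build the qsub arguments that request a hard runtime limit."""
    return ["-l", "h_rt=" + h_rt(minutes)]
```

For example, runtime_args(90) returns ["-l", "h_rt=01:30:00"].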
Fixed:
- In the Snakemake pipeline, code that reads sample sheets and barcode files is now more tolerant of different formats, including files formatted with Windows-style newlines (\r\n for Windows vs. \n for Linux/Unix/macOS)
- Fixed handling of empty subtrees when importing *.yaml files within *.yaml config files (for config includes/composition)
- Fixed other edge cases related to config imports
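Python's universal newlines mode is one straightforward way to tolerate mixed line endings. Opening a file in text mode with newline=None translates CRLF (Windows) and lone CR endings to LF before any parsing happens. The sketch below assumes a tab-delimited file. read_tsv_rows is a hypothetical helper, not the pipeline's own parser.

```python
def read_tsv_rows(path):
    """Return the fields of each non-blank line of a tab-delimited file.

    newline=None turns on universal newlines mode, so CRLF (Windows) and
    lone CR line endings are translated to LF before the lines are split.
    """
    rows = []
    with open(path, "r", newline=None) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.strip():
                rows.append(line.split("\t"))
    return rows
```

Without this translation, the last field on each line of a Windows-formatted file would keep a trailing carriage return. A barcode such as "ACGT" would then be read as "ACGT\r" and silently fail to match.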
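Empty subtrees are easy to mishandle when composing YAML configs. yaml.safe_load returns None for an empty document, and a key with no value also parses to None, so a naive recursive merge that expects a dict at every level can raise an error or replace defaults with None. The sketch below treats None as "nothing to merge". merge_config is a hypothetical helper. The rule that an override wins for scalars and lists is an assumption, not necessarily how viral-ngs composes its configs.

```python
def merge_config(base, override):
    """Recursively merge an included config (override) into base.

    None, which is what yaml.safe_load returns for an empty document and
    what a key with no value parses to, counts as an empty subtree and
    leaves the base value untouched. Otherwise nested dicts are merged key
    by key, and for scalars and lists the override wins.
    """
    if override is None:
        return base
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)  # shallow copy so base itself is not mutated
        for key, value in override.items():
            merged[key] = merge_config(merged[key], value) if key in merged else value
        return merged
    return override
```

With this rule, including an empty file, or a section such as a key with nothing under it, leaves the existing configuration unchanged.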
Upgraded:
- last 719 -> 876
- Update samtools to 1.5
- Update pysam to 0.12.0.1