All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- The workflow failing when there were tab characters in the FASTA header lines of reference sequences.
- The way the reference sequence IDs are sanitised to prevent issues with special characters.
- The workflow failing when there was a whitespace in the name of the reference file.
- The report now consistently uses the sanitised reference sequence IDs throughout.
- Some formatting changes to the GitHub issue template.
- Incorrect CPU use specification for the `medakaVariant` process.
- Default for `--reads_downsampling_size` to 1500 to limit memory usage.
- Default for `--medaka_target_depth_per_strand` to 150 as the workflow now supports longer amplicons.
- The workflow failing when a sample had only a single read.
- Memory requirements for each process.
- Reworked docs to follow new layout.
- The de-novo QC stage failing when not a single input read re-aligns against the draft consensus.
- More informative warnings for failed samples in the report intro section.
- Assembly step to de-novo consensus mode to handle longer amplicons. README was updated accordingly.
- Default local executor CPU and RAM limits
- Names of barcoded directories in the sample sheet now need to be of the format `barcodeXY`.
- Support for specifying reference sequences for individual samples with an extra `ref` column in the sample sheet.
- The workflow now also emits VCF index files, as well as BAM and VCF index files for the combined outputs when running with `--combine_results`.
- Emitting an empty consensus FASTA file when consensus generation failed in certain situations. Instead, no file is emitted.
- Now uses the downsampled BAM for `medaka annotate`.
- Misleading allelic balance statistic in report.
- Running the workflow without a reference will switch the workflow to "no-reference mode" and will use SPOA to construct a consensus sequence de novo.
- The workflow now also outputs BAM index files.
- The workflow now downsamples reads for each amplicon to be in the suitable depth range for Medaka.
- MacOS ARM64 support.
- Updated Medaka to 1.9.1.
- No longer publishes empty result files (BAM, VCF, consensus FASTA) for samples which do not have any reads left after pre-processing and filtering.
- Now uses Medaka v1.8.2. Options for `basecaller_cfg`, including its default value, were updated accordingly.
- The per-sample summary table in the report no longer shows sample metadata columns unless metadata was provided by the user via a sample sheet.
- VCF files now use the sample alias as sample name instead of `SAMPLE`.
- The reference FASTA file with sanitised sequence headers (with `:`, `*`, and whitespace replaced with `_`) is now also published alongside the other results. The workflow uses this file internally because some tools do not tolerate these symbols in sequence IDs.
- Parameter `--combine_results` to also output merged BAM and VCF files.
- The workflow now prints a warning when there are no reads after filtering / preprocessing and produces a truncated report showing only the pre-processing stats table.
- Bug where the `mosdepth` process would fail if the length of a reference sequence was smaller than the number of depth windows requested.
- GitHub issue templates
- Example command to use demo data.
- Instead of dropping variants with `DP < min_coverage`, set their `FILTER` column to `LOW_DEPTH` in the results VCF.
- Bumped minimum required Nextflow version to 22.10.8.
- Enum choices are enumerated in the `--help` output.
- Enum choices are enumerated as part of the error message when a user has selected an invalid choice.
- Replaced the `--threads` option in `fastqingress.nf` with hardcoded values to remove the warning about undefined `param.threads`.
- Consensus sequences and BAM files as output files.
- Configuration for running demo data in AWS.
- Tags for the `epi2melabs` desktop app.
- Reference not being required by the schema.
- Bug in documentation that prevented blog post from building.
- First release.
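The header sanitisation rule stated in the entry about the published reference FASTA (`:`, `*`, and whitespace replaced with `_`) can be illustrated with a short Python sketch. The helper names `sanitise_header` and `sanitise_fasta` are hypothetical and do not come from the workflow. The sketch only reproduces the rule as that entry states it. The later rework of the sanitisation, listed near the top of this file, may behave differently.

```python
import re
from typing import Iterable, Iterator

# Rule as stated in the changelog entry: ':' and '*' plus any whitespace
# (spaces, tabs, ...) are replaced with '_'. Hypothetical helper, not the
# workflow's own implementation.
_UNSAFE = re.compile(r"[:*\s]")


def sanitise_header(header: str) -> str:
    """Sanitise a single FASTA header line (with or without the leading '>')."""
    body = header.rstrip("\r\n")
    if body.startswith(">"):
        body = body[1:]
    return ">" + _UNSAFE.sub("_", body)


def sanitise_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with headers sanitised and sequence lines unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield sanitise_header(line) + "\n"
        else:
            yield line


if __name__ == "__main__":
    fasta = [">amp:1*A desc\ttext\n", "ACGT\n"]
    print("".join(sanitise_fasta(fasta)), end="")
```

Because whitespace is replaced as well, any description after the first space is merged into the sequence ID rather than discarded. For example, `>amp:1*A desc` becomes `>amp_1_A_desc`.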