Releases: nextstrain/nextclade
1.4.1
Nextclade Web 1.7.1, Nextclade CLI 1.4.1 (2021-10-05)
[Fix] Format of CSV/TSV output files
We fixed a few mistakes in CSV and TSV output files: a missing last delimiter when the "errors" column is empty, inconsistent application of quotation marks, and incorrect numeric formats (decimals where integers were expected).
1.4.0
Nextclade Web 1.7.0, Nextclade CLI 1.4.0, Nextalign CLI 1.4.0 (2021-09-30)
[Feature] Frame shift detection
Nextclade can now detect reading frame shifts in the analyzed sequences and report them in the web interface as well as in the output files.
Background
A frame shift occurs when a sequence contains a range of indels (deletions and/or insertions) and the total length of this range is not divisible by 3. In this case the grouping of nucleotides into codons changes compared to the reference genome, and the translation of this region manifests in the peptide as a range consisting almost entirely of aminoacid mutations.
Frame shifts can often be found towards the end of genes, spanning until or beyond the gene end. Sometimes, when indels occur in multiple places, the ones that follow can compensate (cancel) the frame shift caused by the previous ones, resulting in a frame shift that spans a range in the middle of the gene. In these cases, due to extreme changes in the corresponding protein, the virus is often not viable. Such frame shifts are often a sign of sequencing errors, although cases of biological frame shifts are also known. Sometimes, frame shifts can also introduce premature stop codons, causing the gene to be truncated. Premature stop codons within frame shifts are not yet detected by Nextclade.
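To make the idea of shifted and compensated ranges concrete, here is a minimal Python sketch. It is not Nextclade's actual implementation: the function name, the input representation (a sorted list of indel positions with signed lengths) and the half-open nucleotide ranges are all assumptions made for illustration.

```python
from typing import List, Tuple


def frame_shift_ranges(indels: List[Tuple[int, int]], gene_length: int) -> List[Tuple[int, int]]:
    """Return half-open nucleotide ranges [begin, end) that are out of frame.

    `indels` is a hypothetical input: (position, net_length) pairs sorted by
    position, with net_length > 0 for insertions and < 0 for deletions.
    """
    ranges = []
    shift = 0      # running sum of indel lengths seen so far
    begin = None   # start of the currently open shifted range, if any
    for pos, length in indels:
        shift += length
        if shift % 3 != 0 and begin is None:
            begin = pos                   # the reading frame is lost here
        elif shift % 3 == 0 and begin is not None:
            ranges.append((begin, pos))   # a later indel restored the frame
            begin = None
    if begin is not None:
        ranges.append((begin, gene_length))  # the shift runs to the gene end
    return ranges
```

In this sketch a single 1-nucleotide deletion shifts everything up to the gene end, while a following 2-nucleotide deletion compensates it and leaves a shifted range in the middle of the gene.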
Previous behavior
Previously, Nextclade was not able to detect frame shift ranges specifically. Instead, a frame shift was suspected in a gene when the gene length was not divisible by 3 (hinting at indels of a total length not divisible by 3). In these cases the entire gene was omitted from translation, a warning was issued, and aminoacid changes in that gene could not be detected and reported.
New behavior
Now that Nextclade knows the exact shifted ranges for each gene, it translates the genes with frame shifts, but masks shifted regions with aminoacid X (unknown aminoacid). The aminoacid changes in non-frame-shifted regions within such genes are now reported. This means that in some sequences Nextclade can now detect more mutations than previously. The affected genes are now emitted into the output fasta files instead of being discarded.
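The masking step can be pictured with the following short Python sketch. It only illustrates the idea. The function name and the use of half-open codon ranges are assumptions, not Nextclade's internals.

```python
def mask_frame_shifts(peptide, codon_ranges):
    """Replace aminoacids inside shifted codon ranges with X.

    `codon_ranges` is a hypothetical list of half-open (begin, end) ranges
    in codon coordinates.
    """
    chars = list(peptide)
    for begin, end in codon_ranges:
        for i in range(begin, min(end, len(chars))):
            chars[i] = "X"   # unknown aminoacid
    return "".join(chars)
```

Aminoacids outside the masked ranges stay untouched, so mutations there can still be compared against the reference peptide.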
Frame shifts report in Nextclade Web
Frame shifted ranges are denoted as red horizontal (strikethrough) lines with yellow highlights in the "Sequence view" and "Gene view" columns of the results table of Nextclade Web. The new "FS" column shows the number of detected frame shifts: unexpected and known (ignored) ones (see the QC changes below for more details).
Frame shifts report in the output files
Frame shifted ranges (in codon coordinates) are reported in CSV and TSV output files in the column named frameShifts and in the JSON output file under the frameShifts property.
[Feature] Improved frame shift quality control (QC) rule
Previously, the frame shift quality control rule (denoted as "F" in Nextclade Web) relied on gene length to reason about the presence of frame shifts: if a gene had a length not divisible by 3, a warning was reported.
Now this rule uses the detected frame shift ranges to make the decision. There can now be more than one frame shift detected per gene, and Nextclade now accounts for compensated frame shifts, which were previously undetected.
In the new implementation of the Frame Shift QC rule, some of the frame shift ranges are considered "ignored" or "known" (as defined in the qc.json file of the dataset). These frame shifts do not cause a QC score penalty.
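As a rough Python sketch of how detected frame shifts could be separated into unexpected and known ones: the dictionary keys (gene, begin, end) and the exact-match comparison are hypothetical and do not reflect the real qc.json schema or Nextclade's matching logic.

```python
def split_frame_shifts(detected, ignored):
    """Split detected frame shifts into (unexpected, known).

    Both arguments are lists of dicts with hypothetical keys
    "gene", "begin" and "end" (codon coordinates).
    """
    def key(fs):
        return (fs["gene"], fs["begin"], fs["end"])

    ignored_keys = {key(fs) for fs in ignored}
    unexpected = [fs for fs in detected if key(fs) not in ignored_keys]
    known = [fs for fs in detected if key(fs) in ignored_keys]
    return unexpected, known
```

Under this sketch only the unexpected list would contribute to the QC penalty, while both counts could be displayed.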
[Feature] New version of SARS-CoV-2 dataset
We simultaneously release a new version of the SARS-CoV-2 dataset, which contains an updated tree and clades, as well as a new set of frame shift ranges and stop codons to ignore. For details, refer to the dataset changelog.
Nextclade Web uses the latest version of the datasets by default and CLI users are encouraged to update their SARS-CoV-2 dataset with the nextclade dataset get command.
[Feature] Optional translation beyond first stop codon
By default, Nextalign CLI and Nextclade CLI translate whole genes, even if stop codons appear during translation. In this release we added the flag --no-translate-past-stop which, if present, makes translation stop at the first encountered stop codon. The remainder of the peptide is then filled with the gap (-) character. This might be useful in cases when a more biological behavior of translation is desired.
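The difference between the two modes can be sketched in Python as follows. This is an illustration only: the function and parameter names are made up, the standard genetic code is used, and keeping the stop codon itself in the output before the gaps is an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(gene, stop_at_first_stop=False):
    """Translate a nucleotide string; optionally fill with gaps after the first stop."""
    usable = len(gene) - len(gene) % 3           # ignore a trailing partial codon
    codons = [gene[i:i + 3] for i in range(0, usable, 3)]
    peptide = [CODON_TABLE.get(codon, "X") for codon in codons]  # X for unknown codons
    if stop_at_first_stop and "*" in peptide:
        stop = peptide.index("*")
        # Assumption: the stop itself is kept, everything after it becomes a gap.
        peptide[stop + 1:] = "-" * (len(peptide) - stop - 1)
    return "".join(peptide)
```

With the default behavior the whole gene is translated, which keeps downstream aminoacids visible even after a premature stop.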
1.3.0
Nextclade Web 1.6.0, Nextclade CLI 1.3.0, Nextalign CLI 1.3.0 (2021-08-31)
[Feature] Nextclade Datasets
In this release we introduce Nextclade Datasets, a convenient way of downloading files required for Nextclade analysis. Now data files (such as reference sequences, reference tree and others) are served for all users from a central dataset repository.
Datasets in Nextclade Web
The dropdown menu in Nextclade Web now allows users to choose between available datasets before analysis and automatically fetches the latest files from the central dataset repository.
Datasets in Nextclade CLI
Nextclade CLI gained new commands and flags to manage datasets:
- nextclade dataset list command lists available datasets
- nextclade dataset get command downloads a dataset to a directory
- nextclade run command runs the analysis (for compatibility with old versions the word run can be omitted), and the new --input-dataset flag specifies the directory of the previously downloaded dataset
Quick example
nextclade dataset get --name=sars-cov-2 --output-dir=data/sars-cov-2
nextclade run \
--input-fasta=data/sars-cov-2/sequences.fasta \
--input-dataset=data/sars-cov-2 \
--output-tsv=output/nextclade.tsv \
--output-tree=output/nextclade.auspice.json \
--output-dir=output/
See Nextclade CLI documentation for example usage and Nextclade Datasets documentation for more details about datasets.
Note that data updates and additions are now decoupled from Nextclade releases. The datasets will be updated independently. Read the datasets documentation on dataset versioning and the tradeoff between reproducibility of results and latest features (e.g. clades and QC checks).
[Feature] Flu datasets in Nextclade
With this release, in addition to the previously available SARS-CoV-2 dataset, we introduce 4 new Influenza datasets:
- Influenza A H1N1pdm (rooted at "A/California/07/2009")
- Influenza A H3N2 (rooted at "A/Wisconsin/67/2005")
- Influenza B Victoria (rooted at "B/Brisbane/60/2008")
- Influenza B Yamagata (rooted at "B/Wisconsin/01/2010")
These datasets allow Nextclade to analyze sequences for these pathogens.
The Nextclade Datasets feature simplifies adding new pathogens to Nextclade and we hope to add new datasets in the future.
[Deprecation] Data files in Nextclade GitHub repository are deprecated
The files in the /data directory of the Nextclade GitHub repository are now deprecated in favor of the Nextclade Datasets feature.
These files will be deleted from the repository on October 31st 2021, but will still be available in git history. We do not recommend using these files, as they will no longer be updated.
1.2.3
Nextclade CLI 1.2.3, Nextalign CLI 1.2.3 (2021-08-12)
This release only affects docker images. There are no actual changes in Nextclade CLI, Nextalign CLI or Nextclade Web. They should behave the same as their previous versions.
[Change] Add ca-certificates package into Debian docker images
For better compatibility with workflows, this adds CA certificates into the Debian docker images. They are necessary for SSL/TLS to work, in particular when fetching data.
These are the default images when you pull nextstrain/nextclade and nextstrain/nextalign without specifying a tag or specifying one of the debian tags. Issue docker pull nextstrain/nextclade to refresh the local image to the latest version.
1.2.2
Nextclade CLI 1.2.2, Nextalign CLI 1.2.2 (2021-08-12)
This release only affects docker images. There are no actual changes in Nextclade CLI, Nextalign CLI or Nextclade Web. They should behave the same as their previous versions.
[Change] Add ps utility into Debian docker images
This adds the ps utility into the Debian docker images, for better compatibility with nextflow workflows.
These are the default images when you pull nextstrain/nextclade and nextstrain/nextalign without specifying a tag or specifying one of the debian tags.
1.2.1
Nextclade Web 1.5.3, Nextclade CLI 1.2.1, Nextalign CLI 1.2.1
[Bug fix] Incorrect ranges in "SNP clusters" QC rule
"SNP clusters" QC rule could sometimes produce ranges of SNP clusters with incorrect boundaries (begin/end). This is now fixed.
[Bug fix] Crash with incorrect colorings in the input reference tree
Fixed a rare crash in Nextclade CLI and Nextclade Web when input reference tree contained incorrect fields in "colorings" section of the tree JSON file.
[Change] Cleanup the tree node info dialog
Removed redundant text entries in the tree node info dialog (when clicking on a node in the tree view). All these entries are still presented in the results table.
[Change] Improve wording of the "Private mutations" QC rule tooltip
Improved wording of the message in the "Private mutations" QC rule tooltip.
[Change] New docker container images for Nextclade CLI and Nextalign CLI
New Docker images are available based on Debian 10 and Alpine 3.14. Debian images contain a set of basic utilities, such as bash, curl and wget, to facilitate usage in workflows.
You can choose to use the latest available version (:latest or no tag), or to freeze a specific version (e.g. :1.2.1) or only major version (e.g. :1), or a base image (e.g. :debian) or both version and base image (:1.2.1-debian), or mix and match.
Tag :latest now points to :debian. For previous behavior, where the :latest tag pointed to the FROM scratch image, use tag :scratch.
Full list of tags is below.
Image based on Debian 10 is tagged:
nextstrain/nextclade
nextstrain/nextclade:latest
nextstrain/nextclade:1
nextstrain/nextclade:1.2.1
nextstrain/nextclade:debian
nextstrain/nextclade:latest-debian
nextstrain/nextclade:1-debian
nextstrain/nextclade:1.2.1-debian
Image based on Alpine 3.14 is tagged:
nextstrain/nextclade:alpine
nextstrain/nextclade:latest-alpine
nextstrain/nextclade:1-alpine
nextstrain/nextclade:1.2.1-alpine
The previously default FROM scratch image is tagged:
nextstrain/nextclade:scratch
nextstrain/nextclade:latest-scratch
nextstrain/nextclade:1-scratch
nextstrain/nextclade:1.2.1-scratch
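The tag naming scheme above is regular enough to be generated. The following Python sketch reproduces the lists for a given version and base image. The helper name and its parameters are hypothetical and exist only to show the pattern. The untagged name nextstrain/nextclade is not generated, since it resolves to :latest.

```python
def image_tags(version, base, is_default=False):
    """List image names for one base image ("debian", "alpine" or "scratch")."""
    major = version.split(".")[0]
    prefixes = ["latest", major, version]
    tags = [base] + [f"{prefix}-{base}" for prefix in prefixes]
    if is_default:
        # The default base image additionally owns the tags without a base suffix.
        tags = prefixes + tags
    return [f"nextstrain/nextclade:{tag}" for tag in tags]
```

For example, image_tags("1.2.1", "debian", is_default=True) yields the Debian list above (minus the untagged name), and image_tags("1.2.1", "alpine") yields the Alpine list.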
1.2.0
Nextclade Web 1.4.0, Nextclade CLI 1.2.0, Nextalign CLI 1.2.0 (2021-06-24)
Nextclade Web and Nextclade CLI
[New feature] Quality control (QC) rules: "Frame shifts" (F) and "Stop codons" (S)
We have added two additional QC rules designed to flag sequences that likely do not correspond to functional viruses.
"Stop codons" rule (S)
Checks if any of the genes have premature stop codons. A stop codon within a gene will now result in a QC warning, unless it is one of the very common stop codons in ORF8 at positions 27 or 68. This list of ignored stop codons is defined in the stopCodons.ignoredStopCodons property of the QC configuration file (qc.json) and can be adjusted. The default list might be extended in the future.
Results of this check are available in JSON, CSV, and TSV output files as qc.stopCodons. In Nextclade Web it is displayed in the "QC" column of the results table as a circle with letter "S" in it.
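A simplified Python sketch of such a check is shown below. It is not the actual rule implementation: the function name, the peptide dictionary, the (gene, codon) pairs and the 0-based codon indexing are assumptions, and the real coordinate convention of the ignore list may differ.

```python
def premature_stop_codons(peptides, ignored):
    """Find stop codons ("*") that are not at the end of the peptide.

    `peptides` maps a gene name to its translated peptide, where the last
    character is assumed to be the regular stop. `ignored` is a set of
    (gene, codon) pairs that should not be reported.
    """
    found = []
    for gene, peptide in peptides.items():
        for codon, aminoacid in enumerate(peptide[:-1]):  # skip the regular stop
            if aminoacid == "*" and (gene, codon) not in ignored:
                found.append((gene, codon))
    return found
```

A non-empty result would correspond to a QC warning. Entries on the ignore list never appear in the result.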
"Frame shifts" rule (F)
Checks and reports if any of the genes have a length that is not divisible by 3. If at least one such gene length is detected, the check is considered "bad". Failure of this check means that the gene likely fails to translate.
Results of this check are available in JSON, CSV, and TSV output files as qc.frameShifts. In Nextclade Web it is displayed in the "QC" column of the results table as a circle with letter "F" in it.
[Change] Quality control (QC) configuration file updated
New entries were added to the QC configuration file (qc.json) for the two new rules. For Nextclade CLI users, we recommend downloading the new file from our data/ directory on GitHub.
This file is now versioned using the new schemaVersion property. If the version of qc.json is less than the version of Nextclade CLI itself, users will now receive a warning.
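The version comparison can be sketched in Python as follows. This assumes that both versions are plain dotted numeric strings such as "1.2.0". The function names are hypothetical and the actual comparison rules in Nextclade may differ.

```python
def parse_version(version):
    """Turn a dotted version string such as "1.2.0" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def qc_config_is_outdated(schema_version, cli_version):
    """Return True if the qc.json schema version is older than the CLI version."""
    return parse_version(schema_version) < parse_version(cli_version)
```

Comparing integer tuples rather than raw strings keeps "1.10.0" correctly ordered after "1.9.0".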
All QC checks are now optional: a rule that has no corresponding config object is automatically disabled.
[Bug fix] CSV/TSV output files corrected
This release corrects a few issues with CSV/TSV output files:
- quotation marks are now escaped correctly
- special characters are now surrounded with quotes
- line breaks are now encoded as CR LF for better compatibility and consistency with Nextclade 0.x
- column shifts are now prevented in CSV/TSV results when some of the QC checks are disabled, as disabled checks return empty strings as result
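For readers who post-process or regenerate such files, the same conventions (doubled quotation marks, quoting of fields with special characters, CR LF line breaks, empty fields kept in place) can be reproduced with Python's standard csv module. This sketch is unrelated to Nextclade's own writer. The helper name and the comma delimiter are chosen only for illustration.

```python
import csv
import io


def write_rows(rows, delimiter=","):
    """Serialize rows with minimal quoting and CR LF line endings."""
    buffer = io.StringIO(newline="")   # keep line endings exactly as written
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,     # quote only fields that need it
        lineterminator="\r\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()
```

An empty field is written as an empty string between delimiters, so the number of columns stays the same for every row.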
Nextclade Web
[Bug fix] Ranges displayed off-by-one in GUI
Ranges displayed in Nextclade Web were off-by-one due to a front-end bug. Ends of ranges (right boundaries) were extending one unit too far. This means that alignment ranges, missing nucleotide ranges, ranges of gaps and not-sequenced ranges were all displayed 1 unit longer than they should have been. This release fixes this problem.
Only the display in the results table of Nextclade Web is affected. None of the output files, either produced by Nextclade CLI or by Nextclade Web are affected.
[New feature] Insertions displayed in the results table
A new column for insertions (abbreviated as "Ins.") was added to the results table of Nextclade Web. It shows the total number of inserted nucleotides. Hovering reveals more details about each insertion. This information was already available in the output files, and is now also shown in the GUI.
Nextalign CLI
There are no changes in Nextalign in this release, but we keep versions of Nextalign and Nextclade in sync.
1.1.0
Nextclade Web 1.3.0, Nextclade CLI 1.1.0, Nextalign CLI 1.1.0 (2021-06-22)
This series of releases adds a new output file, nextclade.errors.csv, for all tools and adds the file nextclade.insertions.csv to Nextclade Web (this file was already available for users of CLI tools).
nextclade.insertions.csv contains the following columns: seqName, insertions. The column insertions contains a list of nucleotide insertion entries delimited by semicolon. Each entry consists of the position of the first nucleotide and the inserted fragment, delimited by colon.
nextclade.errors.csv contains the following columns: seqName, errors, warnings, failedGenes. These contain a list of errors, a list of warnings and a list of genes that failed processing, respectively. All lists are semicolon-delimited.
In both files, each row corresponds to one sequence, named by seqName.
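The format of the insertions column described above can be parsed with a few lines of Python. The function name is hypothetical, and the sketch only handles the contents of a single cell, not the reading of the CSV file itself.

```python
def parse_insertions(cell):
    """Parse "pos:fragment;pos:fragment" into a list of (position, fragment)."""
    insertions = []
    for entry in cell.split(";"):
        if not entry:
            continue                       # empty cell or trailing semicolon
        position, fragment = entry.split(":", 1)
        insertions.append((int(position), fragment))
    return insertions
```

An empty cell yields an empty list, which corresponds to a sequence without insertions.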
1.0.0
This major release brings many new features and bug fixes.
We release new versions of all of the tools in Nextclade family: Nextclade web application, Nextclade CLI and Nextalign CLI.
With this major release we introduce breaking changes. In particular, changes to input and output file formats as well as to arguments of command-line tools. The breaking changes are marked with "💥 BREAKING CHANGE" prefix. It is recommended to review these changes.
Below is a description of changes compared to version 0.14.4.
General
Changes that affect all tools:
- The underlying algorithm has been completely rewritten in C++ (versions 0.x were implemented in JavaScript), to make it faster, more reliable and to produce better results. The web application now uses WebAssembly modules to be able to run the algorithm.
- 💥 BREAKING CHANGE: Nextclade now uses the Nextalign algorithm for the alignment and translation of sequences. This means that nucleotide alignment is now aware of codon boundaries. Alignment results and some of the analysis results might be slightly different, depending on input sequences.
- Similarly to Nextalign, Nextclade can now output aligned peptides. In general, Nextclade is a superset of Nextalign and can do everything Nextalign can, plus more (for the price of additional computation).
- 💥 BREAKING CHANGE: Gene maps are now only accepted in GFF3 format. See an example at GitHub. Migration path: use the provided default gene map or convert your custom gene map to GFF3 format.
- 💥 BREAKING CHANGE: JSON results file format has changed. It now contains an object instead of an array as a root element. The array of results is now attached to the results property of the root object. Migration path: instead of using the output array directly, use output.results now.
- 💥 BREAKING CHANGE: JSON fields and CSV/TSV columns totalMutations and totalGaps were renamed to totalSubstitutions and totalDeletions, for consistency. Migration path: use the new JSON property or column names.
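For downstream scripts that need to read JSON results from both 0.x and 1.x during migration, the two JSON-related breaking changes above can be handled as in this Python sketch. The function name is hypothetical. Only the results property and the four field names come from the changes described above.

```python
import json

# Old field name -> new field name, as listed in the breaking changes.
RENAMED_FIELDS = {
    "totalMutations": "totalSubstitutions",
    "totalGaps": "totalDeletions",
}


def load_results(text):
    """Return the list of per-sequence results using the new field names."""
    document = json.loads(text)
    # v1 wraps the array into an object, v0 has the array at the root.
    results = document["results"] if isinstance(document, dict) else document
    return [
        {RENAMED_FIELDS.get(name, name): value for name, value in result.items()}
        for result in results
    ]
```

Code written against the returned list can then use totalSubstitutions and totalDeletions regardless of which version produced the file.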
Nextclade web application v1
The web application mostly maintains its previous interface, with small improvements and with adjustments to the new underlying algorithm implementation.
- A new "Download" dialog was introduced, which replaces the old "Export" dropdown menu. It can be toggled by clicking on the "Download" button on the "results" page.
- Aligned sequences can now be downloaded in the new "Download" dialog.
- Translated aligned peptides can now be downloaded in the new "Download" dialog.
- The "Sequence view" column of the results table can now be switched between "Nucleotide sequence" view and "Gene" view. In "Gene" view, aminoacid mutations and deletions are displayed for a particular gene.
- "Sequence view" can also be switched by clicking on a gene in the "Genome annotation" panel below the results table.
- Results table tooltips have been cleaned up, and information was spread between corresponding columns, in order to fit the tooltips fully on common screen sizes. For example, the list of mutations is now only available when hovering over the "Mut." column.
- The tooltips to explore diversity have become much more informative. For amino acid changes, we now provide a nucleotide context view that is particularly helpful for complex mutations. Consecutive changes are merged into one tooltip.
Nextclade CLI v1
Nextclade CLI v1 is a replacement for Nextclade CLI v0. It is recommended for advanced users, batch processing and for integration into pipelines.
- Nextclade CLI 1.0.0 is available on GitHub Releases and on DockerHub.
- Node.js is no longer required. Nextclade is now distributed as a standalone native executable file and is ready to be used after download. The latest version is available for major platforms at the GitHub Releases page.
- The limitation of Node.js on maximum input file size (500 MB) is now removed. Nextclade should be able to handle large files and to use I/O resources more efficiently. Nextclade will stream sequence data to reduce memory consumption.
- Nextclade CLI is much faster now. Depending on conditions, we measured speedups up to 5x compared to the old implementation.
- 💥 BREAKING CHANGE: Nextclade no longer includes any default data. The following flags for input files were previously optional but are now required: --input-root-seq, --input-tree, --input-qc-config. The --input-gene-map flag is optional, but is highly recommended, because without a gene map the alignment will not be informed by codon boundaries, and translation, peptide output and aminoacid change detection will not be available. The example SARS-CoV-2 data can be downloaded from GitHub and used as a starting point. Refer to built-in help for more details (--help). Migration path: download the default data and add the new flags if you were not previously using them.
- 💥 BREAKING CHANGE: Reference (root) sequence is no longer being written into outputs by default. Add the --include-reference flag to include it. Reference peptides will also be included in this case. Migration path: use the mentioned flag if you need reference sequence results included into the outputs.
- 💥 BREAKING CHANGE: Nextclade might write aligned sequences into output files in an order that is different from the order of sequences in the input file. If order is important, use the flag --in-order to enforce the initial order of sequences. This results in a small runtime performance penalty. Refer to built-in help for more details (--help). Migration path: use the mentioned flag if you need results to be written in order.
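For pipelines that launch Nextclade from Python, the flag changes above could be captured in a small helper such as the one sketched below. The helper and its parameter names are hypothetical, the file paths in any call are placeholders, and output flags are deliberately left out. Only flags mentioned in these release notes are used.

```python
def build_nextclade_args(input_fasta, root_seq, tree, qc_config,
                         gene_map=None, include_reference=False, in_order=False):
    """Build an argument list with the input flags that v1 requires."""
    args = [
        "nextclade",
        f"--input-fasta={input_fasta}",
        f"--input-root-seq={root_seq}",     # required since 1.0.0
        f"--input-tree={tree}",             # required since 1.0.0
        f"--input-qc-config={qc_config}",   # required since 1.0.0
    ]
    if gene_map is not None:
        args.append(f"--input-gene-map={gene_map}")  # optional, but recommended
    if include_reference:
        args.append("--include-reference")
    if in_order:
        args.append("--in-order")
    return args
```

The resulting list can be extended with the desired output flags and passed to subprocess.run. Check nextclade --help for the authoritative list of flags in your version.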
Nextalign CLI v1
Nextalign is a new tool that contains only the alignment and translation part of the algorithm, without sequence analysis, quality control, tree placement or other features of Nextclade (making it faster). It is available on the GitHub Releases page. Refer to built-in help for more details (--help).
Deprecation of Nextclade CLI v0
- Nextclade CLI 0.x is now deprecated and not recommended for general use. We recommend all users to migrate to version 1.x. Old versions will still be available on NPM and Docker Hub, but there are no plans to release new versions. Please reach out to developers if you still need support for versions 0.x.
- Container images hosted on Docker Hub will now resolve to Nextclade family v1. In order to pull a version of family 0.x, use tag :0 or a full version explicitly, for example :0.14.4:
docker pull nextstrain/nextclade:0
docker pull nextstrain/nextclade:0.14.4
We hope you enjoy the new release and, as always, don't hesitate to reach out to the Nextstrain team on Nextstrain discussion forums or on GitHub.
1.0.0-alpha.9
Nextclade CLI
Calculation of the summary QC score from the scores of the individual rules missed a numerical factor that resulted in many good quality sequences being assigned a poor QC score. This version fixes that.
Nextalign CLI
There are no changes in Nextalign, but we keep versions of Nextalign and Nextclade in sync.