Releases: nextstrain/nextclade

1.4.1

05 Oct 18:31

Nextclade Web 1.7.1, Nextclade CLI 1.4.1 (2021-10-05)

[Fix] Format of CSV/TSV output files

We fixed several mistakes in the CSV and TSV output files: a missing trailing delimiter when the "errors" column is empty, inconsistent use of quotation marks, and incorrect numeric formats (decimals where integers were expected).

1.4.0

30 Sep 12:24

Nextclade Web 1.7.0, Nextclade CLI 1.4.0, Nextalign CLI 1.4.0 (2021-09-30)

[Feature] Frame shift detection

Nextclade can now detect reading frame shifts in the analyzed sequences and report them in the web interface as well as in the output files.

Background

A frame shift occurs when a sequence contains a range of indels (deletions and/or insertions) and the total length of this range is not divisible by 3. In this case the grouping of nucleotides into codons changes compared to the reference genome, and the translation of this region manifests in the peptide as a range consisting almost entirely of amino acid mutations.

Frame shifts are often found towards the end of genes, spanning until or beyond the gene end. Sometimes, when indels occur in multiple places, the later ones can compensate (cancel) the frame shift caused by the earlier ones, resulting in a frame shift that spans a range in the middle of the gene. Due to the extreme changes in the corresponding protein, a virus with a frame shift is often not viable, so frame shifts are often a sign of sequencing errors. However, cases of biological frame shifts are also known. Frame shifts can also introduce premature stop codons, causing the gene to be truncated. Premature stop codons within frame shifts are not (yet) detected by Nextclade.
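
To illustrate the definition above, here is a minimal Python sketch that tracks the cumulative indel length along a gene and reports the stretches where the reading frame is shifted, including the compensated case. The `Indel` type, the `find_frame_shifts` function and the 0-based, half-open coordinates are assumptions made for this example; this is a simplification and not Nextclade's actual implementation.

```python
from typing import List, NamedTuple, Tuple


class Indel(NamedTuple):
    pos: int     # nucleotide position within the gene (0-based, assumed convention)
    length: int  # positive for an insertion, negative for a deletion


def find_frame_shifts(indels: List[Indel], gene_length: int) -> List[Tuple[int, int]]:
    """Return half-open nucleotide ranges [begin, end) where the frame is shifted."""
    shifts = []
    offset = 0    # cumulative indel length seen so far
    begin = None  # start of the currently open shifted range, if any
    for indel in sorted(indels):
        offset += indel.length
        shifted = offset % 3 != 0
        if shifted and begin is None:
            begin = indel.pos                  # the frame becomes shifted here
        elif not shifted and begin is not None:
            shifts.append((begin, indel.pos))  # a later indel compensates the shift
            begin = None
    if begin is not None:
        shifts.append((begin, gene_length))    # the shift spans until the gene end
    return shifts
```

A single 1-nucleotide deletion shifts the frame until the gene end, while a following 2-nucleotide deletion restores it, leaving a shifted range in the middle of the gene.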

Previous behavior

Previously, Nextclade was not able to detect frame shift ranges specifically. Instead, a frame shift was suspected in a gene when the gene length was not divisible by 3 (hinting at indels of a total length not divisible by 3). In these cases the entire gene was omitted from translation, a warning was issued, and amino acid changes in that gene could not be detected or reported.

New behavior

Now that Nextclade knows the exact shifted ranges for each gene, it translates genes with frame shifts, but masks the shifted regions with amino acid X (unknown amino acid). Amino acid changes in non-frame-shifted regions of such genes are now reported. This means that in some sequences Nextclade can now detect more mutations than previously. The affected genes are now emitted into the output fasta files instead of being discarded.
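
The masking step can be pictured with a short Python sketch. The function name `mask_frame_shifts` and the half-open codon ranges are assumptions for illustration only; the sketch shows the idea of replacing shifted codons with X, not how Nextclade implements it.

```python
from typing import List, Tuple


def mask_frame_shifts(peptide: str, shifts: List[Tuple[int, int]]) -> str:
    """Replace every position in the half-open codon ranges [begin, end) with 'X'."""
    masked = list(peptide)
    for begin, end in shifts:
        # Clamp the range so that shifts running past the gene end are handled
        for i in range(max(begin, 0), min(end, len(masked))):
            masked[i] = "X"
    return "".join(masked)
```

With this approach the positions outside the shifted ranges stay intact, so mutations there can still be compared against the reference peptide.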

Frame shifts report in Nextclade Web

Frame shifted ranges are shown as red horizontal (strikethrough) lines with yellow highlights in the "Sequence view" and "Gene view" columns of the results table of Nextclade Web. The new "FS" column shows the number of detected frame shifts, both unexpected and known (ignored) ones (see the QC changes below for more details).

Frame shifts report in the output files

Frame shifted ranges (in codon coordinates) are reported in the CSV and TSV output files in the frameShifts column, and in the JSON output file under the frameShifts property.

[Feature] Improved frame shift quality control (QC) rule

Previously, the frame shift quality control rule (denoted as "F" in Nextclade Web) relied on gene length to reason about the presence of frame shifts: if a gene had a length not divisible by 3, a warning was reported.

Now this rule uses the detected frame shift ranges to make the decision. More than one frame shift can now be detected per gene, and Nextclade now accounts for compensated frame shifts, which were previously undetected.

In the new implementation of the Frame Shift QC rule, some of the frame shift ranges are considered "ignored" or "known" (as defined in the qc.json file of the dataset). These frame shifts do not cause a QC score penalty.
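
As a rough illustration of this rule, the following Python sketch splits detected frame shifts into unexpected and known (ignored) ones. The `FrameShift` type, the `split_frame_shifts` function and the example gene names and ranges are hypothetical; the real layout of the ignore list in qc.json and the scoring formula are not described here, so treat this only as a sketch of the idea.

```python
from typing import Iterable, List, NamedTuple, Tuple


class FrameShift(NamedTuple):
    gene: str
    begin: int  # codon range, half-open [begin, end) (assumed convention)
    end: int


def split_frame_shifts(
    detected: Iterable[FrameShift], ignored: Iterable[FrameShift]
) -> Tuple[List[FrameShift], List[FrameShift]]:
    """Return (unexpected, known) frame shifts; only unexpected ones would be penalized."""
    ignored_set = set(ignored)
    unexpected = [fs for fs in detected if fs not in ignored_set]
    known = [fs for fs in detected if fs in ignored_set]
    return unexpected, known


# Made-up example data
detected = [FrameShift("ORF1a", 100, 120), FrameShift("ORF8", 10, 30)]
ignored = [FrameShift("ORF8", 10, 30)]
unexpected, known = split_frame_shifts(detected, ignored)
```

Here only the ORF1a range would count towards the QC result, and the two list lengths correspond to the two kinds of frame shifts counted in the "FS" column.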

[Feature] New version of SARS-CoV-2 dataset

We simultaneously release a new version of the SARS-CoV-2 dataset, which contains an updated tree and clades, as well as a new set of frame shift ranges and stop codons to ignore. For details, refer to the dataset changelog.

Nextclade Web uses the latest version of the datasets by default and CLI users are encouraged to update their SARS-CoV-2 dataset with the nextclade dataset get command.

[Feature] Optional translation beyond first stop codon

By default, Nextalign CLI and Nextclade CLI translate whole genes, even if stop codons appear during translation. In this release we added the flag --no-translate-past-stop which, if present, makes translation stop at the first encountered stop codon. The remainder of the peptide is then filled with gap (-) characters. This might be useful when a more biological behavior of translation is desired.
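
The effect of the flag on a translated peptide can be sketched in Python as follows. The function name `truncate_at_first_stop`, the use of `*` as the stop symbol, and the choice to keep the stop symbol itself are assumptions for this example; the actual output of the tools may differ in these details.

```python
def truncate_at_first_stop(peptide: str, stop: str = "*", gap: str = "-") -> str:
    """Keep the peptide up to and including the first stop, fill the rest with gaps."""
    idx = peptide.find(stop)
    if idx == -1:
        return peptide  # no stop codon: the peptide is returned unchanged
    return peptide[: idx + 1] + gap * (len(peptide) - idx - 1)
```

Padding with gaps rather than cutting the string keeps the peptide length unchanged, so positions still line up with the reference.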

1.3.0

31 Aug 08:16

Nextclade Web 1.6.0, Nextclade CLI 1.3.0, Nextalign CLI 1.3.0 (2021-08-31)

[Feature] Nextclade Datasets

In this release we introduce Nextclade Datasets, a convenient way of downloading files required for Nextclade analysis. Now data files (such as reference sequences, reference tree and others) are served for all users from a central dataset repository.

Datasets in Nextclade Web

The dropdown menu in Nextclade Web now allows users to choose between available datasets before analysis, and automatically fetches the latest files from the central dataset repository.

Datasets in Nextclade CLI

Nextclade CLI gained new commands and flags to manage datasets:

  • The nextclade dataset list command lists the available datasets
  • The nextclade dataset get command downloads a dataset to a directory
  • The nextclade run command runs the analysis (for compatibility with old versions the word run can be omitted), and the new --input-dataset flag specifies the directory of a previously downloaded dataset

Quick example:
nextclade dataset get --name=sars-cov-2 --output-dir=data/sars-cov-2

nextclade run \
  --input-fasta=data/sars-cov-2/sequences.fasta \
  --input-dataset=data/sars-cov-2 \
  --output-tsv=output/nextclade.tsv \
  --output-tree=output/nextclade.auspice.json \
  --output-dir=output/

See Nextclade CLI documentation for example usage and Nextclade Datasets documentation for more details about datasets.

Note that data updates and additions are now decoupled from Nextclade releases. The datasets will be updated independently. Read the datasets documentation on dataset versioning and the tradeoff between reproducibility of results and the latest features (e.g. clades and QC checks).

[Feature] Flu datasets in Nextclade

With this release, in addition to the previously available SARS-CoV-2 dataset, we introduce 4 new Influenza datasets:

  • Influenza A H1N1pdm (rooted at "A/California/07/2009")
  • Influenza A H3N2 (rooted at "A/Wisconsin/67/2005")
  • Influenza B Victoria (rooted at "B/Brisbane/60/2008")
  • Influenza B Yamagata (rooted at "B/Wisconsin/01/2010")

These datasets allow Nextclade to analyze sequences for these pathogens.

The Nextclade Datasets feature simplifies adding new pathogens to Nextclade, and we hope to add new datasets in the future.

[Deprecation] Data files in Nextclade GitHub repository are deprecated

The files in the /data directory of the Nextclade GitHub repository are now deprecated in favor of the Nextclade Datasets feature.

These files will be deleted from the repository on October 31st 2021, but will still be available in git history. We do not recommend using these files, as they will no longer be updated.

1.2.3

12 Aug 15:55

Nextclade CLI 1.2.3, Nextalign CLI 1.2.3 (2021-08-12)

This release only affects docker images. There are no actual changes in Nextclade CLI, Nextalign CLI or Nextclade Web. They should behave the same as their previous versions.

[Change] Add ca-certificates package into Debian docker images

For better compatibility with workflows, this release adds CA certificates to the Debian docker images. They are necessary for SSL/TLS to work, in particular when fetching data.

These are the default images when you pull nextstrain/nextclade and nextstrain/nextalign without specifying a tag or specifying one of the debian tags. Issue docker pull nextstrain/nextclade to refresh the local image to the latest version.

1.2.2

12 Aug 12:21

Nextclade CLI 1.2.2, Nextalign CLI 1.2.2 (2021-08-12)

This release only affects docker images. There are no actual changes in Nextclade CLI, Nextalign CLI or Nextclade Web. They should behave the same as their previous versions.

[Change] Add ps utility into Debian docker images

This release adds the ps utility to the Debian docker images, for better compatibility with Nextflow workflows.

These are the default images when you pull nextstrain/nextclade and nextstrain/nextalign without specifying a tag or specifying one of the debian tags.

1.2.1

10 Aug 22:10

Nextclade Web 1.5.3, Nextclade CLI 1.2.1, Nextalign CLI 1.2.1

[Bug fix] Incorrect ranges in "SNP clusters" QC rule

"SNP clusters" QC rule could sometimes produce ranges of SNP clusters with incorrect boundaries (begin/end). This is now fixed.

[Bug fix] Crash with incorrect colorings in the input reference tree

Fixed a rare crash in Nextclade CLI and Nextclade Web that occurred when the input reference tree contained incorrect fields in the "colorings" section of the tree JSON file.

[Change] Cleanup the tree node info dialog

Removed redundant text entries in the tree node info dialog (when clicking on a node in the tree view). All these entries are still presented in the results table.

[Change] Improved wording of the "Private mutations" QC rule tooltip

Improved wording of the message in the "Private mutations" QC rule tooltip.

[Change] New docker container images for Nextclade CLI and Nextalign CLI

New Docker images are available based on Debian 10 and Alpine 3.14. Debian images contain a set of basic utilities, such as bash, curl and wget, to facilitate usage in workflows.

You can choose to use the latest available version (:latest or no tag), or to freeze a specific version (e.g. :1.2.1) or only major version (e.g. :1), or a base image (e.g. :debian) or both version and base image (:1.2.1-debian), or mix and match.

Tag :latest now points to :debian. For previous behavior, where :latest tag pointed to FROM scratch image, use tag :scratch.

Full list of tags is below.

Image based on Debian 10 is tagged:

nextstrain/nextclade
nextstrain/nextclade:latest
nextstrain/nextclade:1
nextstrain/nextclade:1.2.1

nextstrain/nextclade:debian
nextstrain/nextclade:latest-debian
nextstrain/nextclade:1-debian
nextstrain/nextclade:1.2.1-debian

Image based on Alpine 3.14 is tagged:

nextstrain/nextclade:alpine
nextstrain/nextclade:latest-alpine
nextstrain/nextclade:1-alpine
nextstrain/nextclade:1.2.1-alpine

The previously default FROM scratch image is tagged:

nextstrain/nextclade:scratch
nextstrain/nextclade:latest-scratch
nextstrain/nextclade:1-scratch
nextstrain/nextclade:1.2.1-scratch

1.2.0

24 Jun 19:28

Nextclade Web 1.4.0, Nextclade CLI 1.2.0, Nextalign CLI 1.2.0 (2021-06-24)

Nextclade Web and Nextclade CLI

[New feature] Quality control (QC) rules: "Frame shifts" (F) and "Stop codons" (S)

We have added two additional QC rules designed to flag sequences that likely do not correspond to functional viruses.

"Stop codons" rule (S)

Checks if any of the genes have premature stop codons. A stop codon within a gene will now result in a QC warning, unless it is one of the very common stop codons in ORF8 at positions 27 or 68. This list of ignored stop codons is defined in the stopCodons.ignoredStopCodons property of the QC configuration file (qc.json) and can be adjusted. The default list might be extended in the future.

Results of this check are available in JSON, CSV, and TSV output files as qc.stopCodons. In Nextclade Web it is displayed in the "QC" column of the results table as a circle with letter "S" in it.

"Frame shifts" rule (F)

Checks and reports if any of the genes have a length that is not divisible by 3. If at least one such gene length is detected, the check is considered "bad". Failure of this check means that the gene likely fails to translate.

Results of this check are available in JSON, CSV, and TSV output files as qc.frameShifts. In Nextclade Web it is displayed in the "QC" column of the results table as a circle with letter "F" in it.
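
The two rules above can be sketched in a few lines of Python. The function names, the `(gene, codon)` tuples used for stop codons, and the example gene lengths are assumptions for illustration; the actual structure of stopCodons.ignoredStopCodons in qc.json and the position convention are not spelled out here.

```python
from typing import Dict, Iterable, List, Set, Tuple

StopCodon = Tuple[str, int]  # (gene name, codon position); hypothetical representation


def unexpected_stop_codons(found: Iterable[StopCodon], ignored: Set[StopCodon]) -> List[StopCodon]:
    """Rule S sketch: stop codons that are not on the ignore list trigger a warning."""
    return [sc for sc in found if sc not in ignored]


def genes_with_bad_length(gene_lengths: Dict[str, int]) -> List[str]:
    """Rule F sketch (as of this release): genes whose length is not divisible by 3."""
    return [gene for gene, length in gene_lengths.items() if length % 3 != 0]


# Made-up example data
ignored = {("ORF8", 27), ("ORF8", 68)}
flagged_stops = unexpected_stop_codons([("ORF8", 27), ("S", 100)], ignored)
flagged_genes = genes_with_bad_length({"S": 3822, "ORF8": 364})
```

In this example only the stop codon in S and the ORF8 gene length would be flagged.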

[Change] Quality control (QC) configuration file updated

New entries were added to the QC configuration file (qc.json) for the two new rules. For Nextclade CLI users, we recommend downloading the new file from our data/ directory on GitHub.

This file is now versioned using the new schemaVersion property. If the version of qc.json is less than the version of Nextclade CLI itself, users will now receive a warning.

All QC checks are now optional: a rule that has no corresponding config object is automatically disabled.

[Bug fix] CSV/TSV output files corrected

This release corrects a few issues with CSV/TSV output files:

  • quotation marks are now escaped correctly
  • special characters are now surrounded with quotes
  • line breaks are now encoded as CR LF for better compatibility and consistency with Nextclade 0.x
  • column shifts are now prevented in CSV/TSV results when some of the QC checks are disabled, as disabled checks return empty strings as result
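
For readers who post-process or regenerate such files, the conventions listed above can be reproduced with Python's standard csv module, as in the sketch below. The `write_rows` helper is a hypothetical name, and the sketch only mirrors the described conventions (doubled quotation marks, quoting of special characters, CR LF line breaks, empty strings for disabled checks); it is not Nextclade's own writer.

```python
import csv
import io
from typing import Iterable, Sequence


def write_rows(rows: Iterable[Sequence[str]], delimiter: str = ",") -> str:
    """Serialize rows with minimal quoting and CR LF line endings."""
    buffer = io.StringIO(newline="")  # do not translate line endings
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that contain special characters
        lineterminator="\r\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()
```

An empty string still occupies its own column, so a disabled check does not shift the columns that follow it.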

Nextclade Web

[Bug fix] Ranges displayed off-by-one in GUI

Ranges displayed in Nextclade Web were off-by-one due to a front-end bug. Ends of ranges (right boundaries) extended one unit too far. This means that alignment ranges, missing nucleotide ranges, ranges of gaps and not-sequenced ranges were all displayed 1 unit longer than they should have been. This release fixes this problem.

Only the display in the results table of Nextclade Web is affected. None of the output files, whether produced by Nextclade CLI or by Nextclade Web, are affected.

[New feature] Insertions displayed in the results table

A new column for insertions (abbreviated as "Ins.") was added to the results table of Nextclade Web. It shows the total number of inserted nucleotides. Hovering reveals more details about each insertion. This information was already available in the output files, and is now also shown in the GUI.

Nextalign CLI

There are no changes in Nextalign in this release, but we keep versions of Nextalign and Nextclade in sync.

1.1.0

22 Jun 13:28

Nextclade Web 1.3.0, Nextclade CLI 1.1.0, Nextalign CLI 1.1.0 (2021-06-22)

This series of releases adds a new output file, nextclade.errors.csv, for all tools and adds the file nextclade.insertions.csv to Nextclade Web (this file was already available for users of CLI tools).

nextclade.insertions.csv contains the following columns: seqName, insertions. The column insertions contains a list of nucleotide insertion entries delimited by semicolon. Each entry consists of the position of the first nucleotide and the inserted fragment, delimited by colon.

nextclade.errors.csv contains the following columns: seqName, errors, warnings, failedGenes. The last three contain, respectively, a list of errors, a list of warnings and a list of genes that failed processing. All lists are semicolon-delimited.

In both files, each row corresponds to one sequence, named by seqName.
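
The cell formats described above are simple to parse. The following Python sketch shows one way to do it; the function names and the example values are made up for illustration, and the sketch assumes that an empty cell means an empty list.

```python
from typing import List, Tuple


def parse_insertions(cell: str) -> List[Tuple[int, str]]:
    """Parse 'pos:fragment;pos:fragment' into a list of (position, fragment) pairs."""
    insertions = []
    for entry in cell.split(";"):
        if not entry:
            continue  # skip empty entries, e.g. when the cell itself is empty
        pos, fragment = entry.split(":", 1)
        insertions.append((int(pos), fragment))
    return insertions


def parse_list(cell: str) -> List[str]:
    """Parse a semicolon-delimited list cell (errors, warnings or failedGenes)."""
    return [item for item in cell.split(";") if item]
```

For example, `parse_insertions("248:G;22204:GAGCCAGAA")` yields two entries, and `parse_list("")` yields an empty list.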

1.0.0

11 Jun 00:08

This major release brings many new features and bug fixes.

We release new versions of all of the tools in Nextclade family: Nextclade web application, Nextclade CLI and Nextalign CLI.

With this major release we introduce breaking changes. In particular, changes to input and output file formats as well as to arguments of command-line tools. The breaking changes are marked with "💥 BREAKING CHANGE" prefix. It is recommended to review these changes.

Below is a description of changes compared to version 0.14.4.

General

Changes that affect all tools:

  • The underlying algorithm has been completely rewritten in C++ (versions 0.x were implemented in JavaScript), to make it faster, more reliable and to produce better results. Web application now uses WebAssembly modules to be able to run the algorithm.

  • 💥 BREAKING CHANGE: Nextclade now uses Nextalign algorithm for the alignment and translation of sequences. This means that nucleotide alignment is now aware of codon boundaries. Alignment results and some of the analysis results might be slightly different, depending on input sequences.

  • Similarly to Nextalign, Nextclade can now output aligned peptides. In general, Nextclade is a superset of Nextalign and can do everything Nextalign can, plus more (for the price of additional computation).

  • 💥 BREAKING CHANGE: Gene maps are now only accepted in GFF3 format. See an example at GitHub. Migration path: use provided default gene map or convert your custom gene map to GFF3 format.

  • 💥 BREAKING CHANGE: JSON results file format has changed. It now contains an object instead of an array as a root element. The array of results is now attached to the results property of the root object. Migration path: instead of using output array directly use output.results now.

  • 💥 BREAKING CHANGE: JSON fields and CSV/TSV columns totalMutations and totalGaps were renamed to totalSubstitutions and totalDeletions, for consistency. Migration path: use new JSON property or column names.
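
For scripts that need to read JSON results from both 0.x and 1.x, the two JSON-related migration paths above could be handled roughly as in this Python sketch. The `load_results` helper and the `RENAMED` mapping are hypothetical names introduced for the example; only the results property and the four field names come from the notes above.

```python
import json
from typing import Any, Dict, List

# Old field name -> new field name, per the renaming described above
RENAMED = {"totalMutations": "totalSubstitutions", "totalGaps": "totalDeletions"}


def load_results(text: str) -> List[Dict[str, Any]]:
    """Return the list of per-sequence results using the new field names."""
    output = json.loads(text)
    # v1 wraps the array in an object under "results"; v0 used a bare array
    results = output["results"] if isinstance(output, dict) else output
    return [{RENAMED.get(key, key): value for key, value in r.items()} for r in results]
```

Normalizing to the new names in one place keeps the rest of a pipeline independent of which version produced the file.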

Nextclade web application v1

The web application mostly maintains its previous interface, with small improvements and adjustments to the new underlying algorithm implementation.

  • New "Download" dialog was introduced, which replaces the old "Export" dropdown menu. It can be toggled by clicking on "Download" button on "results" page.

  • Aligned sequences now can be downloaded in the new "Download" dialog.

  • Translated aligned peptides now can be downloaded in the new "Download" dialog.

  • "Sequence view" column of the results table now can be switched between "Nucleotide sequence" view and "Gene" view. In "Gene" view, aminoacid mutations and deletions are displayed for a particular gene.

  • "Sequence view" can also be switched by clicking on a gene in "Genome annotation" panel below the results table.

  • Results table tooltips have been cleaned up: information was spread between the corresponding columns so that the tooltips fit fully on common screen sizes. For example, the list of mutations is now only available when hovering over the "Mut." column.

  • The tooltips to explore diversity have become much more informative. For amino acid changes, we now provide a nucleotide context view that is particularly helpful for complex mutations. Consecutive changes are merged into one tooltip.

Nextclade CLI v1

Nextclade CLI v1 is a replacement for Nextclade CLI v0. It is recommended for advanced users, batch processing and for integration into pipelines.

  • Nextclade CLI 1.0.0 is available on GitHub Releases and on DockerHub:

  • Node.js is no longer required. Nextclade is now distributed as a standalone native executable file and is ready to be used after download. The latest version is available for major platforms on the GitHub Releases page.

  • The limitation of Node.js on maximum input file size (500 MB) is now removed. Nextclade should be able to handle large files and to use I/O resources more efficiently. Nextclade will stream sequence data to reduce memory consumption.

  • Nextclade CLI is much faster now. Depending on conditions, we measured speedups up to 5x compared to the old implementation.

  • 💥 BREAKING CHANGE: Nextclade no longer includes any default data. The following flags for input files were previously optional but are now required: --input-root-seq, --input-tree, --input-qc-config. The --input-gene-map flag is optional, but is highly recommended, because without a gene map the alignment will not be informed by codon boundaries, and translation, peptide output and amino acid change detection will not be available. The example SARS-CoV-2 data can be downloaded from GitHub and used as a starting point. Refer to built-in help for more details (--help). Migration path: download the default data and add the new flags if you were not using them previously.

  • 💥 BREAKING CHANGE: Reference (root) sequence is no longer being written into outputs by default. Add --include-reference flag to include it. Reference peptides will also be included in this case. Migration path: use the mentioned flag if you need reference sequence results included into the outputs.

  • 💥 BREAKING CHANGE: Nextclade might write aligned sequences into output files in the order that is different from the order of sequences in the input file. If order is important, use flag --in-order to enforce the initial order of sequences. This results in a small runtime performance penalty. Refer to built-in help for more details (--help). Migration path: use the mentioned flag if you need results to be written in order.

Nextalign CLI v1

Nextalign is a new tool that contains only the alignment and translation part of the algorithm, without sequence analysis, quality control, tree placement or other features of Nextclade (making it faster). It is available on the GitHub Releases page. Refer to built-in help for more details (--help).

Deprecation of Nextclade CLI v0

  • Nextclade CLI 0.x is now deprecated and not recommended for general use. We recommend all users to migrate to version 1.x. Old versions will still be available on NPM and Docker Hub, but there are no plans to release new versions. Please reach out to developers if you still need support for versions 0.x.

  • Container images hosted on Docker Hub will now resolve to Nextclade family v1. In order to pull the version of family 0.x, use tag :0 or a full version explicitly, for example :0.14.4:

docker pull nextstrain/nextclade:0
docker pull nextstrain/nextclade:0.14.4

We hope you enjoy the new release and as always, don't hesitate to reach out to Nextstrain team on Nextstrain discussion forums or on GitHub.

1.0.0-alpha.9

03 Jun 18:09

Nextclade CLI

Calculation of the summary QC score from the scores of the individual rules missed a numerical factor that resulted in many good quality sequences being assigned a poor QC score. This version fixes that.

Nextalign CLI

There are no changes in Nextalign, but we keep versions of Nextalign and Nextclade in sync.