Releases: nextstrain/nextclade
2.1.0
Nextclade CLI 2.1.0
- Fix #907: If --output-basename contains dots, the last component is no longer omitted (report: @KatSteinke, fix: @ivan-aksamentov)
- Fix #908: Files passed as --input-virus-properties were interpreted as if passed to --input-pcr-primers, and vice versa (report: @BCArg, fix: @corneliusroemer)
Commit history
- [9e98a42] docs: fix download links [skip ci]
- [46a8a53] chore: update dependencies, rust to 1.61
- [2290adc] chore: upgrade Cargo.toml's
- [bb9f276] Merge pull request #905 from nextstrain/update-deps
- [6935a9a] Fix link to bioconda package
- [cff9f9f] Merge pull request #906 from nextstrain/victorlin/fix-doc-link
- [29a92cb] fix: mixed up input-prc and input-virus-properties
- [dbb0ca9] chore: add bug fix to changelog
- [e9df843] Merge pull request #911 from nextstrain/fix-mangled-input-files
  fix: mixed up input-prc and input-virus-properties
- [d0d550a] fix(cli): prevent truncation of components if basename contains dots
  Resolves: #907
  This rolls an in-house version of add_extension() function which always adds an extension to a PathBuf. This is different from PathBuf::with_extension() which may replace or add extension depending on what the path is.
  This solves a problem with basenames containing dots, as described in the issue: PathBuf::with_extension() thought that they are extensions and replaced the last one. But we always want to add, not replace.
- [d5ef6bb] docs: add new cli changes to changelog
- [249adda] docs: add recent web changes to changelog
- [75d0ecd] Merge pull request #913 from nextstrain/fix/cli-basename-dots
- [95f20df] Merge remote-tracking branch 'origin/master' into docs/web-changelog
- [daa5c02] docs: fix md syntax
- [74e7334] Merge branch 'docs/web-changelog'
- [91e77c6] chore: release cli 2.1.0
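The difference described in commit d0d550a above can be illustrated with a minimal sketch. The add_extension() below is a hypothetical re-creation written for this note, not the function from the Nextclade codebase; only PathBuf::with_extension() is the real standard library call.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Always appends `.ext` to the full path and never replaces an existing
/// extension. Hypothetical sketch; the project's own helper may differ.
fn add_extension(path: &Path, ext: &str) -> PathBuf {
    let mut raw: OsString = path.as_os_str().to_owned();
    raw.push(".");
    raw.push(ext);
    PathBuf::from(raw)
}
```

For a basename such as out/run.2022.06, with_extension("tsv") treats "06" as the extension and yields out/run.2022.tsv, while the sketch above yields out/run.2022.06.tsv.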
2.0.0
Nextclade 2.0.0
Rust
Nextclade core algorithms and command-line interface were reimplemented in Rust (replacing the C++ implementation).
Rust is a modern, high-performance programming language that is pleasant to read and write. Rust programs have runtime performance comparable to C++ while being easier to write. It should provide a serious productivity boost for the dev team.
Also, it is now much simpler to contribute to Nextclade. If you wanted to contribute, or to simply review and understand the codebase, but were scared off by the complexity of C++, then give it another try: the Rust version is much more enjoyable! Check our developer guide for getting started. We are always open to contributions, reviews and ideas!
Alignment algorithm rewritten with adaptive bands
- Feature: Previously, the alignment band width was constant throughout a given sequence. Now, band width is adaptive: narrow where seed matches indicate no indels, wide where seed matches indicate indels.
- Performance is improved for sequences with indels
- Fix: Terminal alignment errors, particularly common in BA.2, are fixed due to wider default band width between terminal seed matches and sequence ends
- Fix: More robust seed matching allows some previously unalignable sequences to be aligned
- Fix: Terminal indels for amino acid alignments are only free if the nucleotide alignment indicates a gap. Otherwise, they are penalized like internal gaps. This leads to more parsimonious alignment results.
- Feature: Additional alignment parameters can now be tuned:
  - "Excess band width" parameter controls the extra band width that is necessary for correct alignment if both deletions and insertions occur between two seed matches.
  - "Terminal band width" controls the extra band width that is necessary for correct alignment if terminal indels occur.
- Feature: "Min match rate" parameter is added, which sets the required rate of seed matches in a sequence (number of matched seeds divided by total number of attempted seeds). If the measured rate is below the required rate, alignment will not be attempted, because such sequences are likely to need infeasible amounts of memory and computation. The default value is 0.3.
- Fix: 3' terminal insertions are now properly detected
- Feature: "Retry reverse complement" alignment parameter is added. When enabled, an additional attempt of seed matching is made after the initial attempt fails. The second attempt is performed on the reverse-complemented sequence.
  As a consequence:
  - the output alignment, peptides and analysis results correspond to this modified sequence and not to the original
  - sequence name gets a suffix appended to it for all output files (fasta, seqName column, node name on the tree etc.)
  - in output files, there is a new field/column: isReverseComplement, which contains true if the corresponding sequence underwent reverse-complement transformation
  This functionality is opt-in and the default behavior is unchanged: skip sequence and emit a warning.
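As a rough illustration of the "Min match rate" check described above, the sketch below computes the seed match rate and compares it with the threshold. The function and parameter names are hypothetical and not taken from the Nextclade codebase; only the formula (matched seeds divided by attempted seeds) and the default of 0.3 come from these notes. Treating zero attempted seeds as a failed check is an assumption.

```rust
/// Default threshold stated in the release notes.
const DEFAULT_MIN_MATCH_RATE: f64 = 0.3;

/// Returns true when enough seeds matched for alignment to be attempted.
/// Hypothetical helper, for illustration only.
fn should_attempt_alignment(matched_seeds: usize, attempted_seeds: usize, min_match_rate: f64) -> bool {
    if attempted_seeds == 0 {
        // Assumption: with no attempted seeds there is nothing to measure, so skip.
        return false;
    }
    let rate = matched_seeds as f64 / attempted_seeds as f64;
    rate >= min_match_rate
}
```

With the default threshold, a sequence with 2 matched seeds out of 10 would be skipped, while one with 5 out of 10 would proceed to alignment.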
Genes on reverse (negative) strand
Nextclade now correctly handles genes on reverse (negative) strand, which is particularly important for Monkeypox virus.
Nextclade Web
- Feature: Nextclade Web is now substantially faster, both to start up and when analysing sequences, due to general algorithmic improvements.
- Feature: Drag&drop box for fasta files now supports multiple files. The files are concatenated in this case.
- Feature: Sequence view and peptide views now show insertions. They are denoted as purple triangles.
- Fix: Tree view no longer shows duplicate clade annotations
Input files
- Fix: gene map GFF3 file now correctly accepts "gene" and "locus_tag" attributes. This should allow genome annotations from GenBank to be used with little or no modification.
- Feature: Nextclade now reads virus-specific alignment parameters from the virus_properties.json file from the dataset. It is equivalent to passing alignment tweaks using command-line flags, but is more convenient. If a parameter is provided in both virus_properties.json and as a flag, then the flag takes precedence.
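The precedence rule above (a flag overrides virus_properties.json) can be sketched as a small resolution function. The function name and the fallback to a built-in default are assumptions made for illustration; the notes only state that the flag wins when both are present.

```rust
/// Resolves one alignment parameter.
/// Order: command-line flag, then value from virus_properties.json,
/// then a built-in default (the last step is an assumption of this sketch).
fn resolve_param<T>(from_flag: Option<T>, from_virus_properties: Option<T>, default: T) -> T {
    from_flag.or(from_virus_properties).unwrap_or(default)
}
```

For example, with a flag value of 0.5 and a dataset value of 0.2, the resolved value is 0.5; without the flag it is 0.2.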
Nextclade CLI
- Feature: BREAKING CHANGE Command-line interface was redesigned to make it more consistent and ergonomic. The following invocation should be sufficient for most users:
  nextclade run --input-dataset=dataset/ --output-all=out/ sequences.fasta
  short version:
  nextclade run -D dataset/ -O out/ sequences.fasta
  - Nextalign CLI and Nextclade CLI now require a command as the first argument. To reproduce the behavior of Nextclade v1, use "nextalign run" instead of "nextalign" and "nextclade run" instead of "nextclade". See nextalign --help or nextclade --help for the full list of commands. Each command has its own --help menu, e.g. nextclade run --help.
  - The --input-fasta flag is removed in favor of providing input sequence file names as positional arguments. Multiple input fasta files can be provided. Different compression formats are allowed:
    nextclade run -D dataset/ -O out/ 1.fasta 2.fasta.gz 3.fasta.xz 4.fasta.bz2 5.fasta.zst
  - If no fasta files are provided, sequences will be read from standard input (stdin). Reading from stdin does not support compression.
  - If a special filename (-) is provided for one of the individual output file flags (--output-*), the corresponding output will be printed to standard output (stdout). This allows integration into Unix-style pipelines. For example:
    curl $fasta_gz_url | gzip -cd | nextclade run -D dataset/ --output-tsv=- | my_nextclade_tsv_processor
    xzcat *.fasta.xz | nextalign run -r ref.fasta -m genemap.gff -o - | process_aligned_fasta
  - The flag --output-all (-O) replaces the --output-dir flag and allows all files to be output conveniently with a single flag.
  - The new flag --output-selection allows restricting what is output by the --output-all flag.
  - If the --output-basename flag is not provided, the base name of output files will default to "nextclade" or "nextalign" respectively for Nextclade CLI and Nextalign CLI. They will no longer attempt to guess base file name from the input fasta.
  - The new flag --output-translations is a dedicated flag to provide a file path template which will be used to output translated gene fasta files. This flag accepts a template string with a template variable {gene}, which will be substituted with a gene name. Each gene therefore receives its own path. Additionally, the translations are now independent from output directory and can be omitted if they are not necessary.
    Example:
    If the following is provided:
    --output-translations='output_dir/gene_{gene}.translation.fasta'
    then for SARS-CoV-2 Nextclade will write the following files:
    output_dir/gene_ORF1a.translation.fasta
    output_dir/gene_ORF1b.translation.fasta
    ...
    output_dir/gene_S.translation.fasta
    Make sure you properly quote and/or escape the curly braces in the variable {gene}, so that your shell, programming language or pipeline manager does not attempt to substitute the variable.
- Feature: New --excess-bandwidth, --terminal-bandwidth, --min-match-rate, --retry-reverse-complement arguments are added (see "Alignment algorithm rewritten with adaptive bands" section for details)
- Feature: Nextclade CLI and Nextalign CLI now accept compressed input files. If a compressed fasta file is provided, it will be transparently decompressed. Supported compression formats: gz, bz2, xz, zstd. Decompressor is chosen based on file extension.
- Feature: Nextclade CLI and Nextalign CLI can now write compressed output files. If the output path contains one of the supported file extensions, the output will be transparently compressed. Supported compression formats: gz, bz2, xz, zstd.
- Feature: Nextclade can now write outputs in newline-delimited JSON format. Use the --output-ndjson flag for that. NDJSON output is equivalent to JSON output, but is not hierarchical, so it can be easily streamed and parsed one entry at a time.
- Feature: Nextclade "dataset get" and "dataset list" commands can now fetch the dataset index from a custom server. The root URL of the dataset server can be set using the --server=<URL> flag.
- Feature: Nextclade "dataset get" command can output the downloaded dataset in the form of a zip archive, using the --output-zip flag. The dataset zip is simply the dataset directory, but compressed, and it can be used as a replacement in the --input-dataset flag of the "run" command.
- Feature: Nextalign CLI and Nextclade CLI provide a command for generating shell completions: see nextclade completions --help for details.
- Feature: Verbosity can be tuned using either the --verbosity=<severity> flag or one or multiple occurrences of the -v and -q flags. By default Nextclade and Nextalign show messages with severity "warn" or above (i.e. only warnings and errors). Flag -v increases and flag -q decreases verbosity by one step, -vv and -qq by two steps, etc.
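The stepping behavior of -v and -q described in the last item can be sketched as index arithmetic over an ordered list of severities. The list of level names and the clamping at both ends are assumptions of this sketch (the ordering mirrors the one commonly used by the Rust log crate); the notes only confirm that "warn" is the default and that each flag moves verbosity by one step.

```rust
/// Assumed ordering of severities, from least to most verbose.
const LEVELS: [&str; 6] = ["off", "error", "warn", "info", "debug", "trace"];

/// Index of the default level, "warn".
const DEFAULT_INDEX: i32 = 2;

/// Computes the effective level from the number of -v and -q occurrences.
/// Hypothetical helper, for illustration only.
fn effective_level(num_v: u8, num_q: u8) -> &'static str {
    let index = DEFAULT_INDEX + i32::from(num_v) - i32::from(num_q);
    // Assumption: extra flags beyond the ends of the scale are ignored.
    let clamped = index.clamp(0, LEVELS.len() as i32 - 1);
    LEVELS[clamped as usize]
}
```

Under these assumptions, -vv would select "debug" and -q would select "error".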
Feedback
If you found a bug or have a suggestion, feel free to:
- submit a new issue on GitHub: nextstrain/nextclade
- fork the Nextclade GitHub repository nextstrain/nextclade and contribute a bugfix or an improvement (see dev guide)
- join Nextstrain discussion forum: [discussion.nextstrain...
2.0.0-beta.9
This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.
Commit history
- [bc9839c] feat: retry with reverse complement when seed matching fails
  Adds flag --retry-reverse-complement which enables an additional attempt of seed matching when the initial attempt fails. The second attempt is performed on the reverse-complemented sequence.
  As a consequence, the output alignment, peptides and analysis results correspond to this modified sequence and not to the original.
  This functionality is opt-in and the default behavior is to skip the sequence with a warning.
- [879f780] feat: append suffix to sequence if reverse complemented
- [fa8f275] feat(cli): issue a warning when a sequence was reverse-complemented
- [fb271be] Merge remote-tracking branch 'origin/master' into feat/reverse-if-seed-fails
- [2605cca] feat: add warning to errors.csv when sequence gets reverse-complemented
- [18bdc94] feat: add "isReverseComplement" columt to csv and tsv outputs
- [7316d01] feat(cli): default basename to a consistent hardcoded value
  Currently, if --output-basename is not provided, the basename for files written to --output-all is the same as for the input fasta. However, if multiple fasta files are provided, it switches to a hardcoded "nextclade" or "nextalign".
  This is not something that other CLI tools typically do and might be confusing, especially for use-cases where a certain filename is expected (e.g. in scripts and pipelines), and particularly when the number of input fasta files is not known in advance or changes between two runs.
  This PR proposes to always use a hardcoded name for consistency, so that there is no surprise.
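The opt-in retry introduced in commit bc9839c above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, seed matching is stood in for by a caller-supplied closure, and the suffix appended to sequence names is left out because its exact form is not given here.

```rust
/// Complements one nucleotide, including IUPAC ambiguity codes.
/// Input is uppercased; S, W, N and gaps are their own complement.
fn complement(nuc: char) -> char {
    match nuc.to_ascii_uppercase() {
        'A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C',
        'R' => 'Y', 'Y' => 'R', 'K' => 'M', 'M' => 'K',
        'B' => 'V', 'V' => 'B', 'D' => 'H', 'H' => 'D',
        other => other,
    }
}

/// Reverses the sequence and complements every character.
fn reverse_complement(seq: &str) -> String {
    seq.chars().rev().map(complement).collect()
}

/// Tries seed matching on the original sequence, then (only if enabled)
/// on its reverse complement. Returns the sequence that matched together
/// with an isReverseComplement-style flag, or None if the sequence
/// should be skipped with a warning.
fn match_with_retry(
    seq: &str,
    retry_reverse_complement: bool,
    seeds_match: impl Fn(&str) -> bool,
) -> Option<(String, bool)> {
    if seeds_match(seq) {
        return Some((seq.to_owned(), false));
    }
    if retry_reverse_complement {
        let rc = reverse_complement(seq);
        if seeds_match(&rc) {
            return Some((rc, true));
        }
    }
    None
}
```

Returning the transformed sequence makes explicit that all downstream outputs refer to the reverse-complemented sequence rather than the original input.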
2.0.0-beta.8
This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.
Commit history
- [ed65bf1] feat: add sample data for hAdv-A
- [6cf49c0] Merge remote-tracking branch 'origin/master' into feat/hadv-a
- [fe81e75] Merge remote-tracking branch 'origin/master' into feat/hadv-a
- [42bfe20] Merge remote-tracking branch 'origin/master' into feat/hadv-a
- [6435118] chore: release web v2.2.0
- [a9c6c30] feat: sort mutations, deletions and insertions
  This adds sorting of mutations, deletions and insertions right after they are extracted. This should ensure that they are sorted in the output files, which improves readability.
  feat: sort mutations, deletions and insertions
  This enables optimizations even in dev and test mode for some of the third-party packages that are known to be slow. This should hopefully make dev experience a bit better.
- [3806b82] Merge pull request #889 from nextstrain/chore/speedup-dev-and-test
- [34c788d] docs: cleanup changelog
- [e356f13] docs: add min match rate to changelog
- [f45ac24] fix(cli): typo
- [7ec6b81] feat(cli): make output compression faster
  This:
  - reduces default output file compression levels for all formats to 2, which roughly corresponds to "fast" or "low" preset. This should ensure that outputs are not limited by compression speed in most cases.
  - allows setting compression levels per format with environment variables:
    GZ_COMPRESSION
    BZ2_COMPRESSION
    XZ_COMPRESSION
    ZST_COMPRESSION
- [12c36ca] Merge pull request #892 from nextstrain/feat/faster-compression
- [e0c7f17] chore: release cli 2.0.0-beta.8
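The per-format overrides from commit 7ec6b81 above could be read along the following lines. The function names and the fallback behavior for unset or non-numeric values are assumptions of this sketch; only the variable names and the default level of 2 come from the commit description. Valid level ranges differ between formats and are not checked here.

```rust
use std::env;

/// Default compression level stated in the commit description.
const DEFAULT_COMPRESSION_LEVEL: u32 = 2;

/// Parses a raw override value. Falls back to the default when the value
/// is missing or is not a non-negative integer (assumption of this sketch).
fn parse_level(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .unwrap_or(DEFAULT_COMPRESSION_LEVEL)
}

/// Reads the level for one format from its environment variable,
/// e.g. "GZ_COMPRESSION" or "ZST_COMPRESSION".
fn compression_level_from_env(var_name: &str) -> u32 {
    parse_level(env::var(var_name).ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback rule easy to check without touching the process environment.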
2.0.0-beta.7
This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.
Commit history
- [a2f174d] feat(cli): allow to replace unknown nucs with Ns
  This adds a flag --replace-unknown to the run command, which allows replacing unknown nucleotide characters with 'N'.
  By default, sequences containing unknown nucleotide characters are skipped with a warning: they are not aligned and not included in the results. If this flag is provided, then before the alignment, all unknown characters are replaced with 'N'. This replacement allows these sequences to be aligned and analyzed.
  The following characters are considered known:
  -ABCDGHKMNRSTVWY
- [cae1f51] Merge remote-tracking branch 'origin/master' into feat/cli-replace-unknown
- [d8f5f28] feat(cli): organize verbosity flags towards the end of help message
  Currently --verbosity, --silent, -v and -q args are missorted in the --help message text.
  Here I inline the clap-verbosity-flag crate (only 1 file) and modify it, adding our custom flags and display_order annotations, such that the args are shown at the very end of the message, just before the --help arg.
- [d98312a] Merge pull request #883 from nextstrain/feat/cli-organize-verbosity-flags
- [fc4c089] feat(cli): add headings for help sections
  There are many arguments for the run command, so let's organize them in named sections.
  It required splitting args into separate structs and adding next_help_heading annotations.
  I could not figure out how to change the "OPTIONS" heading where the default --help argument stays. So I added a fake indentation for all headings as if they are nested under "OPTIONS".
- [a1ff027] Merge pull request #884 from nextstrain/feat/cli-add-help-headings
- [1aa6b0f] feat: add minimum seed matching rate
- [d437b49] Merge pull request #885 from nextstrain/fix/avoid-large-allocations
  fix: avoid large allocations during alignment
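The replacement rule from commit a2f174d above can be sketched in a few lines. The function names are hypothetical. The set of known characters is copied from the commit description, reading the leading "-" as the gap character, which is an assumption; characters are compared exactly as listed, so any case normalization is left out of this sketch.

```rust
/// Characters considered known, as listed in the commit description.
const KNOWN_NUCS: &str = "-ABCDGHKMNRSTVWY";

/// Returns true if the sequence contains at least one unknown character.
/// Without --replace-unknown, such a sequence would be skipped with a warning.
fn has_unknown(seq: &str) -> bool {
    seq.chars().any(|c| !KNOWN_NUCS.contains(c))
}

/// Replaces every unknown character with 'N', leaving known ones untouched.
fn replace_unknown(seq: &str) -> String {
    seq.chars()
        .map(|c| if KNOWN_NUCS.contains(c) { c } else { 'N' })
        .collect()
}
```

For instance, a sequence containing "U" or "X" is flagged by has_unknown(), and replace_unknown() turns those positions into "N" so that alignment can proceed.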
2.0.0-beta.6
This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.
Commit history
- [aa31635] chore(ci): trigger ci
- [e27dc48] chore: add script for testing on different gnu linux distros [skip ci]
- [3dfb00d] fix(web): rectify incorrect filed name
  This was sometimes causing a crash of the web app when filtering by aminoacids. The field queryAa was not spelled correctly and the lodash intersectionWith() function's typings did not catch the type mismatch.
- [444ceaa] fix(cli): don't crash on unknown nucleotide characters
  Nextclade v2 crashes when it encounters unknown nucleotide letters while converting a fasta string to the internal sequence representation.
  This PR aligns behavior with v1: sequences with unknown nucleotide letters are now ignored, excluded from results, added to errors.csv and the run continues.
- [f4627bb] Merge pull request #875 from nextstrain/fix/web-crash-on-filtering
- [bad2f65] Merge pull request #876 from nextstrain/fix/cli-crash-on-unknown-nuc
- [9462d8d] fix: add missing custom node attrs to tree json
  This adds custom node attrs (e.g. pango lineages) to the tree json, that were previously missing.
  This should ensure that they are shown on the tree viz as it was in Nextclade v1.
- [6bc986d] fix: add missing qc status to tree json
  This adds qc status to the tree json, previously missing.
  This should ensure that the QC status is shown on the tree viz as it was in Nextclade v1.
- [ae0bc80] Merge pull request #878 from nextstrain/fix/tree-missing-custom-attrs
- [0e48641] Merge remote-tracking branch 'origin/master' into fix/tree-missing-qc-status
- [1573e62] Merge pull request #879 from nextstrain/fix/tree-missing-qc-status
  fix: add missing qc status to tree json
- [4b9fa41] refactor: rename file to clarify intent
  It will contain both compression and decompression functions
- [78d0cd3] feat(cli): add output file compression
  Adds compression support for output files: if output filename contains one of the supported extensions, the outputs will be transparently compressed. Example: --output-fasta=aligned.fasta.xz. Supported formats are the same as for input decompression: gz, bz2, xz, zstd. Default compression levels are used.
  To make it compile, I had to additionally:
  - change some of the lifetime parameters in CSV and NDJSON writer, because they were unnecessarily limiting
  - remove the very tricky into_inner() methods in CSV and NDJSON writer, which required the Sync trait on the inner writer, while the zstd writer did not support that. To do so, in a few places, instead of getting the inner writer and getting a string out of it, I managed to just use a vec as the inner writer. So the into_inner() method was no longer needed, same as the Sync trait bound.
- [b8abb99] Merge pull request #880 from nextstrain/feat/cli-output-compression
- [f029502] feat(cli): improve help messages for input fasta arg
- [5d45010] feat(cli): mention compression of outputs in cli help messages
- [ef842bb] Merge pull request #881 from nextstrain/feat/cli-improve-help
- [7ed87ff] docs: mention output file compression in changelog
- [7af69da] fix(cli): remove stdin from description of --output-translations arg
  Translations cannot be written to stdout because there are many files
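Choosing a compressor from the output filename, as described in commit 78d0cd3 above, might look roughly like this. The enum and function names are hypothetical, and this sketch inspects only the final extension of the path, which is an assumption; the ".zst" spelling is taken from the 5.fasta.zst example in the 2.0.0 notes.

```rust
use std::path::Path;

/// Supported formats from the notes, plus the uncompressed case.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Compression {
    Uncompressed,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
}

/// Guesses the compression format from the last file extension.
/// Hypothetical helper, for illustration only.
fn guess_compression(path: &Path) -> Compression {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("gz") => Compression::Gzip,
        Some("bz2") => Compression::Bzip2,
        Some("xz") => Compression::Xz,
        Some("zst") => Compression::Zstd,
        _ => Compression::Uncompressed,
    }
}
```

With this mapping, aligned.fasta.xz would be written through an xz encoder, while aligned.fasta would be written as plain text.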
2.0.0-beta.5
This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.
Commit history
- [f44cfc9] chore(ci): add docker-dev script for consideration for checksum
- [bfd4773] chore(ci): tag pre-release images with version tag too
- [696ea69] chore(ci): reset ci caches
- [5a45b5a] chore(ci): fix debian 8 build by using clang 8
  This is the last version available for debian 8 (jessie) on https://apt.llvm.org/
- [1dc5992] chore(ci): fix arm linux gnu build
  By using a more recent version of Debian base image
- [44cb9bb] chore(ci): improve compatibility of linux gnu binaries further
  Let's try to build on debian 7 (wheezy)
- [5ba8e7e] chore: release cli 2.0.0-beta.5
2.0.0-beta.4
This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.
2.0.0-beta.3
This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.
Commit history
- [0dcb8dd] chore(ci): ensure better compatibility of linux gnu binaries
  Let's build aarch64-unknown-linux-gnu and x86_64-unknown-linux-gnu binaries on debian 9, so that it links older shared libs. This should allow running on a wider spectrum of Linux distros with various versions of glibc and libgcc.
- [110f484] chore(ci): add aarch64-unknown-linux-musl binaries
  Let's add ARM linux musl binaries, and use musl gcc from the official musl website for both ARM and x86_64 builds for consistency
2.0.0-beta.2
This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.
See the changelog: https://github.com/nextstrain/nextclade/blob/master/CHANGELOG.md#nextclade-200
Commit history
- [1ec1c2b] feat(web): add warning for unsupported browsers
- [c91d006] Merge pull request #866 from nextstrain/feat/web-unsipported-browser-warning
- [0288991] chore: release web v2.1.0
- [9d056b1] chore(ci): ensure correct full domain var is set for web app builds
- [c2d1459] fix(web): ensure init errors are not hidden
  Nextclade Web has been hiding some of the errors that occur during initialization. Notably, if the dataset server is not reachable or the dataset index fetch fails for any reason, then Nextclade would just show a loading spinner indefinitely.
  This PR ensures that the error is properly handled and that an error message is shown in these cases.