Release nextalign-0.2.0 · nextstrain/nextclade

Features:

This version changes how the reference sequence and reference peptides are written into the output files:

A new boolean flag, --include-reference (false by default), controls whether the reference sequence is included in the nucleotide alignment and peptide alignment output files.
If the --include-reference flag is set, the aligned nucleotide FASTA output file now contains the gap-stripped reference sequence as its first entry. This is for symmetry with the peptide files (*.gene.*.fasta).
(BREAKING CHANGE) Reference peptides are now written into the peptide files only if the --include-reference flag is set.
(BREAKING CHANGE) If written, reference peptides in the output peptide FASTA files now carry the reference sequence name from the input FASTA file instead of the string "Reference".
(BREAKING CHANGE) The example SARS-CoV-2 data was moved to data/sars-cov-2. The example reference sequence file is now called reference.fasta instead of .txt and contains the sequence name (currently "MN908947").
Added data for other viruses.
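The output behavior described above can be illustrated with a minimal Python sketch. It is not Nextalign's implementation (which is C++); the function names `read_first_fasta_record`, `strip_gaps` and `build_output_records` are hypothetical, and it assumes "-" is the only gap character. It shows the three rules together: the reference is written only when the flag is set, it comes first, it is gap-stripped, and it keeps the name from the input FASTA header rather than "Reference".

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (name, sequence)


def strip_gaps(seq: str) -> str:
    """Remove alignment gap characters (assumed to be '-') from a sequence."""
    return seq.replace("-", "")


def read_first_fasta_record(text: str) -> Record:
    """Return (name, sequence) of the first record in a FASTA-formatted string."""
    name = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                break  # stop at the start of the second record
            name = line[1:].strip()
        elif name is not None and line:
            seq_lines.append(line)
    if name is None:
        raise ValueError("no FASTA record found")
    return name, "".join(seq_lines)


def build_output_records(
    reference: Record,
    aligned: Iterable[Record],
    include_reference: bool = False,  # mirrors --include-reference, off by default
) -> List[Record]:
    """Assemble output records, optionally prepending the gap-stripped reference."""
    records: List[Record] = []
    if include_reference:
        ref_name, ref_seq = reference
        # The reference keeps its own name from the input FASTA header.
        records.append((ref_name, strip_gaps(ref_seq)))
    records.extend(aligned)
    return records


def to_fasta(records: Iterable[Record]) -> str:
    """Serialize records as FASTA text, one sequence line per record."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

For example, `to_fasta(build_output_records(read_first_fasta_record(">MN908947\nATTAAAGG\n"), [("query_1", "ATTAAAGG")], include_reference=True))` writes the MN908947 record before query_1. The same function applies to a per-gene peptide file, with a reference peptide such as "MF-V" written as "MFV".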

Bug fixes:

Make sure the gene map parser adheres to the GFF3 specification. Previously, we assumed that keys and values in the attributes column were separated by spaces, but the specification requires them to be separated by "=". Backwards compatibility with space-separated gene maps is preserved.
Remove gaps from the output reference peptides. Query results are returned as alignments with insertions stripped, so gap symbols in the reference should be stripped too.
Remove an overly strict check in gene extraction. We removed the incorrect check that the gene length is a multiple of 3 before gaps are removed. It was too restrictive and erroneously rejected valid gene sequences.
Fix an integer overflow bug in a variable passed to the Intel TBB library.
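The GFF3 attribute fix can be sketched in Python as follows. This is an illustrative parser, not Nextalign's code: the name `parse_attributes` and the example tag `gene_name` are assumptions. GFF3 attributes are `tag=value` pairs separated by ";", with reserved characters percent-encoded. The fallback branch splits on the first whitespace to accept the legacy space-separated form. A legacy value that itself contained "=" would be misread, so this is a sketch of the idea rather than a complete compatibility layer.

```python
from typing import Dict
from urllib.parse import unquote


def parse_attributes(column: str) -> Dict[str, str]:
    """Parse a GFF3 attributes column, falling back to legacy 'key value' pairs."""
    attributes: Dict[str, str] = {}
    for pair in column.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries, e.g. a trailing ';'
        if "=" in pair:
            # GFF3 form: tag=value (split on the first '=' only)
            key, value = pair.split("=", 1)
        else:
            # Legacy form (assumed): tag and value separated by whitespace
            parts = pair.split(None, 1)
            key = parts[0]
            value = parts[1] if len(parts) > 1 else ""
        # GFF3 percent-encodes reserved characters such as ';' and '='
        attributes[key.strip()] = unquote(value.strip())
    return attributes
```

Both `parse_attributes("gene_name=S;ID=gene-S")` and `parse_attributes("gene_name S")` yield `"S"` for `gene_name`.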
