
nextalign-0.2.0

@nextstrain-bot nextstrain-bot released this 23 Mar 05:05

Features:

This version changes how the reference sequence and reference peptides are written to the output files:

  • A new boolean flag, --include-reference (false by default), controls whether the reference sequence is included in the nucleotide alignment and peptide alignment output files.

  • If the --include-reference flag is set, the output aligned nucleotide sequence fasta file contains the gap-stripped reference sequence as its first entry. This is for symmetry with the peptide files (*.gene.*.fasta).

  • (BREAKING CHANGE) Reference peptides are now written to the peptide files only if the --include-reference flag is set.

  • (BREAKING CHANGE) When written, reference peptides in the output peptide fasta files now carry the reference sequence name from the input fasta file instead of the string "Reference".

  • (BREAKING CHANGE) The example SARS-CoV-2 data was moved to data/sars-cov-2. The example reference sequence file is now called reference.fasta instead of .txt and contains a sequence name (currently "MN908947").

  • Added data for other viruses
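To make the new output rules concrete, here is a minimal Python sketch of the record ordering described above. It is an illustration only, not nextalign's actual implementation (which is not shown in these notes); the function names build_fasta_records and to_fasta and their parameters are hypothetical.

```python
def build_fasta_records(ref_name, ref_aligned, queries, include_reference=False):
    """Return (name, sequence) records in the order they would be written.

    ref_name          -- reference name taken from the input fasta (e.g. "MN908947")
    ref_aligned       -- reference sequence, possibly containing "-" gap symbols
    queries           -- list of (name, aligned_sequence) tuples
    include_reference -- hypothetical stand-in for the --include-reference flag
    """
    records = []
    if include_reference:
        # The reference goes first, gap-stripped, under its own name
        # rather than the fixed string "Reference".
        records.append((ref_name, ref_aligned.replace("-", "")))
    records.extend(queries)
    return records


def to_fasta(records):
    """Serialize (name, sequence) records as fasta text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

With include_reference left at its default of False, the reference is omitted entirely, which is the breaking change for anyone who relied on reference peptides always being present.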

Bug fixes:

  • The gene map parser now adheres to the GFF3 specification. Previously we assumed that keys and values in the attributes column were separated by spaces, but the specification requires them to be separated with =. Backwards compatibility with space-separated gene maps is preserved.

  • Gaps are now removed from the output reference peptides. Query results are returned as alignments with insertions stripped, so gap symbols in the reference must be stripped too.

  • Removed an overly strict check in gene extraction. The check that required gene length to be a multiple of 3 before gap removal was incorrect: it was too restrictive and rejected valid gene sequences.

  • Fixed an integer overflow bug in a variable passed to the Intel TBB library.
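The gene map fix can be illustrated with a small Python sketch of parsing the GFF3 attributes column. This is an assumption-laden illustration, not nextalign's parser: the function name parse_attributes is hypothetical, and the fallback branch is only a guess at how space-separated legacy gene maps might be accepted.

```python
from urllib.parse import unquote


def parse_attributes(column):
    """Parse a GFF3 attributes column into a dict (hypothetical helper).

    GFF3 separates attributes with ";" and keys from values with "=".
    Fields without "=" are treated as legacy space-separated pairs.
    """
    attrs = {}
    for field in column.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # tolerate trailing or doubled semicolons
        if "=" in field:
            key, value = field.split("=", 1)
        else:
            # Legacy form assumed here: key, a space, then the value.
            key, _, value = field.partition(" ")
        # Drop optional surrounding quotes, then decode the URL-style
        # escapes that GFF3 uses for reserved characters.
        attrs[key.strip()] = unquote(value.strip().strip('"'))
    return attrs
```

For example, both parse_attributes("gene_name=S") and parse_attributes('gene_name "S"') yield a dict mapping gene_name to S under these assumptions.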