
SRST2 v0.1.4 - Short Read Sequence Typing for Bacterial Pathogens

@katholt katholt released this 26 Jun 09:55
· 182 commits to master since this release

Note that the pre-print of the paper will shortly be available on bioRxiv.

  1. SAM and unsorted BAM files are no longer stored (they can be retained via the --keep_interim_alignment flag).
  2. Added options to specify the maximum number of mismatches allowed during mapping. This is set separately for MLST and genes, so that the stringency of gene detection can be relaxed in the same run as a high-accuracy MLST test.
    The default value for both is 10 mismatches.
    --mlst_max_mismatch
    --gene_max_mismatch
  3. The highest minor allele frequency (MAF) of variants encountered in the alignment is now calculated and reported for each allele (in the scores file) and also at the gene level and ST level, to facilitate checking for mixed/contaminated read sets.

This value ranges from 0 to 0.5. A value of 0 indicates no variation between reads at any aligned base (i.e. at every position in the alignment, all aligned reads agree on the same base call, although the agreed base may differ from the reference). A value of 0.25 indicates that there is at least one position in the alignment at which not all reads agree, and that the least common variant (either match or mismatch to the reference) is present in 25% of reads. This value is printed, for all alleles, to the scores file. Note that this differs from the ‘LeastConfident’ information printed to the scores file, which presents the strongest evidence for a mismatch compared to the reference and therefore ranges from 0 to 1.

The highest such value for each gene/cluster/locus is reported in the fullgenes output table.

The highest such value across all MLST loci is reported in the mlst output table.
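The calculation described above can be sketched roughly as follows. This is an illustration only, not SRST2's actual code: the function names, the representation of each alignment position as a (matches, mismatches) pair, and the locus names are all assumptions made for the example.

```python
def position_maf(matches, mismatches):
    """Minor allele frequency at one position: the share of reads
    carrying the less common call (match or mismatch to the reference)."""
    depth = matches + mismatches
    if depth == 0:
        return 0.0  # assumption: an uncovered position contributes no variation
    return min(matches, mismatches) / depth


def max_maf(pileup):
    """Highest per-position MAF across one allele's alignment.
    `pileup` is a list of (matches, mismatches) read counts (hypothetical format)."""
    return max((position_maf(m, x) for m, x in pileup), default=0.0)


# Hypothetical read counts for two MLST loci.
loci = {
    "gapA": [(40, 0), (0, 40)],   # all reads agree at every position
    "infB": [(30, 10), (38, 2)],  # 25% of reads disagree at one position
}

per_locus = {name: max_maf(pileup) for name, pileup in loci.items()}
st_level = max(per_locus.values())  # highest value across all loci

print(per_locus)  # {'gapA': 0.0, 'infB': 0.25}
print(st_level)   # 0.25
```

Note that the second position of the hypothetical gapA locus scores 0 even though every read mismatches the reference, because the reads agree with each other.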

Note that all compiled reports will now include a maxMAF column; if you provide MLST or compiled reports from previous versions without this column, the value “NC” will be inserted in the maxMAF column to indicate “not calculated”. This ensures the updated SRST2 (v0.1.4+) is backwards compatible with previous SRST2 outputs. Be aware, though, that older versions of SRST2 (<v0.1.4) will not be forwards-compatible with output generated by more recent versions (v0.1.4 onwards).
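The backfilling behaviour can be pictured with a small sketch like the one below. It is not taken from the SRST2 source: the function name, the list-of-lists table representation, the example column names, and the choice to append the column at the end are all assumptions.

```python
def ensure_max_maf(header, rows, placeholder="NC"):
    """Return (header, rows) with a maxMAF column, filling "NC"
    ("not calculated") when the input report lacks one."""
    if "maxMAF" in header:
        return header, rows  # report already has the column; leave untouched
    # Assumption: the new column is appended as the last column.
    new_header = header + ["maxMAF"]
    new_rows = [row + [placeholder] for row in rows]
    return new_header, new_rows


# Hypothetical report from an older version, without maxMAF.
old_header = ["Sample", "ST", "depth"]
old_rows = [["strainA", "15", "52.3"], ["strainB", "131", "48.9"]]

header, rows = ensure_max_maf(old_header, old_rows)
print(header)   # ['Sample', 'ST', 'depth', 'maxMAF']
print(rows[0])  # ['strainA', '15', '52.3', 'NC']
```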

  4. Added R code for plotting SRST2 output in R (plotSRST2data.R).
    Instructions will be added to the README.
  5. Added SRST2-formatted ARG-Annot resistance gene database and plasmid replicon databases to /data.