The file 03.unique_acgt.aln.xz contains a small database of 9185 aligned sequences from the
UK, sequenced at the QIB.
This file can be used as the reference to test uvaia
against your aligned query sequences.
The sequences were aligned with uvaialign
against the reference genome Wuhan-Hu-1/2019 without trimming.
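To inspect the reference database before running uvaia, the sketch below streams the xz-compressed alignment with Python's standard `lzma` module. It assumes the file is plain multi-FASTA (header lines starting with `>`, sequence possibly wrapped over several lines). The helper names `read_fasta_xz` and `alignment_summary` are hypothetical and not part of uvaia.

```python
import lzma
from typing import Iterator, Set, Tuple


def read_fasta_xz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an xz-compressed FASTA file.

    Assumes standard FASTA: '>' header lines, sequences possibly wrapped.
    """
    name, chunks = None, []
    with lzma.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)  # last record in the file


def alignment_summary(path: str) -> Tuple[int, Set[int]]:
    """Return the number of sequences and the set of distinct sequence lengths."""
    n, lengths = 0, set()
    for _, seq in read_fasta_xz(path):
        n += 1
        lengths.add(len(seq))
    return n, lengths
```

For example, `alignment_summary("03.unique_acgt.aln.xz")` should report 9185 sequences, and since the sequences are aligned, the set of lengths would be expected to hold a single value.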
Their PANGO lineages, as estimated by pangolin v4.0.6, can be found in the file 03.unique_acgt.lineage_report.csv. As the name suggests, these sequences are a random sample of the full database such that every sequence has a unique frequency of ACGT.
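One way to read "a unique frequency of ACGT" is that no two sampled sequences share the same counts of A, C, G and T. The sketch below follows that interpretation, which is an assumption and not necessarily the exact procedure used to build this file: it groups sequences by their (A, C, G, T) count tuple and randomly keeps one name per group. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Profile = Tuple[int, int, int, int]


def acgt_profile(seq: str) -> Profile:
    """Count A, C, G and T (case-insensitive); gaps and N are ignored."""
    counts = Counter(seq.upper())
    return (counts["A"], counts["C"], counts["G"], counts["T"])


def sample_unique_acgt(records: Iterable[Tuple[str, str]],
                       seed: Optional[int] = None) -> List[str]:
    """Randomly keep one sequence name per distinct ACGT count profile."""
    rng = random.Random(seed)
    groups: Dict[Profile, List[str]] = {}
    for name, seq in records:
        groups.setdefault(acgt_profile(seq), []).append(name)
    # One random representative per profile, sorted for stable output.
    return sorted(rng.choice(names) for names in groups.values())
```

Choosing at random within each group, rather than always taking the first name, avoids biasing the subsample toward whatever order the sequences happen to appear in the full database.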
You can also use a COGUK alignment as the reference or, if you have access,
the unaligned sequences from GISAID (we suggest uvaialign
to align the sequences 😉).
The files 04.sample_1_1k.names and 04.sample_3_5k.names contain the names of the sequences sampled for the manuscript; all of them can be found in the reference alignment above.
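A quick sanity check is to confirm that every name in the sample lists appears among the alignment headers. This sketch assumes each `.names` file holds one sequence name per line and that names match the FASTA header text after `>` exactly; the helper functions are hypothetical.

```python
import lzma
from typing import Iterable, List, Set


def alignment_names(path: str) -> Set[str]:
    """Collect sequence names from the headers of an xz-compressed FASTA."""
    with lzma.open(path, "rt") as handle:
        return {line[1:].strip() for line in handle if line.startswith(">")}


def read_names(path: str) -> List[str]:
    """Read one sequence name per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def missing_names(sample: Iterable[str], reference: Set[str]) -> List[str]:
    """Return sample names that are absent from the reference alignment."""
    return [name for name in sample if name not in reference]
```

For instance, `missing_names(read_names("04.sample_1_1k.names"), alignment_names("03.unique_acgt.aln.xz"))` should return an empty list, and the same holds for 04.sample_3_5k.names.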
For the timing analyses, we used the COGUK data set (the "Unmasked alignment", available from the archived version of 7th May 2023).