Mackenzie Sennett1, Kyle McGovern1, Ritwik Singhal2, Karine Moussa1,2
- Script Development
- Software Note
We developed a Python notebook that returns the genomic coordinates of COVID-19 variants obtained from PANGO, which reports them in gene (amino acid) coordinates. We also provide a Python notebook that returns the genomic coordinates of all variants across all PANGO lineages.
The PANGO (Phylogenetic Assignment of Named Global Outbreaks) lineage repository retrieves sequence information from GISAID to identify COVID-19 lineages currently in circulation around the globe.
PANGO assigns COVID-19 lineages by tracking groups of mutations associated with epidemiological events, most commonly the transmission of the virus into a new geographical area.
The current PANGO mutation format specifies the position of an amino acid mutation within its gene, using amino acid coordinates (i.e., one coordinate corresponds to one codon, or 3 base pairs).
However, much COVID-19 research relies on the genomic (nucleotide) position of the mutation instead.
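The conversion itself is simple arithmetic once the gene's start position is known. As a worked sketch, assuming a forward-strand gene with no ribosomal frameshift inside it, and writing $G$ for the 1-based genomic position of the first nucleotide of the gene's start codon and $p$ for the 1-based amino acid position, the codon occupies:

$$
\text{codon}(p) = \bigl[\, G + 3(p-1),\; G + 3(p-1) + 2 \,\bigr]
$$

For example, taking $G = 21563$ for the S gene in the Wuhan-Hu-1 reference (NC_045512.2; this value is an assumption to check against the annotation you use), S:D1118H gives $21563 + 3 \cdot 1117 = 24914$, so the codon spans positions 24914 to 24916. The formula does not apply across the frameshift in ORF1ab.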
Our manual-input conversion tool can be accessed here: COVID-19 Variant Conversion Utility
This notebook uses the SARS-CoV-2 reference genome (zip file) and a variant GFF file (zip file) to convert the gene-level position of a PANGO mutation to its genomic location.
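The notebook's GFF handling is not shown in this note. A minimal standard-library sketch of reading gene start positions from a GFF3 file might look like this; it assumes each gene row carries a `Name=` or `gene=` attribute, which may not match the variant GFF the notebook actually uses, and `gene_starts` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple


def gene_starts(gff_lines: Iterable[str], feature_type: str = "gene") -> Dict[str, Tuple[int, int, str]]:
    """Map gene name -> (start, end, strand) using GFF's 1-based inclusive coordinates.

    Assumption: the gene name is stored in a Name= or gene= attribute.
    Later rows with the same name overwrite earlier ones.
    """
    genes: Dict[str, Tuple[int, int, str]] = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and ## directives.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # GFF3 data rows have exactly 9 tab-separated columns; column 3 is the feature type.
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(field.split("=", 1) for field in cols[8].split(";") if "=" in field)
        name = attrs.get("Name") or attrs.get("gene")
        if name:
            genes[name] = (int(cols[3]), int(cols[4]), cols[6])
    return genes
```

Filtering on the `gene` feature type keeps one row per gene; CDS rows can be selected instead by passing `feature_type="CDS"`.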
The notebook then verifies each genomic location by translating the codon at that position in the reference genome and checking that it matches the reference amino acid in the input mutation.
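The reference check can be sketched as follows. This version assumes the genome is a forward-strand nucleotide string indexed from 0, and it embeds the standard genetic code as a lookup string so it runs without Biopython; the notebook may use a different translation routine, and `translate_codon` and `reference_matches` are hypothetical names.

```python
BASES = "TCAG"
# Standard genetic code for codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """Translate one DNA/RNA codon to its one-letter amino acid."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(base not in BASES for base in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    i, j, k = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def reference_matches(genome: str, mut_start: int, ref_aa: str) -> bool:
    """Translate the codon starting at 0-based mut_start and compare it to ref_aa."""
    codon = genome[mut_start:mut_start + 3]
    return translate_codon(codon) == ref_aa.upper()
```

A mismatch usually means the gene start, the amino acid position, or the reference amino acid in the input is wrong.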
- In the Colab menu, go to: Runtime > Run all
- In the Input section, enter the mutation(s) of interest in PANGO format, then press Enter: [GENE]:[REF_AMINO_ACID][AMINO_ACID_LOC][ALT_AMINO_ACID]
- Example: S:D1118H
- Note: do not include the mutation type prefix, i.e., enter S:D1118H, not aa:S:D1118H
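Input validation for the PANGO format above could look like the following sketch. It accepts single amino acid substitutions only (deletions and insertions are out of scope), allows `*` for a stop codon, and rejects inputs that still carry the `aa:` prefix. `parse_pango` and `AaMutation` are hypothetical names, not the notebook's own.

```python
import re
from typing import NamedTuple

# GENE:REF POS ALT, e.g. S:D1118H; '*' denotes a stop codon.
PANGO_RE = re.compile(r"([A-Za-z0-9]+):([A-Z*])(\d+)([A-Z*])")


class AaMutation(NamedTuple):
    gene: str
    ref: str
    pos: int  # 1-based amino acid position within the gene
    alt: str


def parse_pango(text: str) -> AaMutation:
    """Split a PANGO-style substitution into its parts, or raise ValueError."""
    match = PANGO_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected GENE:REFposALT (e.g. S:D1118H), got {text!r}")
    gene, ref, pos, alt = match.groups()
    return AaMutation(gene, ref, int(pos), alt)
```

Using `fullmatch` rather than `search` is what makes a leftover `aa:` prefix fail loudly instead of being silently skipped.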
The mut_start value is the 0-based genomic coordinate of the first nucleotide of the codon.
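A sketch of the 0-based arithmetic behind mut_start, assuming a forward-strand gene with no ribosomal frameshift inside it and a 1-based gene start as given in a GFF file. The helper name `codon_mut_start` and the S gene start used below (21563 in NC_045512.2) are assumptions to check against your own annotation.

```python
def codon_mut_start(gene_start_1based: int, aa_pos: int) -> int:
    """0-based genomic coordinate of the first nucleotide of codon aa_pos."""
    if aa_pos < 1:
        raise ValueError("amino acid positions are 1-based")
    # Shift the GFF start to 0-based, then advance 3 nt for each preceding codon.
    return (gene_start_1based - 1) + 3 * (aa_pos - 1)


start = codon_mut_start(21563, 614)  # S:D614G (assumed S gene start)
# The codon is genome[start:start + 3]; add 1 to report a 1-based position.
print(start, start + 1)  # 23401 23402
```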
We have also developed a notebook that returns the genomic coordinates of all mutations in each COVID-19 lineage (B.1.1.7, B.1.351, P.1, A.23.1, B.1.525). It parses each PANGO lineage web page to provide an up-to-date list of COVID-19 variants in circulation.
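The notebook's scraper is not reproduced here, and this note does not describe the layout of the PANGO lineage pages. As a hedged sketch only, assuming the page text lists amino acid mutations with an `aa:` prefix (as in the input note above), the mutations could be pulled out of downloaded HTML with the standard-library `html.parser` and grouped by lineage. `mutations_from_html` and `merge_lineages` are hypothetical names, and fetching the pages is left out.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List

# Assumption: mutations appear in page text as aa:GENE:REFposALT, e.g. aa:S:D1118H.
AA_TOKEN = re.compile(r"aa:([A-Za-z0-9]+:[A-Z*]\d+[A-Z*])")


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def mutations_from_html(html: str) -> List[str]:
    """Return unique mutations (prefix stripped) in first-seen order."""
    parser = _TextCollector()
    parser.feed(html)
    text = " ".join(parser.chunks)
    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(AA_TOKEN.findall(text)))


def merge_lineages(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each mutation to the lineages whose page lists it."""
    seen: Dict[str, List[str]] = {}
    for lineage, html in pages.items():
        for mutation in mutations_from_html(html):
            seen.setdefault(mutation, []).append(lineage)
    return seen
```

Grouping by mutation means a substitution shared by several lineages only needs to be converted to genomic coordinates once.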
The PANGO genomic coordinates notebook can be accessed here: PANGO Lineages: All Genetic to Genomic conversions
- In the Colab menu, go to: Runtime > Run all
To download the output table, go to the Colab Files tab > right-click snpaa.csv > select Download.
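Alternatively, the table can be downloaded from a code cell. `google.colab.files.download` belongs to Colab's own `google.colab` package and only works inside a Colab runtime; the wrapper below, with the hypothetical name `download_output`, returns False when that package is unavailable.

```python
from pathlib import Path


def download_output(path: str = "snpaa.csv") -> bool:
    """Ask the browser to download `path`; return False outside Colab."""
    if not Path(path).is_file():
        raise FileNotFoundError(f"{path} not found; run all cells first")
    try:
        from google.colab import files  # only importable in a Colab runtime
    except ImportError:
        return False
    files.download(path)
    return True
```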