PANGO Variant Conversion tools. Includes multi-input tool, a web-scraper tool, and a table of all VoC conversions.

Karine-Moussa/PANGO-Genomic-Conversions

PANGO COVID-19 Mutation Conversion Tool: Genetic to Genomic Coordinates

Mackenzie Sennett (1), Kyle McGovern (1), Ritwik Singhal (2), Karine Moussa (1, 2)

  1. Script Development
  2. Software Note

Summary

We developed a Python notebook that returns the genomic coordinates of COVID-19 variant mutations obtained from PANGO, which are originally formatted using gene (amino acid) coordinates. We also provide a second Python notebook that returns the genomic coordinates of all variant mutations across all PANGO lineages.

Background

The PANGO (Phylogenetic Assignment of Named Global Outbreaks) lineage repository retrieves sequence information from GISAID to identify COVID-19 lineages currently in circulation around the globe.

PANGO determines COVID-19 lineages by monitoring groups of mutations involved in epidemiological events, most commonly the transmission of the virus to a new geographical area.

The current PANGO mutation format specifies the position of an amino acid mutation within a gene of a COVID-19 variant, using amino acid coordinates (i.e., 1 coordinate = 3 base pairs).

image



However, much of COVID-19 research involves the use of the genomic position of the amino acid mutation.

image



Notebook

Our manual-input conversion tool can be accessed here: COVID-19 Variant Conversion Utility

This notebook uses the SARS-CoV-2 genome (zip file) and a variant gff file (zip file) to convert the genetic position of a PANGO mutation to its genomic location.

The genomic location is then verified by translating the reference-genome codon at that position and comparing the result with the reference amino acid given in the input mutation.
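The conversion and check described above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: the function names `aa_to_mut_start` and `reference_matches` are hypothetical, the gene start is assumed to be the 1-based start from the GFF file, and the value 21563 for the S gene is taken from the NC_045512.2 annotation and should be checked against the GFF file used by the notebook. The sketch assumes a forward-strand gene translated in a single frame, so it does not cover positions downstream of the ORF1ab ribosomal frameshift.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def aa_to_mut_start(gene_start: int, aa_pos: int) -> int:
    """Return the 0-based genomic index of the first nucleotide of a codon.

    gene_start: 1-based genomic start of the gene (GFF convention).
    aa_pos:     1-based amino acid position within the gene.
    """
    return (gene_start - 1) + 3 * (aa_pos - 1)


def reference_matches(genome: str, mut_start: int, ref_aa: str) -> bool:
    """Translate the reference codon at mut_start and compare it with ref_aa."""
    codon = genome[mut_start:mut_start + 3].upper()
    return CODON_TABLE.get(codon) == ref_aa


# S:D1118H, assuming the S gene starts at 1-based position 21563.
S_GENE_START = 21563  # assumption: from the NC_045512.2 annotation
print(aa_to_mut_start(S_GENE_START, 1118))  # 24913

# Toy genome (Met-Asp-stop) to show the verification step without real data.
toy_genome = "ATGGACTAA"
toy_start = aa_to_mut_start(1, 2)  # second codon of a gene starting at base 1
print(reference_matches(toy_genome, toy_start, "D"))  # True
```

If the translated codon does not match the reference amino acid, the most likely causes are an incorrect gene start or a gene that is not translated in a single continuous frame.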

Instructions:

  • In the Colab menu, go to: Runtime > Run all
  • In the Input section, enter the mutation(s) of interest in PANGO format (then hit Enter)
    • [GENE]:[REF_AMINO_ACID][AMINO_ACID_LOC][ALT_AMINO_ACID]
    • Example: S:D1118H
  • Note: do not include the mutation-type prefix, e.g., enter S:D1118H rather than aa:S:D1118H
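The input format above can be checked with a small regular expression. The following is a sketch under stated assumptions rather than the notebook's own parser: `PangoMutation` and `parse_pango_mutation` are hypothetical names, and the pattern only accepts single amino acid substitutions (deletions and other mutation types are out of scope here).

```python
import re
from typing import NamedTuple


class PangoMutation(NamedTuple):
    gene: str
    ref_aa: str
    aa_pos: int
    alt_aa: str


# [GENE]:[REF_AMINO_ACID][AMINO_ACID_LOC][ALT_AMINO_ACID], e.g. S:D1118H
_PATTERN = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+):(?P<ref>[A-Z*])(?P<pos>\d+)(?P<alt>[A-Z*])$"
)


def parse_pango_mutation(text: str) -> PangoMutation:
    """Split a mutation string into gene, reference aa, position, and alternate aa."""
    match = _PATTERN.match(text.strip())
    if match is None:
        # Also rejects inputs that carry a mutation-type prefix such as "aa:".
        raise ValueError(f"Not in [GENE]:[REF][LOC][ALT] format: {text!r}")
    return PangoMutation(
        match["gene"], match["ref"], int(match["pos"]), match["alt"]
    )


print(parse_pango_mutation("S:D1118H"))
# PangoMutation(gene='S', ref_aa='D', aa_pos=1118, alt_aa='H')
```

Because the gene name may not contain a colon, an input such as aa:S:D1118H fails to match and raises a `ValueError` instead of being silently misparsed.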

Example:

Runtime > Run all

image



Input PANGO mutations

image

Output

The mut_start is the 0-based coordinate of the first nucleotide in the codon.

image
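Because mut_start is 0-based, it can be used directly as a Python slice index, but it is one less than the position used in 1-based formats such as GFF. The helper below is a hypothetical illustration of how the codon interval looks in both conventions; `codon_coordinates` is not a function from the notebook.

```python
def codon_coordinates(mut_start: int) -> dict:
    """Express the codon that begins at a 0-based mut_start in two conventions."""
    return {
        # Python-style slice: genome[start:end] yields the three codon bases.
        "zero_based_half_open": (mut_start, mut_start + 3),
        # GFF-style interval: both ends are included and counting starts at 1.
        "one_based_inclusive": (mut_start + 1, mut_start + 3),
    }


print(codon_coordinates(24913))
# {'zero_based_half_open': (24913, 24916), 'one_based_inclusive': (24914, 24916)}
```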

Notebook: All Pango Lineage Conversions

We have developed an additional notebook which returns the genomic coordinates of all mutations across each COVID-19 lineage (B.1.1.7, B.1.351, P.1, A.23.1, B.1.525). This utility parses through each COVID-19 PANGO lineage web page to provide an up-to-date list of COVID-19 variants in circulation.

The PANGO genomic coordinates notebook can be accessed here: PANGO Lineages: All Genetic to Genomic conversions

Instructions:

  • In the Colab menu: Runtime > Run all

Preview of output

image



To download the output table, go to the Colab Files tab > right-click on snpaa.csv > select Download:

image
