Steps for the analyses that we will do:
0) Download metadata and MSA from GISAID
a) First [register for an account](https://platform.gisaid.org/epi3/cfrontend#335368). This may take several days.
b) Once you have an account, sign in with your username and password.
c) From the EpiCov tab, click on Downloads and select the down arrow for the multiple sequence alignment (e.g. MSA_0728) and for the metadata (nextmeta)
d) Extract each downloaded tar.xz file with tar -xf file.tar.xz
-
Change names of GISAID fasta files to EPI IDs and format metadata from GISAID
-
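The renaming step could be sketched as below. This is a minimal sketch, not the script used for this analysis. It assumes that each GISAID FASTA header contains an accession of the form EPI_ISL_ followed by digits (for example `>hCoV-19/Wuhan/WIV04/2019|EPI_ISL_402124|2019-12-30|China`). The function names `rename_headers_to_epi` and `rename_fasta` are hypothetical. The file is streamed line by line because the GISAID alignment is large.

```python
import re

# Assumption: each GISAID header contains an accession such as EPI_ISL_402124.
EPI_PATTERN = re.compile(r"EPI_ISL_\d+")


def rename_headers_to_epi(lines):
    """Yield FASTA lines with each header replaced by its EPI_ISL accession.

    Headers without a recognizable accession are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            match = EPI_PATTERN.search(line)
            if match:
                yield ">" + match.group(0) + "\n"
                continue
        yield line


def rename_fasta(in_path, out_path):
    """Stream in_path to out_path, rewriting headers along the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rename_headers_to_epi(src))
```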
Retrieve and format metadata from NCBI, combine files from NCBI and GISAID
-
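One way to combine the two metadata tables is to rename their columns to a shared schema and record where each row came from. The sketch below illustrates that idea only. The column names in `GISAID_COLUMNS` and `NCBI_COLUMNS` are assumptions and should be checked against the headers of the files you actually downloaded; the shared names (`id`, `date`, `country`, `source`) and all function names are hypothetical.

```python
import csv

# Hypothetical column mappings: source column name -> shared column name.
# Check these against the real file headers before use.
GISAID_COLUMNS = {"gisaid_epi_isl": "id", "date": "date", "country": "country"}
NCBI_COLUMNS = {"Accession": "id", "Collection_Date": "date", "Geo_Location": "country"}


def harmonize(rows, column_map, source):
    """Rename columns to the shared schema and tag each row with its source."""
    for row in rows:
        out = {new: row.get(old, "") for old, new in column_map.items()}
        out["source"] = source
        yield out


def combine_metadata(gisaid_rows, ncbi_rows):
    """Return one list of harmonized rows, GISAID first, then NCBI."""
    combined = list(harmonize(gisaid_rows, GISAID_COLUMNS, "GISAID"))
    combined.extend(harmonize(ncbi_rows, NCBI_COLUMNS, "NCBI"))
    return combined


def read_table(path, delimiter="\t"):
    """Read a delimited metadata file into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```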
Align GISAID and NCBI sequences using MAFFT
-
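The alignment step might be driven from Python as sketched below. This assumes that `mafft` is installed and on the PATH and that the GISAID and NCBI sequences have already been written to a single FASTA file. The `--auto` option lets MAFFT pick its own strategy, which may not be the setting used for this analysis. The function names are hypothetical.

```python
import subprocess


def mafft_command(input_fasta, threads=1, mafft_bin="mafft"):
    """Build a MAFFT command line; --auto lets MAFFT choose a strategy."""
    return [mafft_bin, "--auto", "--thread", str(threads), input_fasta]


def run_mafft(input_fasta, output_fasta, threads=1):
    """Run MAFFT, which writes the alignment to stdout, and save it to a file."""
    with open(output_fasta, "w") as out:
        subprocess.run(mafft_command(input_fasta, threads), stdout=out, check=True)
```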
Separate both alignments by locus.
-
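Separating an alignment by locus can be sketched as follows. Because alignment columns include gaps, gene coordinates defined on the ungapped reference genome first have to be mapped to alignment columns. The sketch assumes the alignment is held in memory as a dict of name to aligned sequence, that one entry is the reference, and that the caller supplies 1-based inclusive gene coordinates taken from the reference annotation. The function names are hypothetical.

```python
def reference_to_columns(aligned_reference):
    """Map 1-based ungapped reference positions to 0-based alignment columns."""
    mapping = {}
    position = 0
    for column, base in enumerate(aligned_reference):
        if base != "-":
            position += 1
            mapping[position] = column
    return mapping


def extract_locus(alignment, reference_id, start, end):
    """Slice every sequence to the columns spanning reference positions
    start..end (1-based, inclusive)."""
    mapping = reference_to_columns(alignment[reference_id])
    first, last = mapping[start], mapping[end]
    return {name: seq[first:last + 1] for name, seq in alignment.items()}
```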
Make trees using RAxML for
a) GISAID + NCBI (whole genome)
b) GISAID + NCBI (each gene)
-
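The tree-building step could be scripted along these lines. This is a sketch under several assumptions: the RAxML binary is called `raxmlHPC` (the name varies between builds), the GTRGAMMA model is acceptable, and a single maximum-likelihood search per alignment is enough. None of these choices is confirmed by this outline. The function names are hypothetical.

```python
import subprocess


def raxml_command(alignment, run_name, model="GTRGAMMA", seed=12345, raxml_bin="raxmlHPC"):
    """Build a RAxML command for a single maximum-likelihood tree search.

    -s: alignment file, -n: run name used in output file names,
    -m: substitution model, -p: random seed for the starting parsimony tree.
    """
    return [raxml_bin, "-s", alignment, "-n", run_name, "-m", model, "-p", str(seed)]


def build_trees(alignments):
    """Run RAxML once per entry in a dict of run_name -> alignment path."""
    for run_name, path in alignments.items():
        subprocess.run(raxml_command(path, run_name), check=True)
```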
Visualize trees and metadata with iTOL
-
Make distance matrices for a) GISAID genome alignment, b) GISAID individual gene alignments, c) GISAID+NCBI genome alignment, d) GISAID+NCBI individual gene alignments
-
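A distance matrix can be computed from an alignment as sketched below. The outline does not say which distance measure is used, so this sketch assumes a simple p-distance: the fraction of differing sites among columns where neither sequence has a gap or an N. It is pure Python and quadratic in the number of sequences, so it illustrates the calculation rather than being practical for the full GISAID alignment. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, ignore="-N"):
    """Fraction of differing sites among columns where neither sequence
    has a gap or an N. Returns NaN if no column can be compared."""
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in ignore or b in ignore:
            continue
        compared += 1
        differing += a != b
    return differing / compared if compared else float("nan")


def distance_matrix(alignment):
    """Return (names, matrix) with symmetric pairwise p-distances."""
    names = list(alignment)
    size = len(names)
    matrix = [[0.0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        d = p_distance(alignment[names[i]], alignment[names[j]])
        matrix[i][j] = matrix[j][i] = d
    return names, matrix
```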
Run m2clust on everything.