Extra Multiple Sequence Alignment
Multiple sequence alignment (MSA) identifies the homologous regions in a set of sequences, and the "edit paths" from one sequence to another.
There are different options for MSA, and the ideal choice depends on the nature of the dataset (sequence length, number of sequences, nucleotide or protein...). A good summary of the available options is given in this review.
During the workshop we will use:
A common output format is "FASTA with gaps":
>seq1
--CAGTCGATCGGTAGCAGCTGACGTAGCAG--GAAGCT
>seq2
GGCAGTCGATC-GTAGCAGCTGACGTAGCAG--GAAGCT
>seq3
--CAGTCGATCGGTAGCAGCTGACGTAGCAG--CTAGC-
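In a gapped FASTA alignment every record should have the same length once gaps are counted. As a minimal sketch (the function name aligned_lengths is made up for illustration, and it assumes a POSIX shell with awk), the aligned length of each record can be listed like this:

```shell
# Print "name<TAB>aligned length" for every record of a gapped FASTA file.
# Gap characters ("-") count towards the length, as they occupy a column.
aligned_lengths() {
  awk '
    /^>/ {
      if (seen) print name "\t" len      # flush the previous record
      name = substr($0, 2); len = 0; seen = 1
      next
    }
    { len += length($0) }                # sequence may span several lines
    END { if (seen) print name "\t" len }
  ' "$1"
}
```

Calling aligned_lengths on an alignment file and piping the result through cut -f2 | sort -u should print a single number if the file is a valid alignment.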
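Another popular format is PHYLIP, originally defined and used in Joe Felsenstein’s PHYLIP package. The first line states that the file holds 5 sequences of 42 residues each. Each following line starts with a sequence name that occupies exactly 10 characters (padded with spaces or truncated), immediately followed by the sequence.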
  5    42
Turkey    AAGCTNGGGC ATTTCAGGGT GAGCCCGGGC AATACAGGGT AT
Salmo gairAAGCCTTGGC AGTGCAGGGT GAGCCGTGGC CGGGCACGGT AT
H. SapiensACCGGTTGGC CGTTCAGGGT ACAGGTTGGC CGTTCAGGGT AA
Chimp     AAACCCTTGC CGTTACGCTT AAACCGAGGC CGGGACACTC AT
Gorilla   AAACCCTTGC CGGTACGCTT AAACCATTGC CGGTACGCTT AA
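The header line makes a PHYLIP file easy to sanity-check. The sketch below compares the header against the file body. It assumes the strict, sequential layout shown here (one line per sequence, name in the first 10 columns), so it would not handle interleaved or relaxed PHYLIP files. The function name check_phylip is hypothetical.

```shell
# Return 0 if the body of a strict, sequential PHYLIP file matches its header.
check_phylip() {
  awk '
    NR == 1 { nseq = $1; nchar = $2; next }   # header: <sequences> <residues>
    NF == 0 { next }                          # ignore blank lines
    {
      seq = substr($0, 11)                    # the name occupies columns 1-10
      gsub(/[ \t]/, "", seq)                  # drop spacing between residue blocks
      n++
      if (length(seq) != nchar) {
        printf "line %d: %d residues, expected %d\n", NR, length(seq), nchar
        bad = 1
      }
    }
    END {
      if (n != nseq) {
        printf "found %d sequences, expected %d\n", n, nseq
        bad = 1
      }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

For example, check_phylip alignment.phy && echo "header matches" (where alignment.phy is a placeholder file name) prints the message only when both counts agree.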
The Newick format is the last simple text format covered in the workshop. It describes the topology and the attributes of a dendrogram.
It uses parentheses to group closer nodes, for example: (A,B,(C,D)E)F;
 /---------------+ A
 |
=+-F-------------+ B
 |
 |       /-------+ C
 \-------+ E
         \-------+ D
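Because Newick is plain text, the leaf names can be pulled out with standard tools. The following is only a rough sketch under simplifying assumptions: labels are unquoted and contain none of the characters ( ) , ; or :. The function name newick_leaves is made up for illustration. Newick Utilities also ships a dedicated nw_labels tool, which is the more robust choice when it is installed.

```shell
# List the leaf labels of a Newick tree, one per line.
newick_leaves() {
  sed 's/)[^(),;]*//g' "$1" |   # drop ")" plus any inner label, support or length
    tr '(,;' '\n\n\n' |         # put every remaining token on its own line
    sed 's/:.*//' |             # strip branch lengths from the leaves
    grep -v '^$'                # discard the empty lines left behind
}
```

On a file containing (A,B,(C,D)E)F; this prints A, B, C and D, skipping the internal labels E and F.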
mkdir msa_test
mafft ~/phage-annotation-workshop/day_3/msa/polymerases.faa > msa_test/polymerases.msa
💡 Try visualising the output file with less -S to see the gaps!
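A quick sanity check after aligning is to confirm that no sequence was lost, since the alignment should contain as many records as the input. A minimal sketch, with the helper names count_records and same_record_count invented for this example:

```shell
# Count the FASTA records (header lines starting with ">") in a file.
count_records() {
  grep -c '^>' "$1"
}

# Succeed only if two FASTA files hold the same number of records.
same_record_count() {
  [ "$(count_records "$1")" -eq "$(count_records "$2")" ]
}
```

With the files used above, this could be run as same_record_count ~/phage-annotation-workshop/day_3/msa/polymerases.faa msa_test/polymerases.msa && echo "all sequences aligned".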
IQ-TREE can be used as described in the Day 3 practical:
iqtree -m MFP -B 1000 -alrt 1000 -s msa_test/polymerases.msa --prefix msa_test/iq
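We can print trees even in the terminal (not ideal, perhaps). The example uses a tree file named poly.msa.treefile; the IQ-TREE command above writes its tree to msa_test/iq.treefile.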
nw_display poly.msa.treefile
 /-------------------------------------------------------------------------------------------------------+ DENOBDJG 00531
 |
 / DENOBDJG 02989
 + 0/47
=\ DENOBDJG 01754
 |
 |         /+ DENOBDJG 00191
 \---------+ 94/98
           \ DENOBDJG 00144

 |----------------------|-----------------------|----------------------|----------------------|-----------
 0                      0.02                    0.04                   0.06                   0.08
 substitutions/site
Newick Utilities can also be used to render a Newick file as SVG:
nw_display -s poly.msa.treefile > poly.tree.svg
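# On most Unix systems Inkscape can also be run from the command line, here to convert the SVG to PDF.
# The flags below are the Inkscape 0.9x syntax; Inkscape 1.x uses: inkscape poly.tree.svg --export-filename=poly.pdf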
inkscape -f poly.tree.svg -A poly.pdf
Phage Annotation Workshop 2021