-
Notifications
You must be signed in to change notification settings - Fork 2
/
RAxML.Rmd
141 lines (108 loc) · 5.91 KB
/
RAxML.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
title: "Making a Tree Using RAxML"
author: "William Shoemaker and Jay T. Lennon"
date: "June 5, 2020"
header-includes:
- \usepackage{array}
output: pdf_document
geometry: margin=2.54cm
---
## OVERVIEW
Maximum likelihood (ML) is one of the preferred ways of constructing a phylogenetic tree.
ML uses the process of finding the parameter value that maximizes the likelihood of the data.
In addition, ML has the advantage of being the estimation method that is least affected by sampling error, is fairly robust to the assumptions of a particularl substitution model, allows you to compare different trees, and takes into account nucleotide states (as opposed to just distance).
However, ML estimation is a computationally intensive process that can require a lot of time.
This is because ML estimation needs to find the tree that gives the data the highest probability, a difficult task that requires the use of optimized algorithms.
Because ML estimation can take a long time, you may want to use Randomized Axelerated Maximum Likelihood (RAxML) package for generating maximum likelihood trees with bootstrap support (<http://sco.h-its.org/exelixis/web/software/raxml/index.html>).
Instructions below are for running RAxML on the high speed computer, Carbonate at IU.
If you are an IU student, postdoc, or faculty member you can register for an account on Carbonate.
See the IU Mason webpage for more information (<https://kb.iu.edu/d/aolp>).
### Step 1. Create an alligned fasta file
There are many steps to making a good tree.
The first step is to create a good alignment.
There are different ways of doing this.
For example, Silva has a fast and easy web-based alignment tool (<https://www.arb-silva.de/aligner/>), but this may not be ideal for projects with lots of sequences.
In addition, this may not be very reproducible.
For larger jobs, sequences can be aligned in a reproducible manner using `mothur` on the cluster.
Another option is to use the program `muscle`, which can be downloaded here: <https://drive5.com/muscle/>.
The stand-alone program should be installed on the admin account of your computer.
After unzipping the downloaded file, you might consider renaming to "muscle".
Then, move the program from, for example, your desktop using the following command at the terminal:
`sudo cp Dekstop/muscle /usr/local/bin`
Confirm proper installation by typing `muscle` at the command line.
Now you are ready to align your sequences using the following command at the terminal:
`muscle -in ./Rpf/data/phylogeny/data/Rpf.16S.tree.fasta -out ./Rpf/data/phylogeny/Rpf.16S.tree.afa`
### Step 2. Create a shell script that will run RAxML
With the muscle-aligned sequences, you are now ready to generate the maximum likelihood tree files.
The first step in this process is to generate a shell script, which we will name `rpf.ml.sh`.
The # sign in the code below are comments that provide some annotation.
In addition, adjust `walltime` for appropriate length of time needed.
You should also change and include the full directory to where your shell script and fasta file are located.
You should change the `-o` option to the name of your appropriate outgroup sequence (if necessary).
More description on RAxML code can be found here: <http://sco.h-its.org/exelixis/web/software/raxml/hands_on.html>
```
#!/bin/bash
#PBS -k o
#PBS -l nodes=2:ppn=8,vmem=100gb,walltime=24:00:00
#PBS -M [email protected]
#PBS -m abe
#PBS -j oe
#PBS -N RpfTree
module load raxml/gnu/8.2.11
# cd into the directory with your alignment
cd /N/dc2/projects/Lennon_Sequences/Rpf
raxmlHPC-PTHREADS -T 4 -f a -m GTRGAMMA -p 12345 -x 12345 -o Methanosarcina -# autoMRE -s ./Rpf.16S.tree.afa -n Rpf.16S.tree.afa
# -T = number of threads
# -f = specifies bootstrapping algorithm with ML generating tree at same time
# -m = substitution model, generalized time reversible gamma
# -p = starts tree randomly
# -x = starts tree randomly
# -o = outgroup (name after fasta entry)
# -# = determines number of bootstrap replicates
# -s = aligned fasta file to be analyzed
# -n = name of output file
```
### Step 3. Move files to Carbonate and execute shell script
To run RAxML, you will need to move files to Carbonate.
This can be done as follows.
First, open a terminal window and navigate to the directory on your local computer containing your shell script file and your aligned fasta file.
Next you will secure copy the shell script from your local computer to Carbonate using commands like this (you will be asked for your password):
```
scp rpf.ml.sh [email protected]:/N/dc2/projects/Lennon_Sequences/Rpf
```
You will repeat this step to move your alignment file to the same directory:
```
scp Rpf.16S.tree.afa [email protected]:/N/dc2/projects/Lennon_Sequences/Rpf
```
Now you need to open a separate terminal window and log into Carbonate as follows (you will be asked for your password)
```
```
Now navigate to your folder containing recently `scp`'d files like this:
```
cd /N/dc2/projects/Lennon_Sequences/Rpf
```
Now you are ready to submit your job by typing the following at the terminal:
```
qsub rpf.ml.sh
```
You can check on the status of your job typing the following at the terminal (substitute your username):
```
qstat -ulennon
```
If you need to kill the job, type the following at the terminal after retrieving your job number from qstat
```
qdel 332659
```
If your job successfully ran, you will see a number of output files in your project directory on Carbonate.
The list will include files that look like this:
RAxML_bestTree.rpf.ml
RAxML_bipartitionsBranchLabels.rpf.ml
RAxML_bipartitions.rpf.ml
RAxML_bootstrap.rpf.ml
RAxML_info.rpf.ml
To move these files back to your local computer, do the following at the terminal:
```
scp [email protected]:/N/dc2/projects/Lennon_Sequences/Rpf/RAxML_bestTree.rpf.ml /Users/lennonj/Github/Rpf/data/phylogeny
```
For making ML tree, use the RAxML_bipartitions.rpf.ml file.