AWS Step By Step Guide
Ryan Brott edited this page Jul 20, 2016 · 4 revisions
- Build the Tenaya jar
cd <dir>
git clone https://github.com/ScaleUnlimited/tenaya.git
cd tenaya
./gradlew clean shadowJar
- Launch an EC2 instance
- Log into the AWS console
- Click the EC2 service link
- Click the "Launch Instance" button
- Click the "Select" button to the right of the Amazon Linux AMI (e.g. Amazon Linux AMI 2016.03.3 (HVM), SSD Volume Type).
- Select an instance type with good network performance, SSD storage, and at least 15GB of RAM (e.g. c3.8xlarge)
- Click the "Next: Configure Instance Details" button.
- Click the "Next: Add Storage" button (all defaults on this page should be fine).
- Click the "Add New Volume" button, and select "Instance Store 0" from the pop-up menu.
- Click the "Add New Volume" button a second time, and select "Instance Store 1" from the pop-up menu.
- Click the "Review and Launch" button. This assumes you don't want to use a pre-defined security group.
- Click the "Launch" button.
- Select either "Choose an existing key pair" (and the appropriate key pair name) or "Create a new key pair" (and enter a name, then download it).
- Click the "Launch Instance" button.
- Copy the public DNS name from the instance details page.
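The console steps above can also be approximated from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured; it only prints an `aws ec2 run-instances` command for review rather than running it. The function name `launch_command`, the device names `/dev/sdb` and `/dev/sdc`, and the example AMI ID and key pair name are illustrative assumptions, not values from this guide.

```shell
# Instance-store mappings corresponding to "Instance Store 0" and "Instance Store 1".
# The device names /dev/sdb and /dev/sdc are assumptions; adjust them if needed.
EPHEMERAL_MAPPINGS='[{"DeviceName":"/dev/sdb","VirtualName":"ephemeral0"},{"DeviceName":"/dev/sdc","VirtualName":"ephemeral1"}]'

# launch_command AMI_ID KEY_NAME [INSTANCE_TYPE]
# Prints the AWS CLI command instead of executing it, so it can be checked first.
# INSTANCE_TYPE defaults to c3.8xlarge, the example type used in this guide.
launch_command() {
  printf "aws ec2 run-instances --image-id %s --instance-type %s --key-name %s --count 1 --block-device-mappings '%s'\n" \
    "$1" "${3:-c3.8xlarge}" "$2" "$EPHEMERAL_MAPPINGS"
}

# Example with placeholder values (hypothetical AMI ID and key pair name):
launch_command ami-xxxxxxxx my-key-pair
```

Printing the command first makes it easy to confirm the AMI, instance type, and volume mappings before anything is launched or billed.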
- Upload required files to the instance (this assumes your current directory is still 'tenaya').
scp -i <path to key pair file> build/libs/tenaya.jar ec2-user@<public DNS>:
scp -i <path to key pair file> -r scripts ec2-user@<public DNS>:
- Configure the EC2 server
ssh -i <path to key pair file> ec2-user@<public DNS>
- Run
sudo fdisk -l
and find the path to the second drive.
-
sudo mkdir <mount point>
-
sudo mount -t ext3 <path to the second drive> <mount point>
where the mount point is usually /media/ephemeral1/
-
export TENAYA_HOME=<path to tenaya storage dir>
-
sudo ./install.sh
(may require sudo chmod u+x install.sh)
-
export PATH=$PATH:/home/ec2-user/sratoolkit.2.6.3-ubuntu64/bin
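Variables set with export last only for the current shell session, so they are lost when you log out of the instance. A minimal sketch of one way to keep them, assuming a bash login shell that reads `~/.bashrc`: the helper `persist_line` (a hypothetical name, not part of the Tenaya scripts) appends a line to a file only if that exact line is not already there.

```shell
# persist_line FILE LINE
# Append LINE to FILE unless an identical line is already present,
# so running this more than once does not create duplicates.
persist_line() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage on the instance (replace the TENAYA_HOME value with your storage directory):
# persist_line "$HOME/.bashrc" 'export TENAYA_HOME=/path/to/tenaya/storage'
# persist_line "$HOME/.bashrc" 'export PATH=$PATH:/home/ec2-user/sratoolkit.2.6.3-ubuntu64/bin'
```

The single quotes keep `$PATH` unexpanded, so it is evaluated each time the file is sourced rather than frozen at the moment the line is written.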
- Download SRA files
- Acquire a newline-delimited list of SRA runs by running
java -jar <path to tenaya.jar> search -o <organism scientific name with plus signs instead of spaces> > runs.txt
or
java -jar <path to tenaya.jar> fetch <list of experiment accessions separated by spaces> > runs.txt
or by typing out the file manually.
- Run
./download.sh <directory to dump FASTA data into>
This tool reads run accessions from standard input: type each accession followed by a newline, and enter an empty line (two newlines in a row) to end the input.
- Note: because download.sh reads from standard input, you can pipe run accessions into it. For example,
cat runs.txt | ./download.sh <directory to dump FASTA data into>
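Because an empty line ends the input to download.sh, a stray blank line in a hand-edited runs.txt would presumably stop the download early. A minimal sketch of a filter that tidies the list before it is piped in: `clean_runs` is a hypothetical helper (not part of the Tenaya scripts) that assumes run accessions have the usual SRR/ERR/DRR prefix followed by digits.

```shell
# clean_runs: read accessions on stdin, write a cleaned list on stdout.
# - removes carriage returns (e.g. from files edited on Windows)
# - trims leading and trailing whitespace
# - keeps only lines that look like run accessions (SRR/ERR/DRR + digits),
#   which also drops blank lines
# - removes duplicates while preserving the original order
clean_runs() {
  tr -d '\r' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -E '^[SED]RR[0-9]+$' \
    | awk '!seen[$0]++'
}

# Usage (DATA_DIR is a placeholder for the directory to dump FASTA data into):
# clean_runs < runs.txt | ./download.sh "$DATA_DIR"
```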
- Process FASTA files
-
sudo ./process.sh <comma-separated file list> <number of processes> <number of threads>
(Note: optimized for a c3.8xlarge instance)
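Typing the comma-separated file list by hand gets tedious with many runs. A minimal sketch of building it from a directory: `comma_list` is a hypothetical helper, and the `*.fasta` pattern is an assumption about how download.sh names its output, so check the actual file names first. It does not handle file names that contain commas or newlines.

```shell
# comma_list DIR PATTERN
# Print the regular files directly inside DIR whose names match PATTERN
# as a single comma-separated line, sorted by path.
comma_list() {
  find "$1" -maxdepth 1 -type f -name "$2" | sort | paste -sd, -
}

# Usage (DATA_DIR and the process/thread counts are placeholders):
# sudo ./process.sh "$(comma_list "$DATA_DIR" '*.fasta')" <number of processes> <number of threads>
```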
- Cluster the signatures
- Run
java -jar <path to tenaya.jar> cluster
for usage.