AWS Step By Step Guide

Ryan Brott edited this page Jul 20, 2016 · 4 revisions
  1. Build the Tenaya jar
  • cd <dir>
  • git clone https://github.com/ScaleUnlimited/tenaya.git
  • cd tenaya
  • ./gradlew clean shadowJar
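Before moving on, it can help to confirm that the build actually produced a jar. The following is a minimal sketch: the `build/libs/tenaya.jar` path is the one used by the upload step in this guide, and `require_jar` is a hypothetical helper name, not part of Tenaya.

```shell
# Sketch: confirm that the shadow jar exists and is non-empty after the build.
# require_jar is a hypothetical helper, not something Tenaya provides.
require_jar() {
    local jar="$1"
    if [ -s "$jar" ]; then
        echo "found $jar"
    else
        echo "missing $jar (did ./gradlew clean shadowJar succeed?)" >&2
        return 1
    fi
}

# Usage, from the tenaya checkout:
#   ./gradlew clean shadowJar && require_jar build/libs/tenaya.jar
```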
  1. Launch an EC2 instance
  • Log into the AWS console
  • Click the EC2 service link
  • Click the "Launch Instance" button
  • Click the "Select" button to the right of the Amazon Linux AMI (e.g. Amazon Linux AMI 2016.03.3 (HVM), SSD Volume Type).
  • Select an instance type with good network performance, SSD storage, and at least 15GB of RAM (e.g. c3.8xlarge)
  • Click the "Next: Configure Instance Details" button.
  • Click the "Next: Add Storage" button (all defaults on the Configure Instance Details page should be fine).
  • Click the "Add New Volume" button, and select "Instance Store 0" from the pop-up menu.
  • Click the "Add New Volume" button a second time, and select "Instance Store 1" from the pop-up menu.
  • Click the "Review and Launch" button. This skips the remaining configuration pages, so it assumes you don't want to use a pre-defined security group.
  • Click the "Launch" button.
  • Select either "Choose an existing key pair" (and the appropriate key pair name) or "Create a new key pair" (and enter a name, then download it).
  • Click the "Launch Instance" button.
  • Copy the public DNS name from the instance details page.
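If you prefer the AWS CLI to the console, the launch steps above could be approximated as follows. This is only a sketch under several assumptions: the AMI ID and key pair name are placeholders you must supply, `/dev/sdb` and `/dev/sdc` are assumed device names for the two instance store volumes, and the function names are hypothetical. Unlike the console wizard, the CLI does not create a new security group, so you may need to add `--security-group-ids` with a group that allows SSH.

```shell
# Sketch only: launch a c3.8xlarge with both instance store volumes attached.
# Override AWS (e.g. AWS=echo) to preview the command without running it.
AWS="${AWS:-aws}"

launch_tenaya_instance() {
    local ami_id="$1" key_name="$2"   # placeholders supplied by the caller
    "$AWS" ec2 run-instances \
        --image-id "$ami_id" \
        --instance-type c3.8xlarge \
        --key-name "$key_name" \
        --block-device-mappings \
            "DeviceName=/dev/sdb,VirtualName=ephemeral0" \
            "DeviceName=/dev/sdc,VirtualName=ephemeral1" \
        --query 'Instances[0].InstanceId' \
        --output text
}

# Print the public DNS name of a running instance, given its instance ID.
public_dns() {
    "$AWS" ec2 describe-instances \
        --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].PublicDnsName' \
        --output text
}
```

The first function prints the new instance ID; pass that ID to `public_dns` once the instance is running to get the name used in the following steps.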
  1. Upload required files to the instance (this assumes your working directory is still 'tenaya').
  • scp -i <path to key pair file> build/libs/tenaya.jar ec2-user@<public DNS>:
  • scp -i <path to key pair file> -r scripts ec2-user@<public DNS>:
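The two uploads above can be wrapped in one function so the key file and host are typed only once. This is a minimal sketch; `upload_tenaya` and the `SCP` override are hypothetical names introduced here for illustration.

```shell
# Sketch: upload the jar and the scripts directory with one call.
# Override SCP (e.g. SCP=echo) for a dry run that only prints the arguments.
SCP="${SCP:-scp}"

upload_tenaya() {
    local key="$1" host="$2"   # key pair file and public DNS name
    "$SCP" -i "$key" build/libs/tenaya.jar "ec2-user@${host}:" &&
    "$SCP" -i "$key" -r scripts "ec2-user@${host}:"
}

# Usage, from the tenaya checkout:
#   upload_tenaya <path to key pair file> <public DNS>
```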
  1. Configure the EC2 server
  • ssh -i <path to key pair file> ec2-user@<public DNS>
  • Run sudo fdisk -l and note the device path of the second drive
  • sudo mkdir <mount point>
  • sudo mount -t ext3 <device path from the fdisk step> <mount point> (the mount point is usually /media/ephemeral1/)
  • export TENAYA_HOME=<path to tenaya storage dir>
  • sudo ./install.sh (if the script is not executable, first run sudo chmod u+x install.sh)
  • export PATH=$PATH:/home/ec2-user/sratoolkit.2.6.3-ubuntu64/bin
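Picking the second drive out of the `fdisk -l` listing can be scripted. The sketch below assumes each disk is reported on a line of the form `Disk /dev/...: <size>`, which is the usual layout but may differ between fdisk versions; `second_disk` is a hypothetical helper name.

```shell
# Sketch: print the device path of the second disk listed by `fdisk -l`.
# Assumes each disk appears on a line beginning with "Disk /dev/".
second_disk() {
    awk '/^Disk \/dev\// {
        n++
        if (n == 2) { sub(/:$/, "", $2); print $2; exit }
    }'
}

# Usage on the instance (mount point as suggested in this guide):
#   DEVICE=$(sudo fdisk -l 2>/dev/null | second_disk)
#   sudo mkdir -p /media/ephemeral1
#   sudo mount -t ext3 "$DEVICE" /media/ephemeral1
```

Check the printed device against the full `fdisk -l` output before mounting, since the ordering of disks is not guaranteed.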
  1. Download SRA files
  • Create runs.txt, a newline-delimited list of SRA runs, in one of three ways: run java -jar <path to tenaya.jar> search -o <organism scientific name with plus signs instead of spaces> > runs.txt; run java -jar <path to tenaya.jar> fetch <list of experiment accessions separated by spaces> > runs.txt; or type the file by hand.
  • ./download.sh <directory to dump FASTA data into> (this tool reads run accessions from standard input; type one accession per line and finish with an empty line, i.e. two consecutive newlines)
  • Note: because download.sh reads from standard input, you can also pipe run accessions into it, for example: cat runs.txt | ./download.sh <directory to dump FASTA data into>
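Two small helpers can make the steps above less error-prone. This is a sketch with hypothetical function names: `organism_query` turns a scientific name into the plus-separated form that the search command expects, and `clean_runs` removes blank lines and repeated accessions. Since an empty line ends interactive input to download.sh, a stray blank line in runs.txt would presumably stop the download early, which is why it is filtered out here.

```shell
# Sketch: helpers around runs.txt. Both function names are hypothetical.

# Replace spaces with plus signs, as the search command expects.
organism_query() {
    printf '%s\n' "$1" | tr ' ' '+'
}

# Drop blank lines and repeated accessions, keeping the original order.
clean_runs() {
    awk 'NF && !seen[$0]++' "$1"
}

# Usage:
#   java -jar <path to tenaya.jar> search -o "$(organism_query 'Escherichia coli')" > runs.txt
#   clean_runs runs.txt | ./download.sh <directory to dump FASTA data into>
```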
  1. Process FASTA files
  • sudo ./process.sh <comma-separated file list> <number of processes> <number of threads> (Note: optimized for a c3.8xlarge instance)
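The comma-separated file list for process.sh can be generated rather than typed. The sketch below assumes the downloaded files end in `.fasta`, which this guide does not confirm, and `fasta_list` is a hypothetical helper name. File names containing commas or newlines would break the list.

```shell
# Sketch: build the comma-separated file list that process.sh expects.
# The .fasta extension is an assumption about what download.sh writes.
fasta_list() {
    local dir="${1%/}"              # strip one trailing slash, if any
    set -- "$dir"/*.fasta           # expand the glob into positional args
    [ -e "$1" ] || { echo "no .fasta files in $dir" >&2; return 1; }
    printf '%s\n' "$@" | paste -sd, -
}

# Usage:
#   sudo ./process.sh "$(fasta_list <FASTA directory>)" <number of processes> <number of threads>
```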
  1. Cluster the signatures
  • Run java -jar <path to tenaya.jar> cluster to print usage information.