AWS Records
Ryan Brott edited this page Jul 7, 2016
Group | Downloaded | Extracted | Processed |
---|---|---|---|
1-20 | yes | yes | yes |
21-40 | yes | yes | yes |
41-60 | yes | yes | yes |
- Group = range of record numbers
- Downloaded = downloaded `.sra` files
- Extracted = extracted `.fasta` files from the `.sra` files
- Processed = created signatures from the `.fasta` files
- `install.sh`: downloads the SRA Toolkit and Aspera `ascp`. (Note: you may need to add the SRA Toolkit to your `PATH` manually.)
- `download.sh`: downloads from the SRA in parallel. Usage: reads identifiers line by line from standard input and writes the downloaded `.fasta` files to the directory given as the first argument. (Note: set the environment variable `TENAYA_HOME` to choose the directory that holds cached `.sra` files from previous downloads.)
- `process.sh`: processes `.fasta` data in parallel. Usage: `process.sh` takes `files`, a comma-separated list of `.fasta` file names; `groups`, the number of parallel processes to run; and `threads`, the number of threads to use. (Note: the number of files and `threads` should both be divisible by `groups` so the work splits evenly; `tenaya.jar` must also be present in the current working directory.)
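A minimal sketch of the parallel pattern `download.sh` is described as using (identifiers on standard input, output directory as the first argument, cached `.sra` files under `TENAYA_HOME`), assuming Bash and an `xargs` that supports `-P` (GNU or BSD). This is not the actual script: `fetch_one`, `download_all`, and `MAX_JOBS` are hypothetical names, the fallback cache directory is made up, and `fetch_one` only creates empty placeholder files where the real script would call the SRA Toolkit.

```shell
#!/usr/bin/env bash
# Sketch of the download.sh pattern (NOT the real script).
# fetch_one is a HYPOTHETICAL placeholder: it only creates empty files where
# the real script would download an .sra file and extract a .fasta from it.

fetch_one() {
  local id="$1" out_dir="$2" cache_dir="$3"
  # Reuse a cached .sra file if an earlier run already fetched it.
  if [ ! -f "$cache_dir/$id.sra" ]; then
    : > "$cache_dir/$id.sra"     # placeholder for the actual download
  fi
  : > "$out_dir/$id.fasta"       # placeholder for .sra -> .fasta extraction
}

# download_all OUT_DIR: read SRA identifiers from stdin, one per line, and
# fetch up to MAX_JOBS of them at once (default 4 is an assumption).
download_all() {
  local out_dir="${1:?usage: download_all <output-dir> < ids.txt}"
  local cache_dir="${TENAYA_HOME:-$HOME/.tenaya-cache}"  # fallback is hypothetical
  mkdir -p "$out_dir" "$cache_dir" || return 1

  # Child shells started by xargs need the function definition.
  export -f fetch_one

  # Drop blank lines, then hand one ID per call to a child bash.
  # xargs appends the ID after the fixed args, so it arrives as $3.
  sed '/^[[:space:]]*$/d' |
    xargs -n 1 -P "${MAX_JOBS:-4}" \
      bash -c 'fetch_one "$3" "$1" "$2"' _ "$out_dir" "$cache_dir"
}
```

After sourcing the file, something like `head -n 20 records.txt | download_all /media/ephemeral0/tenaya/data` would fetch the first 20 records. `xargs -P` was chosen here because it bounds concurrency without manual job bookkeeping; the real script may do this differently.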
`-M 10000000000 -k 20 -c 1 -m partition -b 1048576 -q 10000 -t <threads>`
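The even-segmentation rule in the `process.sh` note can be illustrated with a small Bash function. This is a sketch, not the real script: `segment_files` is a hypothetical name, and its output format (one line per group: the thread count for that group, then its comma-separated files) is our own choice.

```shell
#!/usr/bin/env bash
# Sketch of the even split process.sh is described as doing (NOT the real
# script). segment_files FILES GROUPS THREADS prints one line per group:
#   "<threads-per-group> <comma-separated files>"

segment_files() {
  local files_csv="$1" groups="$2" threads="$3"
  local -a files
  IFS=',' read -r -a files <<< "$files_csv"
  local n=${#files[@]}

  # File count and threads must both divide evenly by groups.
  if (( groups <= 0 || n % groups != 0 || threads % groups != 0 )); then
    echo "segment_files: file count ($n) and threads ($threads) must be divisible by groups ($groups)" >&2
    return 1
  fi

  local per_group=$(( n / groups )) tpg=$(( threads / groups )) g chunk
  for (( g = 0; g < groups; g++ )); do
    # Slice this group's files and rejoin them with commas.
    chunk=$(IFS=','; printf '%s' "${files[*]:g*per_group:per_group}")
    printf '%s %s\n' "$tpg" "$chunk"
  done
}
```

With 4 files, `groups` = 2 and `threads` = 8, each of the two processes gets 2 files and 4 threads; with 3 files the call fails, because 3 is not divisible by 2. How the real `process.sh` hands each chunk to `tenaya.jar` is not documented on this page.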
- Run `aws configure`
- Copy the necessary files with `aws s3 cp`
- Copy scripts and JARs with `scp`
- Run `scripts/install.sh`
- Download using `scripts/download.sh`
- Process using `java -Xmx20g -jar <tenaya.jar location> generate [args]`
Get a comma-separated file list (first 10 records): `cat records.txt | head -n 10 | sed 's/^/\/media\/ephemeral0\/tenaya\/data\//g' | sed 's/$/.fasta/g' | sed ':a;N;$!ba;s/\n/,/g'`
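The same file list can be built with a reusable function that joins lines with POSIX `paste -s` instead of the GNU-specific `sed ':a;N;$!ba;...'` loop. A sketch, assuming Bash; `file_list` is a hypothetical name, and the data directory defaults to the path used in the command above.

```shell
#!/usr/bin/env bash
# file_list RECORDS COUNT [DATA_DIR] (hypothetical helper):
# take the first COUNT record IDs from RECORDS, turn each into
# DATA_DIR/<id>.fasta, and join the results with commas.
file_list() {
  local records="$1" count="$2"
  local data_dir="${3:-/media/ephemeral0/tenaya/data}"
  # '|' as the sed delimiter avoids escaping the slashes in the path;
  # a DATA_DIR containing '|' or '&' would need escaping.
  head -n "$count" "$records" |
    sed "s|^|$data_dir/|; s|\$|.fasta|" |
    paste -s -d ',' -
}
```

`file_list records.txt 10` should print the same string as the original pipeline.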