AWS Records

Records

Group	Downloaded	Extracted	Processed
1-20	yes	yes	yes
21-40	yes	yes	yes
41-60	yes	yes	yes

install.sh: designed to download the SRA toolkit and ascp. (note: you may need to add the SRA toolkit to your PATH manually)
download.sh: designed for parallel downloading from the SRA. Usage: reads identifiers line by line from standard input and writes the downloaded .fasta files to the directory given as the first argument. (note: set the environment variable TENAYA_HOME to choose the directory that holds cached .sra files from previous downloads)
process.sh: designed for parallel processing of .fasta format data. Usage: process.sh <files> <groups> <threads> where files is a comma-separated list of .fasta file names, groups is the number of parallel processes to run, and threads is the number of threads to use. (note: the number of files and the number of threads should both be divisible by groups to allow for even segmentation; tenaya.jar must also be present in the current working directory)
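
The divisibility note above can be checked before launching a run. The following is a minimal sketch, not part of the repository: check_even_split is a hypothetical helper name, and it assumes the arguments arrive in the order files, groups, threads as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): checks that a
# comma-separated file list and a thread count split evenly across groups.
# Usage: check_even_split FILES GROUPS THREADS
check_even_split() {
    files=$1    # comma-separated list of .fasta file names
    groups=$2   # number of parallel processes
    threads=$3  # total number of threads

    # Count list entries by turning commas into newlines.
    # tr -d ' ' strips the padding some wc implementations add.
    nfiles=$(printf '%s\n' "$files" | tr ',' '\n' | wc -l | tr -d ' ')

    if [ "$groups" -le 0 ]; then
        echo "groups must be a positive integer" >&2
        return 1
    fi
    if [ $((nfiles % groups)) -ne 0 ]; then
        echo "$nfiles files do not divide evenly into $groups groups" >&2
        return 1
    fi
    if [ $((threads % groups)) -ne 0 ]; then
        echo "$threads threads do not divide evenly into $groups groups" >&2
        return 1
    fi
    echo "$((nfiles / groups)) files and $((threads / groups)) threads per group"
}
```

A call such as check_even_split "$FILES" 4 16 returns a non-zero status and prints the reason to standard error when the split would be uneven, so it can guard the process.sh invocation in a wrapper script.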

-M 10000000000 -k 20 -c 1 -m partition -b 1048576 -q 10000 -t <threads>

Get file list: cat records.txt | head -n 10 | sed 's/^/\/media\/ephemeral0\/tenaya\/data\//g' | sed 's/$/.fasta/g' | sed ':a;N;$!ba;s/\n/,/g'
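
The same list can be built with paste instead of the sed newline-joining loop, which may depend on GNU sed. This is a sketch under assumptions: file_list is a hypothetical function name, and the data directory is passed as an argument rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prints a comma-separated
# list of .fasta paths for the first COUNT identifiers in RECORDS.
# Usage: file_list RECORDS COUNT DIR
# Assumes DIR contains no '|', '&', or backslash characters, since it is
# substituted directly into the sed expression.
file_list() {
    records=$1  # file with one identifier per line
    count=$2    # how many identifiers to take from the top
    dir=$3      # directory that holds the .fasta files

    head -n "$count" "$records" |
        sed "s|^|$dir/|; s|\$|.fasta|" |  # prefix the directory, append .fasta
        paste -sd, -                      # join all lines with commas
}
```

For the command above, the equivalent call would be file_list records.txt 10 /media/ephemeral0/tenaya/data, and its output can be passed as the files argument of process.sh.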