read duplication #217

necrolyte2 · 2016-03-24T08:27:09Z

Related #204

So today they filled their hard drive while running the pipeline.
The fastq files they are running are very large since they only ran 12 or 24 samples in the run.

An example project they have:
Raw fastq (1G x 2) + filtered fastq (1G x 2) + trimmed fastq (1G x 2) + bam (790M) = 6.7G
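The arithmetic above can be sketched as a small estimate of how the footprint grows with the number of samples. This is only an illustration: the function name and the 1G = 1024M convention are assumptions, and the sizes are the rounded figures quoted in this comment.

```python
# Rough per-sample disk footprint, using the rounded sizes quoted above.
# Assumption: 1G is treated as 1024M; the function name is hypothetical.
FASTQ_MB = 1024      # one fastq file, roughly 1G
FILES_PER_STAGE = 2  # paired files (the "x 2" above)
FASTQ_STAGES = 3     # raw, filtered, trimmed
BAM_MB = 790


def footprint_gb(num_samples):
    """Return the estimated disk usage in G for num_samples samples."""
    per_sample_mb = FASTQ_MB * FILES_PER_STAGE * FASTQ_STAGES + BAM_MB
    return num_samples * per_sample_mb / 1024.0


if __name__ == "__main__":
    for n in (1, 12, 24):
        print("%d sample(s): about %.1fG" % (n, footprint_gb(n)))
```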

Then they are running a few of these samples (you can see how this is adding up).

At WRAIR this is less of an issue because we have de-duplication on the storage server.

Just as a test I tried gzipping one of the fastq files that was originally 1.2G and it came out 330M, which is a pretty great storage savings.
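A test like this one can be repeated on any file with a few lines of standard-library Python. This is a minimal sketch, not code from the pipeline: the function name `gzip_file` is hypothetical, and the ratio you get will depend on the data.

```python
import gzip
import os
import shutil


def gzip_file(path, keep_original=True):
    """Compress path to path + '.gz'; return (original_bytes, compressed_bytes)."""
    gz_path = path + ".gz"
    # Stream in chunks so a multi-gigabyte fastq is never held in memory.
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    original = os.path.getsize(path)
    compressed = os.path.getsize(gz_path)
    if not keep_original:
        os.remove(path)
    return original, compressed
```

Calling `gzip_file("reads.fastq")` (a placeholder file name) returns both sizes, so the savings can be checked before deciding whether to gzip every stage.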

Maybe we should force gzip output from all stages?

averagehat · 2016-03-30T14:25:58Z

I think we could use the unzipped data in the next step, then gzip it after, if that makes sense. Something like:

convert_format files
ngs_filter files > filtered
gzip files
trim_reads filtered
gzip filtered

etc.
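The ordering above (run a stage on uncompressed input, then compress that input once nothing else needs it) could be sketched roughly as follows. The helper names `gzip_in_place` and `run_then_compress` are hypothetical, and the real argument lists for the pipeline stages are not given in this thread, so the command is passed in by the caller.

```python
import gzip
import os
import shutil
import subprocess


def gzip_in_place(path):
    """Replace path with path + '.gz' and return the new path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


def run_then_compress(command, consumed_inputs):
    """Run one stage on uncompressed inputs, then gzip those inputs.

    command is an argument list for subprocess; consumed_inputs are the
    files that no later stage needs in uncompressed form.
    """
    # check_call raises if the stage fails, so inputs are only
    # compressed after a successful run.
    subprocess.check_call(command)
    return [gzip_in_place(p) for p in consumed_inputs]
```

With this shape, each stage still sees plain fastq files, and at most one uncompressed copy of the previous stage's output exists at any time.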

necrolyte2 · 2016-04-06T15:00:49Z

This will work fine, since we can't just read/write gzip directly due to the Biopython incompatibility.

averagehat · 2016-04-06T16:02:33Z

Another thing is that right now ngs_filter symlinks data from convert-formats if no filtering is done.
We could maybe fix this by skipping the call to ngs_filter altogether within runsample. Are there any other symbolic links being used in the pipeline?
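One way to answer the symlink question for a finished run is to scan the output directory. This is a minimal sketch using only the standard library; `find_symlinks` is a hypothetical helper, not part of the pipeline.

```python
import os


def find_symlinks(root):
    """Return a sorted list of (link_path, target) for symlinks under root."""
    links = []
    # os.walk does not follow symlinked directories by default, but it
    # still lists them in dirnames, so check both name lists.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append((path, os.readlink(path)))
    return sorted(links)
```

Running it on a sample's output directory lists every link and what it points to, which shows where compressing a stage's files would need special handling.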

necrolyte2 · 2016-04-06T17:11:42Z

I don't remember any other symlinks besides runsamplesheet.sh symlinking consensus sequence files.

necrolyte2 added the Needs Discussion label Mar 24, 2016

necrolyte2 added this to the AFRIMS Issues milestone Mar 24, 2016

averagehat mentioned this issue Apr 6, 2016

Rm dupes #235

Open

averagehat added the in progress label Apr 6, 2016
