
Working with FASTQ data from the SRA

Evan Staton edited this page Jul 18, 2017 · 2 revisions

Data from the SRA typically has no pair information in the read IDs, or, if it does, that information may not conform to the FASTQ specification. You can fix this with one extra step: add the pair information with Pairfq.

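For example, if you have trimmed two paired-end FASTQ files ("s_1_trim.fastq.gz" and "s_2_trim.fastq.gz") that you would now like to re-pair, the following commands add the pair information to each file, with -p 1 for the first (forward) file and -p 2 for the second (reverse) file.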

pairfq addinfo -i s_1_trim.fastq.gz -o s_1_trim_id.fastq.gz -p 1 --compress gzip
pairfq addinfo -i s_2_trim.fastq.gz -o s_2_trim_id.fastq.gz -p 2 --compress gzip
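For illustration, the sketch below shows roughly what "adding pair information" means, assuming the common convention of appending /1 or /2 to each read ID (the header text before the first space). It is not Pairfq's implementation, add_pair_suffix is a hypothetical name, and it assumes plain four-line FASTQ records with no wrapped sequence or quality lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Pairfq: append "/<pairnum>" to the read ID
# (header text before the first space) of every record in a gzipped FASTQ file.
# Assumes plain four-line records (no wrapped sequence or quality lines).
add_pair_suffix() {
    local in="$1" out="$2" pairnum="$3"
    # On each header line (1st line of every record), "&" in the replacement
    # stands for the matched ID, so "@ID" becomes "@ID/<pairnum>".
    gzip -dc "$in" \
        | awk -v p="$pairnum" 'NR % 4 == 1 { sub(/^[^ ]+/, "&/" p) } { print }' \
        | gzip > "$out"
}
```

For example, add_pair_suffix s_1_trim.fastq.gz s_1_trim_id.fastq.gz 1 would turn a header such as @SRR001.1 length=100 into @SRR001.1/1 length=100, leaving the sequence, "+" and quality lines untouched.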

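Now you can pair them as expected. Reads whose mate is present in the other file are written to the paired files (-fp and -rp), reads without a mate are written to the singleton files (-fs and -rs), and --stats prints a summary of the pairing.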

pairfq makepairs \
-f s_1_trim_id.fastq.gz \
-r s_2_trim_id.fastq.gz \
-fp s_1_trim_id_p.fastq.gz \
-rp s_2_trim_id_p.fastq.gz \
-fs s_1_trim_id_s.fastq.gz \
-rs s_2_trim_id_s.fastq.gz \
--compress gzip \
--stats
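One way to sanity-check the paired output, under the same /1, /2 naming assumption, is to confirm that both paired files list the same read IDs in the same order. The function names below are hypothetical and not part of Pairfq; the sketch assumes four-line records and uses bash process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of Pairfq. Assumes four-line FASTQ records.

# Emit one read ID per record: header text before the first space,
# without the leading "@" and without a trailing "/1" or "/2".
fastq_ids() {
    gzip -dc "$1" \
        | awk 'NR % 4 == 1 { id = $1; sub(/^@/, "", id); sub(/\/[12]$/, "", id); print id }'
}

# Print how many record positions have different IDs in the two files;
# 0 means the forward and reverse files are in sync. If one file has more
# records, each extra record is counted as a mismatch.
count_mismatched_pairs() {
    paste <(fastq_ids "$1") <(fastq_ids "$2") \
        | awk -F '\t' '$1 != $2 { bad++ } END { print bad + 0 }'
}
```

For example, count_mismatched_pairs s_1_trim_id_p.fastq.gz s_2_trim_id_p.fastq.gz should print 0 if the paired files are properly matched.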

If an analysis requires interleaved input, the paired files can now be joined into a single file.

pairfq joinpairs \
-f s_1_trim_id_p.fastq.gz \
-r s_2_trim_id_p.fastq.gz \
-o s_trim_id_p_interleaved.fastq.gz \
--compress gzip
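To show what joining means, the sketch below interleaves two already paired files so that each forward record is followed by its mate. It is a generic shell idiom, not Pairfq's implementation; interleave_pairs is a hypothetical name, and it assumes four-line records with no tab characters in the data.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Pairfq: interleave two gzipped FASTQ files
# that are already paired (same reads, same order).
# Assumes four-line records and no tab characters in the data.
interleave_pairs() {
    local fwd="$1" rev="$2" out="$3"
    # "paste - - - -" folds each 4-line record onto one tab-separated line,
    # the outer paste places mate records side by side, and tr unfolds the
    # result back into forward record, reverse record, forward record, ...
    paste <(gzip -dc "$fwd" | paste - - - -) <(gzip -dc "$rev" | paste - - - -) \
        | tr '\t' '\n' \
        | gzip > "$out"
}
```

Because this simply zips the two files record by record, it is only correct on files that are already in sync, which is why the makepairs step comes first.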

The --compress gzip option in the commands above may be omitted, but keeping the files compressed saves disk space. Alternatively, use --compress bzip2 for bzip2 compression.
