Working with FASTQ data from the SRA
Evan Staton edited this page Jul 18, 2017
Reads downloaded from the SRA typically carry no pair information in their headers, or, if they do, it may not be in a standard format. You can fix this with one extra step: add the pair information with Pairfq.
For example, suppose you have trimmed the two files of a paired-end run ("s_1_trim.fastq.gz" and "s_2_trim.fastq.gz") and now want to re-pair them. The following commands add the pair information (-p 1 for the forward file, -p 2 for the reverse file):
pairfq addinfo -i s_1_trim.fastq.gz -o s_1_trim_id.fastq.gz -p 1 --compress gzip
pairfq addinfo -i s_2_trim.fastq.gz -o s_2_trim_id.fastq.gz -p 2 --compress gzip
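To make the addinfo step concrete, here is a minimal Python sketch of what "adding pair information" to a FASTQ header can look like. It assumes the pair information is written as a "/1" or "/2" suffix on the read ID (the first whitespace-delimited token of the header) and that records are unwrapped 4-line entries. The functions add_pair_info and add_info_to_records are hypothetical illustrations, not Pairfq's code, and the exact header format Pairfq writes may differ.

```python
# Sketch only: illustrates what adding pair information to FASTQ headers
# means. It is NOT Pairfq's implementation; the "/1" / "/2" suffix format
# is an assumption made for this example.
from typing import Iterable, Iterator


def add_pair_info(header: str, pair: int) -> str:
    """Append /<pair> to the read ID (first whitespace-delimited token) of a header."""
    if pair not in (1, 2):
        raise ValueError("pair must be 1 (forward) or 2 (reverse)")
    parts = header.rstrip("\n").split(maxsplit=1)
    if not parts or not parts[0].startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    parts[0] = f"{parts[0]}/{pair}"
    # Any comment after the ID is kept, re-joined with a single space.
    return " ".join(parts)


def add_info_to_records(lines: Iterable[str], pair: int) -> Iterator[str]:
    """Rewrite the header (first line of every 4-line record); pass other lines through."""
    for i, line in enumerate(lines):
        yield (add_pair_info(line, pair) + "\n") if i % 4 == 0 else line


if __name__ == "__main__":
    # Made-up SRA-style record used purely for illustration.
    record = ["@SRR0000001.1 length=4\n", "ACGT\n", "+\n", "IIII\n"]
    print("".join(add_info_to_records(record, 1)), end="")
```

Running it prints the same record with its header rewritten to "@SRR0000001.1/1 length=4"; only header lines are touched, so quality lines that happen to start with "@" are left alone.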
With the pair information added, you can now pair the files as expected:
pairfq makepairs \
-f s_1_trim_id.fastq.gz \
-r s_2_trim_id.fastq.gz \
-fp s_1_trim_id_p.fastq.gz \
-rp s_2_trim_id_p.fastq.gz \
-fs s_1_trim_id_s.fastq.gz \
-rs s_2_trim_id_s.fastq.gz \
--compress gzip \
--stats
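Conceptually, makepairs matches reads across the two files by ID and separates mates from singletons. The Python sketch below shows one simple way to do that in memory; it is a hypothetical illustration, not how Pairfq is implemented. It assumes unwrapped 4-line records, unique read IDs, and "/1"/"/2" suffixes that are stripped before comparing IDs. The names parse_fastq, base_id and make_pairs are made up for this example.

```python
# Sketch only: one way to re-pair two FASTQ files by read ID, in memory.
# Not Pairfq's implementation; assumes unwrapped 4-line records, unique
# read IDs, and /1 and /2 suffixes on the read IDs.
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, "+" line, quality)


def parse_fastq(lines: Sequence[str]) -> List[Record]:
    """Group FASTQ lines into 4-line records."""
    rows = [line.rstrip("\n") for line in lines]
    if len(rows) % 4:
        raise ValueError("line count is not a multiple of 4")
    return [(rows[i], rows[i + 1], rows[i + 2], rows[i + 3]) for i in range(0, len(rows), 4)]


def base_id(header: str) -> str:
    """Read ID without the leading '@' and without a trailing /1 or /2."""
    read_id = header[1:].split(maxsplit=1)[0]
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id


def make_pairs(forward: List[Record], reverse: List[Record]):
    """Return (forward paired, reverse paired, forward singletons, reverse singletons).

    Paired records follow the order of the forward file, so the i-th
    forward pair and the i-th reverse pair are mates.
    """
    rev_by_id: Dict[str, Record] = {base_id(r[0]): r for r in reverse}
    fwd_ids = {base_id(f[0]) for f in forward}
    fp, rp, fs = [], [], []
    for rec in forward:
        mate = rev_by_id.get(base_id(rec[0]))
        if mate is None:
            fs.append(rec)
        else:
            fp.append(rec)
            rp.append(mate)
    rs = [r for r in reverse if base_id(r[0]) not in fwd_ids]
    return fp, rp, fs, rs


if __name__ == "__main__":
    fwd = parse_fastq(["@a/1", "AC", "+", "II", "@b/1", "GG", "+", "II"])
    rev = parse_fastq(["@b/2", "TT", "+", "II", "@c/2", "CC", "+", "II"])
    fp, rp, fs, rs = make_pairs(fwd, rev)
    # Rough analogue of a --stats style summary (format is illustrative only).
    print(f"pairs: {len(fp)}, forward singletons: {len(fs)}, reverse singletons: {len(rs)}")
```

The example prints "pairs: 1, forward singletons: 1, reverse singletons: 1". Holding the whole reverse file in a dictionary keeps the sketch short, but it would use a lot of memory on a full sequencing run.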
If an analysis requires a single interleaved file, the paired files can now be joined:
pairfq joinpairs \
-f s_1_trim_id_p.fastq.gz \
-r s_2_trim_id_p.fastq.gz \
-o s_trim_id_p_interleaved.fastq.gz \
--compress gzip
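Interleaving means writing each forward record followed immediately by its mate. The Python sketch below illustrates that idea under stated assumptions: both inputs are already paired (mates in the same order, as makepairs produces), records are unwrapped 4-line entries, and mates share an ID apart from a "/1" or "/2" suffix. The functions records, mate_id and interleave are hypothetical and are not Pairfq's API or implementation.

```python
# Sketch only: interleave two already-paired FASTQ inputs. Not Pairfq's
# implementation; assumes unwrapped 4-line records with mates in the same
# order in both inputs.
from itertools import zip_longest
from typing import Iterable, Iterator, List


def records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records with trailing newlines removed."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            yield buf
            buf = []
    if buf:
        raise ValueError("truncated FASTQ record at end of input")


def mate_id(header: str) -> str:
    """Read ID without '@' and without a trailing /1 or /2."""
    rid = header[1:].split(maxsplit=1)[0]
    return rid[:-2] if rid.endswith(("/1", "/2")) else rid


def interleave(forward_lines: Iterable[str], reverse_lines: Iterable[str]) -> Iterator[str]:
    """Yield each forward record followed by its mate, as newline-terminated lines."""
    for f, r in zip_longest(records(forward_lines), records(reverse_lines)):
        if f is None or r is None:
            raise ValueError("forward and reverse inputs have different numbers of reads")
        if mate_id(f[0]) != mate_id(r[0]):
            raise ValueError(f"mates out of order: {f[0]} vs {r[0]}")
        for line in f + r:
            yield line + "\n"


if __name__ == "__main__":
    fwd = ["@a/1", "AC", "+", "II", "@b/1", "GG", "+", "II"]
    rev = ["@a/2", "TT", "+", "II", "@b/2", "CC", "+", "II"]
    print("".join(interleave(fwd, rev)), end="")
```

Checking mate IDs while interleaving is a cheap guard: if the forward and reverse inputs were mixed up (for example, the same file passed twice), the sketch fails loudly instead of silently writing mismatched pairs.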
The --compress gzip option in the above commands may be omitted, though keeping the files compressed saves disk space. To use bzip2 compression instead, pass --compress bzip2.