Home
The most common use of Pairfq is the makepairs method, which pairs reads that have fallen out of sync due to trimming. See the makepairs page below for more information on how to pair reads. All the methods implemented in Pairfq are described below, each on its own page.
- Adding pair information to reads with pairfq addinfo
- Pairing reads with pairfq makepairs
- Interleaving paired reads with pairfq joinpairs
- Splitting interleaved reads into separate files with pairfq splitpairs
- Installing dependencies
- Working with data from the NCBI SRA
USAGE
Type pairfq at the command line to see a menu describing the usage.
$ pairfq
ERROR: Command line not parsed correctly. Check input.
USAGE: pairfq [-h] [-m] [--version]
Required:
addinfo : Add the pair info back to the FASTA/Q header.
makepairs : Pair the forward and reverse reads and write singletons
for both forward and reverse reads to separate files.
joinpairs : Interleave the paired forward and reverse files.
splitpairs : Split the interleaved file into separate files for the
forward and reverse reads.
Options:
--version : Print the program version and exit.
-h|help : Print a usage statement.
-m|man : Print the full documentation.
Specifying the method with no arguments will print the usage for that method. For example,
$ pairfq makepairs
ERROR: Command line not parsed correctly. Check input.
USAGE: pairfq makepairs [-f] [-r] [-fp] [-rp] [-fs] [-rs] [-im] [-h] [-m]
Required:
-f|forward : File of foward reads (usually with "/1" or " 1" in the header).
-r|reverse : File of reverse reads (usually with "/2" or " 2" in the header).
-fp|forw_paired : Name for the file of paired forward reads.
-rp|rev_paired : Name for the file of paired reverse reads.
-fs|forw_unpaired : Name for the file of singleton forward reads.
-rs|rev_unpaired : Name for the file of singleton reverse reads.
Options:
-idx|index : Construct an index for limiting memory usage.
NB: This may result in long run times for a large number of sequences.
-c|compress : Compress the output files. Options are 'gzip' or 'bzip2' (Default: No).
-s|stats : Print statistics on the pairing results to STDOUT (Default: No).
-h|help : Print a usage statement.
-m|man : Print the full documentation.
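To make the makepairs outputs concrete, the sketch below shows one way to sort forward and reverse reads into the four groups written to the -fp, -rp, -fs, and -rs files. It is a minimal in-memory illustration in Python, not Pairfq's own implementation (Pairfq is a Perl program). The `Read` tuple and the `base_name` and `make_pairs` helpers are hypothetical, and the sketch assumes read names are unique within each input. It holds every reverse read in memory, which is the cost that Pairfq's -idx option is described as limiting.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record type for this sketch: (header, sequence, quality).
Read = Tuple[str, str, str]


def base_name(header: str) -> str:
    """Return the part of a header shared by both mates of a pair."""
    # Keep only the sequence name; this drops a Casava 1.8+ comment like "1:Y:18:ATCACG".
    name = header.split()[0]
    # Strip a Casava 1.4 style "/1" or "/2" suffix.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def make_pairs(forward: List[Read], reverse: List[Read]):
    """Return (paired forward, paired reverse, forward singletons, reverse singletons).

    Assumes read names are unique within each input.
    """
    # Index the reverse reads by their shared name (held entirely in memory).
    rev_by_name: Dict[str, Read] = {base_name(r[0]): r for r in reverse}
    fwd_paired: List[Read] = []
    rev_paired: List[Read] = []
    fwd_single: List[Read] = []
    matched: Set[str] = set()
    for read in forward:
        key = base_name(read[0])
        mate = rev_by_name.get(key)
        if mate is None:
            fwd_single.append(read)
        else:
            # Append mates together so the two paired outputs stay in the same order.
            fwd_paired.append(read)
            rev_paired.append(mate)
            matched.add(key)
    # Reverse reads whose mate never appeared in the forward input are singletons.
    rev_single = [r for r in reverse if base_name(r[0]) not in matched]
    return fwd_paired, rev_paired, fwd_single, rev_single


fwd = [("@r1/1", "ACGT", "IIII"), ("@r2/1", "GGCC", "IIII")]
rev = [("@r1/2", "TTGA", "IIII"), ("@r3/2", "CCAA", "IIII")]
fp, rp, fs, rs = make_pairs(fwd, rev)
print([r[0] for r in fp], [r[0] for r in rp], [r[0] for r in fs], [r[0] for r in rs])
# ['@r1/1'] ['@r1/2'] ['@r2/1'] ['@r3/2']
```

Matching on the shared part of the name, rather than on file position, is what lets the two inputs be out of sync after trimming.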
Running the command pairfq -m will print the full documentation.
EXPECTED FORMATS
The input should be in FASTA or FASTQ format. It is fine if the input files are compressed (with either gzip or bzip2).
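Compressed input can be handled transparently by checking the first bytes of each file. The Python sketch below shows one such approach using the standard gzip and bz2 modules. `open_reads` is a hypothetical helper, and detecting the format by magic bytes rather than by file extension is a design choice of this sketch, not a description of how Pairfq does it.

```python
import bz2
import gzip
from typing import IO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream
BZIP2_MAGIC = b"BZh"       # first three bytes of every bzip2 stream


def open_reads(path: str) -> IO[str]:
    """Open a FASTA/Q file for text reading, decompressing gzip or bzip2 if needed."""
    # Peek at the leading bytes instead of trusting the file extension.
    with open(path, "rb") as fh:
        magic = fh.read(3)
    if magic.startswith(GZIP_MAGIC):
        return gzip.open(path, "rt")
    if magic.startswith(BZIP2_MAGIC):
        return bz2.open(path, "rt")
    # Uncompressed FASTA starts with ">" and FASTQ with "@", so neither matches above.
    return open(path, "r")


# Usage (hypothetical file name): records are read the same way either way.
# with open_reads("reads_1.fq.gz") as fh:
#     for line in fh:
#         print(line.rstrip())
```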
Currently, data from the Casava pipeline version 1.4 are supported. For example,
@HWUSI-EAS100R:6:73:941:1973#0/1
Casava 1.8+ format is also supported, for example,
@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
The overall format of the sequence name and comment may vary, but there must be an integer (1 or 2) at the end of the sequence name or as the first character of the comment (following a space after the sequence name). If your data are missing this pair information, you will need to add it first with the addinfo method (see below).
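The header rule above can be expressed as a small check. The Python sketch below uses a hypothetical helper, `pair_number`, that is not part of Pairfq. It assumes the name-ending form uses a "/" separator, as in the Casava 1.4 example, so that a Casava 1.8+ name whose last coordinate merely happens to end in 1 or 2 is not misread as pair information.

```python
from typing import Optional


def pair_number(header: str) -> Optional[int]:
    """Return 1 or 2 if the header carries pair information, otherwise None."""
    name, _, comment = header.strip().partition(" ")
    # Casava 1.8+: the pair number is the first character of the comment.
    if comment[:1] in ("1", "2"):
        return int(comment[0])
    # Casava 1.4: the sequence name ends with "/1" or "/2" (the "/" is assumed).
    if name.endswith(("/1", "/2")):
        return int(name[-1])
    # No pair information; such reads would need to go through addinfo first.
    return None


print(pair_number("@HWUSI-EAS100R:6:73:941:1973#0/1"))  # 1
print(pair_number("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))  # 1
print(pair_number("@EAS139:136:FC706VJ"))  # None
```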
METHODS
Pairfq has several different methods which can be executed. Below is a brief description of each.
- makepairs: Pair the forward and reverse reads and write the singletons to separate files.
- joinpairs: Interleave the paired reads for assembly or mapping.
- splitpairs: Split the interleaved FASTA/Q file into separate files for the forward and reverse reads.
- addinfo: Add the pair information back to the data. After filtering or sampling Casava 1.8+ data, the pair information is often lost, making downstream analyses difficult. For example, @EAS139:136:FC706VJ 1:Y:18:ATCACG usually becomes @EAS139:136:FC706VJ. This command will add the pair information back (to become @EAS139:136:FC706VJ/1). There is no way to know what was in the comment, so it will not be restored.
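The header rewrite behind addinfo and the record ordering behind joinpairs and splitpairs can be illustrated on in-memory lists. The Python sketch below is not Pairfq's code: `add_pair_info`, `interleave`, and `split_interleaved` are hypothetical names. It assumes the caller supplies the pair number for addinfo, that the inputs to joinpairs are already paired and in matching order (for example, the paired forward and reverse files written by makepairs), and it splits by position rather than by reading the pair number in each header.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def add_pair_info(header: str, pair: int) -> str:
    """addinfo-style rewrite: append "/1" or "/2" to a header that lacks pair info."""
    if pair not in (1, 2):
        raise ValueError("pair must be 1 or 2")
    # Keep only the sequence name; a comment that was already lost cannot be rebuilt.
    name = header.split()[0]
    return f"{name}/{pair}"


def interleave(forward: List[T], reverse: List[T]) -> List[T]:
    """joinpairs-style interleaving: F1, R1, F2, R2, ..."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse inputs must contain the same number of reads")
    out: List[T] = []
    for fwd_read, rev_read in zip(forward, reverse):
        out.extend((fwd_read, rev_read))
    return out


def split_interleaved(interleaved: List[T]) -> Tuple[List[T], List[T]]:
    """splitpairs-style separation by position: even indices forward, odd indices reverse."""
    if len(interleaved) % 2 != 0:
        raise ValueError("an interleaved input must contain an even number of reads")
    return interleaved[0::2], interleaved[1::2]


print(add_pair_info("@EAS139:136:FC706VJ", 1))  # @EAS139:136:FC706VJ/1
joined = interleave(["@r1/1", "@r2/1"], ["@r1/2", "@r2/2"])
print(joined)  # ['@r1/1', '@r1/2', '@r2/1', '@r2/2']
print(split_interleaved(joined))  # (['@r1/1', '@r2/1'], ['@r1/2', '@r2/2'])
```

Splitting by position only works if the interleaved file strictly alternates forward and reverse reads, which is why pairing with makepairs normally comes first.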