Evan Staton edited this page Jul 18, 2017 · 10 revisions

The most common use of Pairfq is the makepairs method, which re-pairs reads that are out of sync after trimming. See the makepairs page below for details on how to pair reads. All the methods implemented in Pairfq are described below, each on its own page.

USAGE

Type pairfq at the command line and you will see a menu describing the usage.

$ pairfq

ERROR: Command line not parsed correctly. Check input.

USAGE: pairfq [-h] [-m] [--version]

Required:
    addinfo           :      Add the pair info back to the FASTA/Q header.
    makepairs         :      Pair the forward and reverse reads and write singletons 
                             for both forward and reverse reads to separate files.
    joinpairs         :      Interleave the paired forward and reverse files.
    splitpairs        :      Split the interleaved file into separate files for the 
                             forward and reverse reads.

Options:
    --version         :       Print the program version and exit.
    -h|help           :       Print a usage statement.
    -m|man            :       Print the full documentation.

Specifying the method with no arguments will print the usage for that method. For example,

$ pairfq makepairs

ERROR: Command line not parsed correctly. Check input.

USAGE: pairfq makepairs [-f] [-r] [-fp] [-rp] [-fs] [-rs] [-im] [-h] [-m]

Required:
    -f|forward        :       File of foward reads (usually with "/1" or " 1" in the header).
    -r|reverse        :       File of reverse reads (usually with "/2" or " 2" in the header).
    -fp|forw_paired   :       Name for the file of paired forward reads.
    -rp|rev_paired    :       Name for the file of paired reverse reads.
    -fs|forw_unpaired :       Name for the file of singleton forward reads.
    -rs|rev_unpaired  :       Name for the file of singleton reverse reads.

Options:
    -idx|index        :       Construct an index for limiting memory usage.
                              NB: This may result in long run times for a large number of sequences. 
    -c|compress       :       Compress the output files. Options are 'gzip' or 'bzip2' (Default: No).
    -s|stats          :       Print statistics on the pairing results to STDOUT (Default: No).
    -h|help           :       Print a usage statement.
    -m|man            :       Print the full documentation.
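
As a minimal sketch of how the options listed above fit together, the following runs makepairs on a pair of trimmed read files and prints pairing statistics. The file names are hypothetical placeholders, and the guard around the command is only there so the snippet does nothing harmful when Pairfq or the inputs are absent.

```shell
# Hypothetical input and output file names; substitute your own.
forward=reads_1.trimmed.fq
reverse=reads_2.trimmed.fq

# Run only if pairfq is on the PATH and both input files are readable.
if command -v pairfq >/dev/null 2>&1 && [ -r "$forward" ] && [ -r "$reverse" ]; then
    pairfq makepairs \
        -f  "$forward" \
        -r  "$reverse" \
        -fp reads_1.paired.fq \
        -rp reads_2.paired.fq \
        -fs reads_1.unpaired.fq \
        -rs reads_2.unpaired.fq \
        -s
    status=ran
else
    echo "pairfq or the input files were not found; nothing was run" >&2
    status=skipped
fi
```

The four output names are written as given, so it is worth choosing names that keep the paired and singleton files easy to tell apart.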

Running the command pairfq -m will print the full documentation.

EXPECTED FORMATS

The input must be in FASTA or FASTQ format. Input files may be compressed with either gzip or bzip2.

Currently, data from the Casava pipeline version 1.4 are supported. For example,

@HWUSI-EAS100R:6:73:941:1973#0/1

The Casava 1.8+ format is also supported:

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

The overall format of the sequence name and comment may vary, but there must be an integer (1 or 2) at the end of the sequence name or as the first character in the comment (following a space after the sequence name). If your data are missing this pair information, you will need to restore it first (with the addinfo method, see below).
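
To check whether a header carries pair information before running Pairfq, the rule above can be sketched as a small shell function. This is an illustration only, not Pairfq's own parsing code: `pair_number` is a hypothetical helper, and it assumes the name-based form always uses a slash before the digit, as in the Casava 1.4 example.

```shell
# pair_number HEADER
# Print 1 or 2 if the header carries pair information, otherwise "none".
# Assumption: the name-based form ends in "/1" or "/2"; the comment-based
# form starts with 1 or 2 right after the first space.
pair_number() {
    printf '%s\n' "$1" | awk '{
        name = $1; comment = $2
        if (name ~ /\/[12]$/)        print substr(name, length(name))
        else if (comment ~ /^[12]/)  print substr(comment, 1, 1)
        else                         print "none"
    }'
}

pair_number '@HWUSI-EAS100R:6:73:941:1973#0/1'
pair_number '@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG'
pair_number '@EAS139:136:FC706VJ'
```

A result of "none" for the first header of a file suggests the pair information was stripped and needs to be restored before pairing.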

METHODS

Pairfq provides four methods. Below is a brief description of each.

  • makepairs

    • Pair the forward and reverse reads and write the singletons to separate files.
  • joinpairs

    • Interleave the paired reads for assembly or mapping.
  • splitpairs

    • Separate the interleaved FASTA/Q file into separate files for the forward and reverse reads.
  • addinfo

    • Add the pair information back to the data. After filtering or sampling Casava 1.8+ data, the pair information is often lost, making downstream analyses difficult. For example, @EAS139:136:FC706VJ 1:Y:18:ATCACG usually becomes @EAS139:136:FC706VJ. This command will add the pair information back (to become @EAS139:136:FC706VJ/1). There is no way to know what was in the comment, so it will not be restored.
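
The header change that addinfo makes can be illustrated with a few lines of shell. This sketch is not Pairfq's implementation and is no substitute for the addinfo method: `add_pair_info` is a hypothetical helper, and it assumes four-line FASTQ records whose headers no longer have a comment.

```shell
# add_pair_info PAIR < input.fq > output.fq
# Append "/PAIR" (1 or 2) to every FASTQ header line.
# Headers are found by position (every fourth line, starting at line 1),
# because quality lines may also begin with "@".
add_pair_info() {
    awk -v pair="$1" 'NR % 4 == 1 { $0 = $0 "/" pair } { print }'
}

# Example with a single record whose pair information was lost.
printf '@EAS139:136:FC706VJ\nACGT\n+\nIIII\n' | add_pair_info 1
```

The header of the example record comes out as @EAS139:136:FC706VJ/1, matching the form described above, while the sequence and quality lines pass through unchanged.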