Laurent Jourdren edited this page Mar 26, 2015 · 2 revisions

Filtering reads.

WARNING: This documentation is outdated and will soon be updated.

Eoulsan defines a ReadFilter interface for filtering reads.

Using ReadFilter

In the following example, we use IlluminaFilterFlagReadFilter, which filters reads from an Illumina sequencer based on the filter flag in the id of the read.


ReadSequenceReader reader = new FastqReader(new File("in.fastq"));
ReadFilter filter = new IlluminaFilterFlagReadFilter(); 


for (ReadSequence read : reader) {

  if (filter.accept(read)) {
    System.out.println(read);
  }
}
reader.close();

How ReadFilter works

The ReadFilter interface defines two methods for read filtering:

  • boolean accept(ReadSequence read) for single end filtering
  • boolean accept(ReadSequence read1, ReadSequence read2) for paired-end/mate-pair filtering

Developers usually do not need to implement the accept(ReadSequence read1, ReadSequence read2) method, because calling accept(ReadSequence read) on each end of the sequenced cluster is often enough to filter reads. The abstract class AbstractReadFilter provides a ready-to-use accept method for this case:

@Override
public boolean accept(ReadSequence read1, ReadSequence read2) {

  return accept(read1) && accept(read2);
}
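
To make the delegation concrete, here is a minimal self-contained sketch. Read, SimpleReadFilter, AbstractSimpleReadFilter and MinLengthFilter are hypothetical stand-ins written for this illustration; they are not the Eoulsan classes, and the real ReadSequence and AbstractReadFilter may differ.

```java
class PairedFilterSketch {

  /** Hypothetical stand-in for ReadSequence (assumption, not the Eoulsan class). */
  static final class Read {
    final String name;
    final String sequence;

    Read(String name, String sequence) {
      this.name = name;
      this.sequence = sequence;
    }
  }

  /** Hypothetical stand-in for the ReadFilter interface. */
  interface SimpleReadFilter {
    boolean accept(Read read);

    boolean accept(Read read1, Read read2);
  }

  /** Provides the paired-end method by delegating to the single-end one. */
  abstract static class AbstractSimpleReadFilter implements SimpleReadFilter {
    @Override
    public boolean accept(Read read1, Read read2) {
      // A pair is kept only if both ends pass the single-end test
      return accept(read1) && accept(read2);
    }
  }

  /** Example filter: only the single-end method has to be written. */
  static final class MinLengthFilter extends AbstractSimpleReadFilter {
    private final int minLength;

    MinLengthFilter(int minLength) {
      this.minLength = minLength;
    }

    @Override
    public boolean accept(Read read) {
      return read != null && read.sequence.length() >= minLength;
    }
  }

  public static void main(String[] args) {
    SimpleReadFilter filter = new MinLengthFilter(5);
    Read longRead = new Read("r1/1", "ACGTACGT");
    Read shortRead = new Read("r1/2", "ACG");

    System.out.println(filter.accept(longRead, longRead));  // true
    System.out.println(filter.accept(longRead, shortRead)); // false
  }
}
```

With this design, a pair is rejected as soon as one of its two ends fails, which is why most filters only need the single-end method.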

Combining multiple ReadFilters

Several filters can be executed with only one call to the accept() method using the MultiReadFilter class:

ReadSequenceReader reader = new FastqReader(new File("in.fastq"));

List<ReadFilter> filtersList = new ArrayList<ReadFilter>();
filtersList.add(new IlluminaFilterFlagReadFilter()); 
filtersList.add(new ValidReadFilter()); 
ReadFilter filter = new MultiReadFilter(filtersList);

for (ReadSequence read : reader) {

  if (filter.accept(read)) {
    System.out.println(read);
  }
}
reader.close();
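
The sketch below illustrates one way such a combined filter can behave. It assumes that a read is accepted only when every member filter accepts it; this is an assumption, not the documented behavior of MultiReadFilter. SequenceFilter and AllOfFilter are hypothetical names used only for this illustration, and reads are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompositeFilterSketch {

  /** Hypothetical single-method filter, standing in for ReadFilter. */
  interface SequenceFilter {
    boolean accept(String sequence);
  }

  /** Accepts a sequence only if every wrapped filter accepts it. */
  static final class AllOfFilter implements SequenceFilter {
    private final List<SequenceFilter> filters;

    AllOfFilter(List<SequenceFilter> filters) {
      // Defensive copy so later changes to the caller's list have no effect
      this.filters = new ArrayList<>(filters);
    }

    @Override
    public boolean accept(String sequence) {
      for (SequenceFilter filter : filters) {
        if (!filter.accept(sequence)) {
          return false; // stop at the first filter that rejects
        }
      }
      return true;
    }
  }

  public static void main(String[] args) {
    SequenceFilter notEmpty = s -> s != null && !s.isEmpty();
    SequenceFilter noN = s -> s.indexOf('N') < 0;
    SequenceFilter both = new AllOfFilter(Arrays.asList(notEmpty, noN));

    System.out.println(both.accept("ACGT")); // true
    System.out.println(both.accept("ACNT")); // false
    System.out.println(both.accept(""));     // false
  }
}
```

Because evaluation stops at the first rejection, placing cheap filters first in the list avoids running the expensive ones on reads that are discarded anyway.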


Available ReadFilter implementations

The following ReadFilter implementations are available:

  • PairCheckReadFilter checks that both reads of each pair come from the same cluster,
  • PairEndReadFilter removes paired-end or single-end reads (useful ???),
  • QualityReadFilter removes reads with a low mean quality,
  • TrimReadFilter trims the polyN tail of reads and removes reads that are too short,
  • ValidReadFilter removes reads that do not pass the validate() method of ReadSequence,
  • IlluminaFilterFlagReadFilter removes reads that do not pass the Illumina filter.

Writing a plug-in for the filterreads step of Eoulsan

It is very easy to write a plug-in for the filterreads step of Eoulsan. In this section we will write a MyQualityReadFilter class that filters on mean quality.

  • First, add the getName() and getDescription() methods to your new filter:
package com.example;

public class MyQualityReadFilter extends AbstractReadFilter {

  @Override
  public String getName() {

    return "myquality";
  }

  @Override
  public String getDescription() {
    return "My quality threshold ReadFilter";
  }

}
  • Then add a setParameter() method that allows the filter to be configured:
  private double qualityThreshold = -1.0;

  @Override
  public void setParameter(final String key, final String value)
      throws EoulsanException {

    // Ignore incomplete key/value pairs
    if (key == null || value == null) {
      return;
    }

    if ("threshold".equals(key.trim())) {

      try {
        this.qualityThreshold = Double.parseDouble(value.trim());
      } catch (NumberFormatException e) {
        // Not a number: the value is ignored and the threshold is left unchanged
        return;
      }

      if (this.qualityThreshold < 0.0) {
        throw new EoulsanException("Invalid qualityThreshold: "
            + qualityThreshold);
      }
    } else {
      throw new EoulsanException("Unknown parameter for "
          + getName() + " read filter: " + key);
    }
  }
  • Add an init() method to initialize the plug-in once all the parameters have been set. In our example, it throws an exception if no threshold has been set.
  @Override
  public void init() {

    if (this.qualityThreshold < 0.0)
      throw new IllegalArgumentException("Quality threshold is not set for "
          + getName() + " read filter.");
  }

  • Now we can add the accept() method:
  @Override
  public boolean accept(final ReadSequence read) {

    if (read == null)
      return false;

    return mean(read.qualityScores()) > this.qualityThreshold;
  }
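
The accept() method above calls a mean() helper that is not shown on this page. A minimal sketch of such a helper follows, assuming that read.qualityScores() returns the scores as an int[]; that return type is an assumption, and MeanQualitySketch is a hypothetical class name used only for this illustration.

```java
class MeanQualitySketch {

  /**
   * Arithmetic mean of the quality scores.
   * Assumption: scores are provided as an int[].
   * Returns NaN for a null or empty array.
   */
  static double mean(int[] scores) {
    if (scores == null || scores.length == 0) {
      return Double.NaN;
    }

    long sum = 0;
    for (int score : scores) {
      sum += score;
    }
    return (double) sum / scores.length;
  }

  public static void main(String[] args) {
    int[] scores = {30, 40, 20};
    double threshold = 25.0;

    System.out.println(mean(scores));             // 30.0
    System.out.println(mean(scores) > threshold); // true
  }
}
```

Returning NaN for an empty array means the comparison mean(...) > threshold is false, so a read without quality scores would be rejected rather than cause a division by zero.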
  • Our ReadFilter now compiles and can be used in a standalone program, but not yet as a filterreads plug-in. To enable it as a plug-in, we must register it by adding the fully qualified name of the class to the fr.ens.transcriptome.eoulsan.bio.readsfilters.ReadFilter text file in the META-INF/services directory. See Writing Step Plugin for more information:
com.example.MyQualityReadFilter
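
A file in META-INF/services named after an interface is the convention read by the standard java.util.ServiceLoader class. The sketch below shows how registered implementations can be discovered this way. That Eoulsan looks up filters through ServiceLoader is an assumption here, and NamedFilter and PluginDiscoverySketch are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class PluginDiscoverySketch {

  /** Hypothetical plug-in interface, standing in for ReadFilter. */
  public interface NamedFilter {
    String getName();
  }

  /** Returns the names of all NamedFilter providers registered on the classpath. */
  static List<String> discoverNames() {
    List<String> names = new ArrayList<>();
    // ServiceLoader reads the META-INF/services file named after the interface
    // and instantiates each class listed in it with its no-argument constructor
    for (NamedFilter filter : ServiceLoader.load(NamedFilter.class)) {
      names.add(filter.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println("Discovered filters: " + discoverNames());
  }
}
```

When no registration file for the interface is on the classpath, the list is empty. This also shows why a registered class needs a public no-argument constructor: the loader has no other way to create it.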