-
Notifications
You must be signed in to change notification settings - Fork 10
WritingSAM Filter
Eoulsan defines a ReadAlignmentsFilter
interface that allows to filter SAM alignments.
In the following example, we use the RemoveUnmappedReadAlignmentsFilter
that removes SAM alignments not aligned against the genome.
SAMFileReader reader = new SAMFileReader(new File("input.sam"));
// creation of the filter and set its parameters
ReadAlignmentsFilter filter = new RemoveUnmappedReadAlignmentsFilter();
filter.setParameter("remove", "true");
// creation of the ReadAlignmentsFilterBuffer object
ReadAlignmentsFilterBuffer rafb = new ReadAlignmentsFilterBuffer(filter);
List<SAMRecord> records = new ArrayList<SAMRecord>();
// SAM file reading
for (SAMRecord samRecord : reader) {
// the current alignment is not on the same read as the previous alignment
if (!rafb.addAlignment(samRecord)) {
// all the alignments that are on the previous read are filtered...
records.clear();
records.addAll(rafb.getFilteredAlignments()); // buffer emptied
// ... and written
for (SAMRecord r : records) {
System.out.println(r.getSAMString());
}
// now the current alignment is added to the buffer
rafb.addAlignment(samRecord);
}
}
// treatment of the last alignment of the SAM file
records.clear();
records.addAll(rafb.getFilteredAlignments());
for (SAMRecord r : records) {
System.out.println(r.getSAMString());
}
reader.close();
For users, the treatment of single-end and paired-end reads is the same one, thanks to the addAlignment()
function in the ReadAlignmentsFilterBuffer
class.
This object is a buffer of alignments from the same read.
There are two major methods in this class:
-
addAlignment()
takes a SAMRecord (alignment) in argument and returns a boolean. It is true if the SAMRecord has the same read name as other alignments already stored or if the buffer is empty, and false in other cases. -
getFilteredAlignments()
takes no argument and returns the list of SAMRecord in the buffer that pass the filter.
Several filters can be executed with nearly the same code as above, using the MultiReadAlignmentsFilter
class:
SAMFileReader reader = new SAMFileReader(new File("input.sam"));
// creation of a filter list
List<ReadAlignmentsFilter> filterList = new ArrayList<ReadAlignmentsFilter>();
// filters are added to the list
filterList.add(new RemoveUnmappedReadAlignmentsFilter());
filterList.add(new RemoveMultiMatchesReadAlignmentsFilter());
// set filter parameters
filterList.get(0).setParameter("remove", "true");
filterList.get(1).setParameter("remove", "true");
// creation of the MultiReadAlignmentsFilter object
ReadAlignmentsFilter filter = new MultiReadAlignmentsFilter(filterList);
// creation of the ReadAlignmentsFilterBuffer object
ReadAlignmentsFilterBuffer rafb = new ReadAlignmentsFilterBuffer(filter);
List<SAMRecord> records = new ArrayList<SAMRecord>();
// SAM file reading
for (SAMRecord samRecord : reader) {
// the current alignment is not on the same read as the previous alignment
if (!rafb.addAlignment(samRecord)) {
// all the alignments that are on the previous read are filtered...
records.clear();
records.addAll(rafb.getFilteredAlignments());
// ... and written
for (SAMRecord r : records) {
System.out.println(r.getSAMString());
}
// now the current alignment is added to the buffer
rafb.addAlignment(samRecord);
}
}
// treatment of the last record
records.clear();
records.addAll(rafb.getFilteredAlignments());
for (SAMRecord r : records) {
System.out.println(r.getSAMString());
}
reader.close();
The following ReadAlignmentsFilter
implementations are available:
-
RemoveUnmappedReadAlignmentsFilter
removes all unmapped alignments, -
RemoveMultiMatchesReadAlignmentsFilter
removes all multi-match alignments, -
QualityReadAlignmentsFilter
removes alignments with a bad mapping quality score, -
KeepOneMatchReadAlignmentsFilter
keeps, in case of a multi-match alignments, the first alignment of the read, -
KeepNumberMatchReadAlignmentsFilter
keeps, in case of a multi-match alignments, a given number of alignments of the read, -
DistanceFromReferenceReadAlignmentsFilter
removes alignments which have a distance too long from the reference sequence.
It is very easy to write a plug-in for the filtersam step of Eoulsan. In this section we will write a MyQualityReadFilter
class that filters on the mapping quality score.
- First add
getName()
andgetDescription()
methods to your new filter:
package com.example;
public class MyQualityReadAlignmentsFilter extends AbstractReadAlignmentsFilter {
@Override
public String getName() {
return "mymappingquality";
}
@Override
public String getDescription() {
return "My mapping quality threshold ReadAlignmentFilter";
}
}
- Then add
setParameter
method that allow to configure our filter:
private double qualityThreshold = -1.0;
@Override
public void setParameter(final String key, final String value)
throws EoulsanException {
if (key == null || value == null)
return;
if ("threshold".equals(key.trim())) {
try {
this.qualityThreshold = Double.parseDouble(value.trim());
} catch (NumberFormatException e) {
return;
}
if (this.qualityThreshold < 0.0)
throw new EoulsanException("Invalid qualityThreshold: "
+ qualityThreshold);
} else
throw new EoulsanException("Unknown parameter for "
+ getName() + " alignment filter: " + key);
}
- And an
init()
method to initialize the plug-in once all the parameters has been set. Here for our example, if no threshold has been set throw an exception.
@Override
public void init() {
if (this.qualityThreshold < 0.0)
throw new IllegalArgumentException("Mapping quality threshold is not set for "
+ getName() + " alignment filter.");
}
- Now we can add the
filterReadAlignments()
method:
@Override
public boolean filterReadAlignments(final List<SAMRecord> records) {
if (records == null)
return;
// single-end mode
if (!records.get(0).getReadPairedFlag()) {
for (SAMRecord r : records) {
// storage in 'result' of records that do not pass the quality filter
if (r.getMappingQuality() < this.qualityThreshold) {
this.result.add(r);
}
}
}
// paired-end mode
else {
for (int counterRecord = 0; counterRecord < records.size() - 1; counterRecord +=
2) {
// storage in 'result' of records that do not pass the quality filter
if (records.get(counterRecord).getMappingQuality() < this.qualityThreshold
|| records.get(counterRecord + 1).getMappingQuality() <
this.qualityThreshold) {
// records are stored 2 by 2 because of the paired-end mode
this.result.add(records.get(counterRecord));
this.result.add(records.get(counterRecord + 1));
}
}
}
// all records that do not pass the mapping quality filter are removed
records.removeAll(result);
result.clear();
}
- Now our
ReadAlignmentsFilter
can compile and can be used in a standalone program but not as a filtersam plug-in. To enable ourReadAlignmentsFilter
as a plug-in we must register it by adding the full name of the class in thefr.ens.transcriptome.eoulsan.bio.alignmentsfilters.ReadAlignmentsFilter
text file in the META-INF/services directory. See the Writing Step Plugin for more information.