Writing Step Plugin

Laurent Jourdren edited this page Mar 13, 2015 · 3 revisions

Writing a step plug-in

Introduction

This page shows how to write a step plug-in for Eoulsan. The sample code is a step that maps reads with the Gsnap mapper in local mode. The Gsnap executable is already bundled with Eoulsan (in the src/main/java/files/linux/amd64 source folder), so compiling Gsnap is not covered here.

How to write a plug-in

  • In the package com.example create a class named GsnapExampleStep that extends AbstractStep. All the code of the step is in this source file. You can download it here.
package com.example;

import java.util.logging.Logger;
import fr.ens.transcriptome.eoulsan.Globals;
import fr.ens.transcriptome.eoulsan.annotations.LocalOnly;
import fr.ens.transcriptome.eoulsan.core.Context;
import fr.ens.transcriptome.eoulsan.design.Design;
import fr.ens.transcriptome.eoulsan.steps.AbstractStep;
import fr.ens.transcriptome.eoulsan.steps.StepResult;

// The "@LocalOnly" annotation means that the Eoulsan workflow engine will 
// only use this step in local mode. The two other annotations are "@HadoopOnly" 
// and "@HadoopCompatible" when a step can be executed in local or Hadoop mode.
@LocalOnly
public class GsnapExampleStep extends AbstractStep {

  /** Logger */
  private static final Logger LOGGER = Logger.getLogger(Globals.APP_NAME);

  @Override
  public String getName() {
    // This method returns the name of the step
    // We don't use gsnap as step name as it already exists in Eoulsan
    return "gsnapexample";
  }
  
  @Override
  public String getDescription() {
    // This method returns a description of the step. This method is optional
    return "This step maps reads using gsnap";
  }

  @Override
  public StepResult execute(final Design design, final Context context) {
    // TODO Auto-generated method stub
    // We will write the code of this method later
    return null;
  }

}
  • Now we add the input and output formats of this step with the following methods. These formats allow the Eoulsan workflow engine to check that all the files required by the analysis exist before the analysis is launched.
  @Override
  public DataFormat[] getInputFormats() {
    return new DataFormat[] {DataFormats.FILTERED_READS_FASTQ,
        DataFormats.GMAP_INDEX_ZIP, DataFormats.GENOME_DESC_TXT};
  }
  
  @Override
  public DataFormat[] getOutputFormats() {
    return new DataFormat[] {DataFormats.MAPPER_RESULTS_SAM};
  }
  • This step can be configured with the configure() method. For our example, we define a mapperarguments parameter for setting additional parameters for gsnap.
  private String mapperArguments = "-N 1";

  @Override
  public void configure(final Set<Parameter> stepParameters)
      throws EoulsanException {

    for (Parameter p : stepParameters) {

      if ("mapperarguments".equals(p.getName()))
        this.mapperArguments = p.getStringValue();
      else
        throw new EoulsanException("Unknown parameter for "
            + getName() + " step: " + p.getName());

    }
  }
  • Now we add the execute() method that is called for data processing:
  @Override
  public StepResult execute(final Design design, final Context context) {

    // The design object contains the list of all the samples to process. This
    // object holds all the information of the design file

    // The context object has many useful methods for writing a step
    // (e.g. access to the files to process, the workflow description, the logger...)

    try {
      // Save the start time
      final long startTime = System.currentTimeMillis();

      // Log message to write at the end of the step
      final StringBuilder log = new StringBuilder();

      // For each sample of the analysis
      for (final Sample sample : design.getSamples()) {

        // Create the reporter. The reporter collects information about the
        // processing of the data (e.g. the number of reads, the number of
        // alignments generated...)
        final Reporter reporter = new Reporter();

        // Get the path to the archive that contains the GMAP genome index.
        // In Eoulsan, to get the path of an input file, you just have to call
        // context.getInputDataFile() with the data format and the sample
        // object as arguments
        final File archiveIndexFile =
            context.getInputDataFile(DataFormats.GMAP_INDEX_ZIP, sample)
                .toFile();

        // Get the input file count for the sample.
        // There can be one or two FASTQ files per sample (single-end or
        // paired-end data)
        final int inFileCount =
            context.getDataFileCount(DataFormats.FILTERED_READS_FASTQ, sample);

        // Throw error if no reads file found.
        if (inFileCount < 1)
          throw new IOException("No reads file found.");

        // Throw error if more than 2 reads files found.
        if (inFileCount > 2)
          throw new IOException(
              "Cannot handle more than 2 reads files at the same time.");

        // Get the path to the output SAM file
        final File outSamFile =
            context.getOutputDataFile(DataFormats.MAPPER_RESULTS_SAM, sample)
                .toFile();

        // Log message for this sample
        String logMsg = "";

        // Single end mode
        if (inFileCount == 1) {

          // Get the source file.
          // For a data format with more than one file (e.g. FASTQ files in
          // paired-end mode), you must add a third argument to
          // context.getInputDataFile() with the index of the requested file.
          // With single-end FASTQ the index is always 0. In paired-end mode,
          // the index of the second end is 1.
          final File inFile =
              context.getInputDataFile(DataFormats.FILTERED_READS_FASTQ,
                  sample, 0).toFile();

          // Single read mapping
          mapSingleEnd(context, inFile, sample.getMetadata().getFastqFormat(),
              archiveIndexFile, outSamFile, reporter);

          logMsg =
              "Mapping reads in "
                  + sample.getMetadata().getFastqFormat() + " with Gsnap ("
                  + sample.getName() + ", " + inFile.getName() + ")";
        }

        // Paired end mode
        if (inFileCount == 2) {

          // Get the path of the first end.
          // The third argument of context.getInputDataFile() is 0, as in
          // single-end mode.
          final File inFile1 =
              context.getInputDataFile(DataFormats.FILTERED_READS_FASTQ,
                  sample, 0).toFile();

          // Get the path of the second end.
          // The third argument of context.getInputDataFile() is 1.
          final File inFile2 =
              context.getInputDataFile(DataFormats.FILTERED_READS_FASTQ,
                  sample, 1).toFile();

          // Paired-end mapping
          mapPairedEnd(context, inFile1, inFile2, sample.getMetadata()
              .getFastqFormat(), archiveIndexFile, outSamFile, reporter);

          logMsg =
              "Mapping reads in "
                  + sample.getMetadata().getFastqFormat() + " with Gsnap ("
                  + sample.getName() + ", " + inFile1.getName() + ","
                  + inFile2.getName() + ")";
        }

        // Add the log message of the process of the sample to the step log
        log.append(reporter.countersValuesToString(COUNTER_GROUP, logMsg));
      }

      // Write log file
      return new StepResult(context, startTime, log.toString());

    } catch (IOException e) {

      return new StepResult(context, e, "Error while mapping: "
          + e.getMessage());
    }
  }
  • The execute() method calls other methods to process data:
  // This method launches the computation in single-end mode.
  private void mapSingleEnd(final Context context, final File inFile,
      final FastqFormat format, final File archiveIndexFile,
      final File outSamFile, final Reporter reporter) throws IOException {

    // Build the command line
    final String cmdArgs =
        this.mapperArguments + " " + inFile.getAbsolutePath();

    map(context, cmdArgs, format, archiveIndexFile, outSamFile, reporter);
  }

  // This method launches the computation in paired-end mode
  private void mapPairedEnd(final Context context, final File inFile1,
      final File inFile2, final FastqFormat format,
      final File archiveIndexFile, final File outSamFile,
      final Reporter reporter) throws IOException {

    // Build the command line
    final String cmdArgs =
        this.mapperArguments
            + " " + inFile1.getAbsolutePath() + " " + inFile2.getAbsolutePath();

    map(context, cmdArgs, format, archiveIndexFile, outSamFile, reporter);
  }

  private void map(final Context context, final String cmdArg,
      final FastqFormat format, final File archiveIndexFile,
      final File outSamFile, final Reporter reporter) throws IOException {

    // Extract and install the gsnap binary from the Eoulsan jar archive
    final String gsnapPath =
        BinariesInstaller.install("gsnap", context.getSettings()
            .getTempDirectory());

    // Get the path to the uncompressed genome index
    final File archiveIndexDir =
        new File(archiveIndexFile.getParent(),
            StringUtils.filenameWithoutExtension(archiveIndexFile.getName()));

    // Unzip archive index if necessary
    unzipArchiveIndexFile(archiveIndexFile, archiveIndexDir);

    // Select the argument for the FASTQ format
    final String formatArg;
    switch (format) {

    case FASTQ_ILLUMINA:
      formatArg = "--quality-protocol=illumina";
      break;
    case FASTQ_ILLUMINA_1_5:
      formatArg = "--quality-protocol=illumina";
      break;
    case FASTQ_SOLEXA:
      throw new IOException("Gsnap does not handle the Solexa FASTQ format.");

    case FASTQ_SANGER:
    default:
      formatArg = "--quality-protocol=sanger";
      break;
    }

    // Build the command line
    final String cmd =
        gsnapPath
            + " -A sam " + formatArg + " -t "
            + context.getSettings().getLocalThreadsNumber() + " -D "
            + archiveIndexDir.getAbsolutePath() + " -d genome " + cmdArg
            + " > " + outSamFile.getAbsolutePath() + " 2> /dev/null";

    // Log the command line to execute
    LOGGER.info(cmd);

    // Execute the command line and save the exit value
    final int exitValue = ProcessUtils.sh(cmd);

    // if the exit value is not success (0) throw an exception
    if (exitValue != 0) {
      throw new IOException("Bad error result for gsnap execution: "
          + exitValue);
    }

    // Count the number of alignments generated for the sample
    parseSAMResults(outSamFile, reporter);
  }

  // Uncompress the genome index archive if it is not already extracted
  private static final void unzipArchiveIndexFile(final File archiveIndexFile,
      final File archiveIndexDir) throws IOException {

    // Test if genome index file exists
    if (!archiveIndexFile.exists())
      throw new IOException("No index for the mapper found: "
          + archiveIndexFile);

    // Uncompress archive if necessary
    if (!archiveIndexDir.exists()) {

      if (!archiveIndexDir.mkdir())
        throw new IOException("Can't create directory for gmap index: "
            + archiveIndexDir);

      LOGGER.fine("Unzip archiveIndexFile "
          + archiveIndexFile + " in " + archiveIndexDir);
      FileUtils.unzip(archiveIndexFile, archiveIndexDir);
    }

    // Test if extracted directory exists
    FileUtils.checkExistingDirectoryFile(archiveIndexDir,
        "gmap index directory");
  }

  // Count the number of alignments in a SAM file and save the result in the
  // reporter object
  private static final void parseSAMResults(final File samFile,
      final Reporter reporter) throws IOException {

    String line;

    // Open the SAM result file
    final BufferedReader readerResults =
        FileUtils.createBufferedReader(samFile);

    int entriesParsed = 0;

    try {
      while ((line = readerResults.readLine()) != null) {

        // Skip empty lines and header lines
        final String trimmedLine = line.trim();
        if ("".equals(trimmedLine) || trimmedLine.startsWith("@"))
          continue;

        final int tabPos = trimmedLine.indexOf('\t');

        if (tabPos != -1) {

          entriesParsed++;

          reporter.incrCounter(COUNTER_GROUP,
              MappingCounters.OUTPUT_MAPPING_ALIGNMENTS_COUNTER.counterName(),
              1);
        }
      }
    } finally {
      // Close the reader even if an error occurs while reading
      readerResults.close();
    }

    LOGGER.info(entriesParsed + " entries parsed in gsnap output file");

  }
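
The counting logic of parseSAMResults() can be tried outside Eoulsan. The following is a minimal standalone sketch that assumes only the Java standard library: the class name SamCountSketch and its methods are hypothetical, the Reporter counter is replaced by a plain integer, and the SAM content is read from a string instead of a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Eoulsan
public class SamCountSketch {

  /** Count record lines: non-empty, non-header lines that contain a tab. */
  public static int countAlignments(final BufferedReader reader)
      throws IOException {

    int count = 0;
    String line;

    while ((line = reader.readLine()) != null) {

      final String trimmed = line.trim();

      // Skip empty lines and SAM header lines (they start with '@')
      if (trimmed.isEmpty() || trimmed.startsWith("@"))
        continue;

      if (trimmed.indexOf('\t') != -1)
        count++;
    }

    return count;
  }

  /** Convenience overload for SAM content held in memory. */
  public static int countAlignments(final String samText) {

    try (BufferedReader reader =
        new BufferedReader(new StringReader(samText))) {
      return countAlignments(reader);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(final String[] args) {

    final String sam =
        "@HD\tVN:1.0\n" + "@SQ\tSN:chr1\tLN:1000\n" + "\n"
            + "read1\t0\tchr1\t100\n" + "read2\t16\tchr1\t200\n";

    System.out.println(countAlignments(sam)); // prints 2
  }
}
```

Like the method above, this sketch counts every record line, so records that the mapper writes for unmapped reads would be counted too.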

Register the plug-in

Like many Java components (JDBC, JCE, JNDI...), Eoulsan uses the Service Provider Interface (SPI) mechanism for its plug-in system. To get a functional SPI plug-in, you need a class that implements an interface (here GsnapExampleStep implements the Step interface through AbstractStep) and a declaration of your implementation of the interface in the metadata. To register your step in the metadata:

  • Create the src/main/java/META-INF/services directory for the metadata of the spi service.
  • Create a fr.ens.transcriptome.eoulsan.steps.Step file in this directory and add the following line to this new file:
com.example.GsnapExampleStep
  • If you have more than one step to register, add the fully qualified class name of each additional step on its own line in this file.
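
The lookup that this metadata file enables can be sketched with the standard java.util.ServiceLoader class. The sketch below is self-contained and does not use Eoulsan: the Step interface and the ExampleStep class are hypothetical stand-ins, and the provider file is written to a temporary directory instead of being packaged in a jar. How Eoulsan itself performs the lookup is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class SpiSketch {

  /** Hypothetical stand-in for the Eoulsan Step interface. */
  public interface Step {
    String getName();
  }

  /** Hypothetical stand-in for a plug-in step. */
  public static class ExampleStep implements Step {
    @Override
    public String getName() {
      return "gsnapexample";
    }
  }

  /** Write a provider file, then let ServiceLoader discover the step. */
  public static List<String> discoverStepNames() {

    try {
      // Create META-INF/services/<interface name> in a temporary directory
      final Path root = Files.createTempDirectory("spi-sketch");
      final Path services =
          Files.createDirectories(root.resolve("META-INF/services"));

      // One fully qualified implementation class name per line
      Files.write(services.resolve(Step.class.getName()),
          (ExampleStep.class.getName() + "\n")
              .getBytes(StandardCharsets.UTF_8));

      final List<String> names = new ArrayList<>();

      // The class loader sees the provider file as a classpath resource
      try (URLClassLoader loader =
          new URLClassLoader(new URL[] {root.toUri().toURL()},
              SpiSketch.class.getClassLoader())) {

        for (Step step : ServiceLoader.load(Step.class, loader)) {
          names.add(step.getName());
        }
      }

      return names;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(final String[] args) {
    System.out.println(discoverStepNames()); // prints [gsnapexample]
  }
}
```

The provider file is named after the interface and lists implementation classes, which is why the file in the steps above is named fr.ens.transcriptome.eoulsan.steps.Step and contains com.example.GsnapExampleStep.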

Compile the plug-in

Compilation is straightforward. At the root of your project, run:

$ mvn clean install

This command cleans the target directory before launching the compilation. You will obtain a myeoulsanplugin-0.1-alpha-1.jar archive that contains your plug-in in the target directory.

Install the plug-in

To install an Eoulsan plug-in, copy the generated jar file from the target directory of your project to the lib directory of your Eoulsan installation. Your plug-in is then ready to use like the built-in steps of Eoulsan.