Post-processing

← Go back to Modules Development | ↑ Go to the Table of Content ↑ | Continue to Advanced Topics →

The post-processing framework

This framework is intended for planned post-processing of objects generated by QC Tasks and Checks, and for correlating them with other data. The most common use cases include correlation and trending of different properties of the detectors.

Users can write their own Post-processing Tasks or use the ones provided by the framework (see Convenience classes), which are intended to cover the usual needs. Post-processing Tasks run asynchronously to data-taking, but can be triggered by a set of selected events.

Post-processing interface

Any Post-processing Task should inherit PostProcessingInterface, which includes four methods:

  • configure (optional) - configures the task, given its name and a configuration interface.
  • initialize - initializes the task and its data, given the event which it was triggered by.
  • update - updates the task and its data, given the event which it was triggered by.
  • finalize - finalizes the processing, given the event which it was triggered by.

Interfaces to databases and other services are accessible via the ServiceRegistry, which is an argument to the last three methods. These methods are invoked when any of the specified triggers fires. A trigger can be:

  • Start Of Run (SOR)
  • End Of Run (EOR)
  • Start Of Fill (SOF)
  • End Of Fill (EOF)
  • Periodic - triggers when a specified period of time passes
  • New Object - triggers when an object in QCDB is updated
  • Once - triggers only first time it is checked
  • Always - triggers each time it is checked

Triggers are complemented with timestamps which correspond to the time when the trigger became valid, in the form of milliseconds since epoch, just like in CCDB and QCDB. For example, the Periodic trigger provides evenly spaced timestamps, even if the trigger is checked more rarely, while the New Object trigger provides the timestamp of the updated object. These timestamps should be used to access databases, so that any Post-processing Task can be rerun with arbitrary timestamps.

MonitorObjects may be saved by registering them in the ObjectsManager, similarly to normal QC Tasks (recommended, see the examples linked below), or by using the DatabaseInterface directly. Please note that created objects have to be registered in the ObjectsManager to make them accessible to Checks.

Please refer to SkeletonPostProcessing for a minimal illustration of inheriting the interface, or to TrendingTask for a fully functional example. One can generate their own post-processing task by using the o2-qc-module-configurator helper, as described in the Module Creation chapter.
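For orientation, below is a condensed, hypothetical sketch of such a task. The class name, histogram and fill value are made up, and the exact signatures may differ between framework versions - SkeletonPostProcessing remains the authoritative reference.

#include "QualityControl/PostProcessingInterface.h"
#include <TH1F.h>
#include <memory>

namespace o2::quality_control_modules::my_module
{

class MyPPTask final : public quality_control::postprocessing::PostProcessingInterface
{
 public:
  void initialize(quality_control::postprocessing::Trigger, framework::ServiceRegistry&) override
  {
    mHistogram = std::make_unique<TH1F>("example", "example", 100, 0., 100.);
    // Registering the object in the ObjectsManager publishes it to the QCDB
    // and makes it accessible to Checks.
    getObjectsManager()->startPublishing(mHistogram.get());
  }

  void update(quality_control::postprocessing::Trigger t, framework::ServiceRegistry&) override
  {
    // t.timestamp holds the validity start of the trigger (ms since epoch);
    // use it in database queries so that the task can be rerun on old data.
    mHistogram->Fill(42.);
  }

  void finalize(quality_control::postprocessing::Trigger, framework::ServiceRegistry&) override
  {
    // Objects registered in the ObjectsManager are stored automatically.
  }

 private:
  std::unique_ptr<TH1F> mHistogram;
};

} // namespace o2::quality_control_modules::my_module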

Configuration

Running the post-processing is configured in a similar manner as for QC Tasks and Checks - the configuration parameters are stored in a JSON file or in the Configuration database (at a later development stage). The configuration's path should be passed to the application running a task.

This is a snippet of a JSON structure which configures a post-processing task:

{
  "qc": {
    "config": {
      ...
    },
    "postprocessing": {
      "MyPostProcessingTask": {
        "active": "true",
        "className": "o2::quality_control_modules::my_module::MyPPTask",
        "moduleName": "QcMyModule",
        "detectorName": "TST",
        "initTrigger": [
          "SOR"
        ],
        "updateTrigger": [
          "10mins"
        ],
        "stopTrigger": [
          "EOR",
          "10hours"
        ]
      },
      ...
    }
  }
}

Each task is identified by its name (MyPostProcessingTask). One can activate it by setting the "active" field to "true". The task is loaded given its full "className" and the "moduleName" where it is located. The "detectorName" might be used by tasks to store generated data under the correct paths in the QCDB. The "initTrigger", "updateTrigger" and "stopTrigger" lists contain the triggers which should invoke the corresponding interface methods.

Checks can be applied to the results of Post-processing Tasks just as to those of normal QC Tasks. However, one should use the data source type "PostProcessing" instead of "Task":

...
    "checks": {
      "ExamplePPCheck": {
        "active": "true",
        "className": "o2::quality_control_modules::skeleton::SkeletonCheck",
        "moduleName": "QcSkeleton",
        "policy": "OnAny",
        "detectorName": "TST",
        "dataSource": [{
          "type": "PostProcessing",
          "name": "ExampleTrend",
          "MOs": ["mean_of_histogram"]
        }]
      }
    },
...

Triggers configuration

Each of the three methods can be invoked by one or more triggers. The possible options are listed below (case-insensitive); an example combining two of them follows the list.

  • "sor" or "startofrun" - Start Of Run
  • "eor" or "endofrun" - End Of Run
  • "sof" or "startoffill" - Start Of Fill
  • "eof" or "endoffill" - End Of Fill
  • "<x><sec/min/hour>" - Periodic - triggers when a specified period of time passes. For example: "5min", "0.001 seconds", "10sec", "2hours".
  • "newobject:[qcdb/ccdb]:<path>" - New Object - triggers when an object in QCDB or CCDB is updated. For example : "newobject:qcdb:qc/TST/MO/QcTask/Example"
  • "once" - Once - triggers only first time it is checked
  • "always" - Always - triggers each time it is checked

Running it

The post-processing tasks can be run in three ways. The first uses the usual o2-qc executable, which relies on DPL, and it is the only one that allows running Checks over objects generated in post-processing tasks. This will be one of the two ways to run post-processing tasks in production. To try it out, use it as with any other QC configuration:

o2-qc -b --config json://${QUALITYCONTROL_ROOT}/etc/postprocessing.json

All declared and active tasks in the configuration file will be run in parallel.

Debugging post-processing tasks might be easier when using the o2-qc-run-postprocessing application (development only) or o2-qc-run-postprocessing-occ (both development and production), as they are single-process executables which run only one chosen task.

To run the basic example, use the command below. The --config parameter should point to the configuration file. The --period parameter specifies the interval (in seconds) at which the specified triggers are checked.

o2-qc-run-postprocessing --config json://${QUALITYCONTROL_ROOT}/etc/postprocessing.json --name ExamplePostprocessing --period 10

As it is configured to invoke each method only "once", you will see it initializing, entering the update method, then finalizing the task and exiting.

This executable also allows running a Post-processing task in batch mode, i.e. with selected timestamps (see the --timestamps argument). This way, one can rerun a task over old data, provided that the task actually respects the given timestamps.
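
For instance, to rerun the basic example over a fixed time range (the timestamp values below are purely illustrative):

o2-qc-run-postprocessing --config json://${QUALITYCONTROL_ROOT}/etc/postprocessing.json \
                         --name ExamplePostprocessing --timestamps 1612707603626 1613999652000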

To have more control over the state transitions, or to run a standalone post-processing task in production, one should use o2-qc-run-postprocessing-occ. It is run almost exactly like the previously mentioned application; however, one has to use peanut to drive its state transitions and push the configuration.

To try it out locally, run the following in the first terminal window (we will try out a different task this time):

o2-qc-run-postprocessing-occ --name ExampleTrend --period 10

In the logs you will see the port number on which the application listens for RPC commands. Note it down.

no control port configured, defaulting to 47100
no role configured, defaulting to default-role
gRPC server listening on port 47100

In the second window, run the following. Use the port number from the output of the QC executable.

# If you haven't built it:
# aliBuild build Coconut --defaults o2-dataflow
alienv enter coconut/latest
OCC_CONTROL_PORT=47100 peanut

A simple terminal user interface will open, which will allow you to trigger state transitions. Use it to load the configuration by entering the path to the configuration file. The usual transition sequence, which you might want to try out, is CONFIGURE, START, STOP, RESET, EXIT.

Convenience classes

We aim to provide convenience classes which should cover the most common post-processing use cases. Everyone is free to propose extensions to them or to write their own tasks for more specific usages, taking these as a starting point.

The TrendingTask class

TrendingTask is a post-processing task which uses a TTree to trend objects in the QC database and produce basic plots. The Post-processing example in the QuickStart showcases the possibilities of this class.

The following scheme shows how the class is designed. It can access data sources which are Monitor Objects and Quality Objects from the Quality Control Database - anything that is generated by other Tasks and Checks. In the future we will also support access to the CCDB.

The objects' characteristics which should be tracked are extracted by Reductors - simple plugins. The framework provides a set of Reductors for commonly used data structures, but any custom Reductor might be used as well.
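
As an illustration, a custom Reductor trending the mean and standard deviation of a histogram could look roughly like the sketch below, modelled on TH1Reductor. The base-class header and the method names are assumptions and may differ between framework versions.

#include "QualityControl/Reductor.h"
#include <TH1.h>

namespace o2::quality_control_modules::my_module
{

class MyReductor : public quality_control::postprocessing::Reductor
{
 public:
  // The framework reads the reduced values through this branch address...
  void* getBranchAddress() override { return &mStats; }
  // ...using this ROOT leaf list to describe their layout.
  const char* getBranchLeafList() override { return "mean/D:stddev"; }
  // Called for each new version of the source object.
  void update(TObject* obj) override
  {
    if (auto* histo = dynamic_cast<TH1*>(obj)) {
      mStats.mean = histo->GetMean();
      mStats.stddev = histo->GetStdDev();
    }
  }

 private:
  struct {
    Double_t mean;
    Double_t stddev;
  } mStats;
};

} // namespace o2::quality_control_modules::my_module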

All the values are stored in a TTree. Each data source forms a separate branch, with its leaves being the individual values. In addition, a time branch and a metadata branch (currently consisting only of runNumber) are added.

The TTree is stored back in the QC database each time it is updated. In addition, the class exposes the TTree::Draw interface, which allows one to instantly generate plots with trends, correlations or histograms; these are also sent to the QC database.

(Figure: design scheme of the TrendingTask class)

Configuration

As this class is a post-processing task, it also inherits the corresponding JSON configuration template. It extends it, however, with two additional lists - "dataSources" and "plots":

{
  "qc": {
    ...
    "postprocessing": {
      "ExampleTrend": {
        "active": "true",
        "className": "o2::quality_control::postprocessing::TrendingTask",
        "moduleName": "QualityControl",
        "detectorName": "TST",
        "dataSources": [],
        "plots": [],
        "initTrigger": [ "once" ],
        "updateTrigger": [ "5 seconds" ],
        "stopTrigger": []
      }
    }
  }
}

Data sources are defined by filling the corresponding structure, as in the example below. For the key "type", use the value "repository" if you access a Monitor Object, and "repository-quality" if it should be a Quality (this will be unified in the future). The "names" array should point to one or more objects under a common "path" in the repository. The values of "reductorName" and "moduleName" should point to the full name of a data Reductor and the library where it is located. One can use the Reductors available in the Common module or write their own by inheriting the interface class.

{
        ...
        "dataSources": [
          {
            "type": "repository",
            "path": "qc/TST/MO/QcTask",
            "names": [ "example" ],
            "reductorName": "o2::quality_control_modules::common::TH1Reductor",
            "moduleName": "QcCommon"
          },
          {
            "type": "repository-quality",
            "path": "qc/TST/QO",
            "names": [ "QcCheck" ],
            "reductorName": "o2::quality_control_modules::common::QualityReductor",
            "moduleName": "QcCommon"
          }
        ],
        ...
}

Similarly, plots are defined by adding the proper structures to the "plots" list, as shown below. The plot will be stored under the "name" value and will have the "title" value shown on top. The "varexp", "selection" and "option" fields correspond to the arguments of the TTree::Draw method. Optionally, one can use "graphErrors" to add x and y error bars to a graph, as in the first plot example. "name" and "varexp" are the only compulsory arguments; the others can be omitted to reduce the configuration file size.

{
        ...
        "plots": [
          {
            "name": "mean_of_histogram",
            "title": "Mean trend of the example histogram",
            "varexp": "example.mean:time",
            "selection": "",
            "option": "*L",
            "graphErrors": "5:example.stddev"
          },
          {
            "name": "histogram_of_means",
            "title": "Distribution of mean values in the example histogram",
            "varexp": "example.mean",
            "selection": "",
            "option": ""
          },
          {
            "name": "example_quality",
            "title": "Trend of the example histogram's quality",
            "varexp": "QcCheck.name:time",
            "selection": "",
            "option": "*"
          }
        ],
        ...
}
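
For intuition, the first plot definition above translates roughly into the following call on the internal trend tree - a sketch of what the class does under the hood ("trendTree" is a hypothetical handle; canvas handling and error bars are managed separately):

// "varexp", "selection" and "option" are forwarded to TTree::Draw:
trendTree->Draw("example.mean:time", "", "*L");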

The TRFCollectionTask class

This task allows transforming a set of QualityObjects stored in the QCDB across a certain time span (usually the duration of a data acquisition run) into a TimeRangeFlagCollection. It is meant to be run for each detector/subsystem separately, once all QualityObjects for a run have been generated. Afterwards, final data tags can be computed as the next step. The data formats for tagging data quality are described here.

The task should be run asynchronously to data-taking and should be given the start and end of a time range to process. For example:

o2-qc-run-postprocessing --config json://${QUALITYCONTROL_ROOT}/Modules/Common/etc/trfcollection-example.json \
                         --name TRFCollectionQcCheck --timestamps 1612707603626 1613999652000

The task is configured as follows:

{
  "qc": {
    "config": {
      "": "The usual global configuration variables"
    },
    "postprocessing": {
      "TRFCollectionQcCheck": {
        "active": "true",
        "className": "o2::quality_control_modules::common::TRFCollectionTask",
        "moduleName": "QcCommon",
        "detectorName": "TST",    "": "One task should concatenate Qualities from detector, defined here.",
        "initTrigger": [],        "": "The triggers can be left empty,",
        "updateTrigger": [],      "": "because we run the task with a defined set of timestamps.",
        "stopTrigger": [],
                                  "": "The list of Quality Object to process.",
        "QOs": [
          "QcCheck"
        ]
      }
    }
  }
}

TimeRangeFlagCollections are meant to be used as a base to derive Data Tags for analysis (WIP).

← Go back to Modules Development | ↑ Go to the Table of Content ↑ | Continue to Advanced Topics →