episode4: richer Recap and Objective content
Expands the Recap section with details about the lessons that
participants followed and about the HiggsToTauTau example.

Expands the Objective section with more hints for the big exercise.
tiborsimko committed Oct 16, 2023
1 parent ec758e0 commit cf35ba5
Showing 3 changed files with 57 additions and 30 deletions.
87 changes: 57 additions & 30 deletions _episodes/04-higgstotautau-serial.md
@@ -3,74 +3,101 @@ title: "HiggsToTauTau analysis: serial"
teaching: 5
exercises: 20
questions:
- "Challenge: write the HiggsToTauTau analysis workflow and run it on REANA"
objectives:
- "Develop a full HiggsToTauTau analysis workflow using a simple serial language"
- "Get acquainted with writing moderately complex REANA examples"
keypoints:
- "Writing serial workflows is like chaining shell script commands"
---

## Overview

In the previous two episodes we have practised writing and running workflows on REANA using a simple
RooFit analysis example.

In this episode we shall go back to the HiggsToTauTau analysis example that you used throughout the
workshop and we shall write a serial workflow to run this analysis on the REANA platform.

## Recap

In the past two days of this workshop you have followed two lessons:

- [HSF training on CI/CD](https://hsf-training.github.io/hsf-training-cicd/)
- [HSF training on Docker](https://hsf-training.github.io/hsf-training-docker/)

Both lessons used the HiggsToTauTau example analysis, which is described in detail here:

- [HSF training HiggsToTauTau example analysis](https://hsf-training.github.io/hsf-training-cms-analysis-webpage/)

You have containerised this analysis by means of two GitLab repositories:

- `gitlab.cern.ch/johndoe/awesome-analysis-eventselection` containing the skimming and histogramming;
- `gitlab.cern.ch/johndoe/awesome-analysis-statistics` containing the statistical modelling and
fitting.

You have used GitLab CI/CD to build the Docker images for these repositories and have published them as:

- `gitlab-registry.cern.ch/johndoe/awesome-analysis-eventselection:master-sha1a`
- `gitlab-registry.cern.ch/johndoe/awesome-analysis-statistics:master-sha1b`

You have run the containerised HiggsToTauTau analysis "manually" by using `docker` commands for
various analysis steps such as:

- `bash skim.sh ...`
- `bash histograms.sh ...`
- `bash plot.sh ...`
- `bash fit.sh ...`

And you have produced the plots and the fit:

<img src="{{ page.root }}/fig/awesome-analysis-serial/m_vis.png" width="60%" />

<img src="{{ page.root }}/fig/awesome-analysis-serial/fit.png" width="60%" />


## Objective

Let us write a serial computational workflow that automates the previously run manual steps, and run
the HiggsToTauTau example on REANA.

### Note: Computing efficiency

Note that the serial workflow will not necessarily be efficient here, since it will process the
various dataset files sequentially rather than in parallel. Do not pay attention to this
inefficiency here yet. We shall speed up the serial example via parallel processing in the
[HiggsToTauTau analysis: parallel](../07-higgstotautau-parallel) episode after the coffee break.

### Note: Container directories and workspace directories

The `awesome-analysis-eventselection` and `awesome-analysis-statistics` repositories assume that you
run code from certain absolute directories such as `/analysis/skim`. Recall that when REANA starts
a new workflow run, it creates a unique "workspace directory" and uses it as the default working
directory for all the analysis steps throughout the workflow, allowing the steps to share read/write
files amongst themselves.

It is good practice to treat the absolute directories in your container images, such as
`/analysis/skim`, as read-only and to use the dynamic workflow workspace for anything that needs to
be written. In this way, we don't risk overwriting any code or configuration files provided by the
container. This is good both for reproducibility (inputs are not accidentally modified) and for
security (code is not accidentally modified).

Moreover, since we do not write inside the running container, its size does not grow. Writing to
the dynamic workspace, which is _mounted_ inside the container, keeps the container size small.

### Note: REANA_WORKSPACE environment variable

The REANA platform provides a convenient set of environment variables that you can use in your scripts. One
of them is `REANA_WORKSPACE`, which points to the workflow's workspace that is uniquely allocated
for each run. You can use the `$$REANA_WORKSPACE` environment variable in your `reana.yaml` recipe
to share the outputs of the skimming, histogramming, plotting and fitting steps. (Note the use of two
leading dollar signs to escape the workflow parameter expansion that you have used in the previous
episodes.)
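
As an illustration only, here is a minimal sketch of a single serial step, assuming the usual `reana.yaml` serial layout (`workflow`, `type: serial`, `specification`, `steps`, `name`, `environment`, `commands`). The step name `skimming` and the `results` subdirectory are hypothetical, and the image is the example tag from the Recap. It shows code being read from the container's `/analysis/skim` directory while writable output is directed to `$$REANA_WORKSPACE`; it is deliberately not a solution to the exercise, so the actual skimming command is left for you to fill in.

```yaml
# Illustrative sketch only, not the exercise solution.
# The step name "skimming" and the "results" subdirectory are hypothetical.
workflow:
  type: serial
  specification:
    steps:
      - name: skimming
        environment: 'gitlab-registry.cern.ch/johndoe/awesome-analysis-eventselection:master-sha1a'
        commands:
          # "$$" escapes workflow parameter expansion, so the shell receives $REANA_WORKSPACE
          - mkdir -p $$REANA_WORKSPACE/results
          # Code directory provided by the container; treat it as read-only
          - ls /analysis/skim
          # Your skimming command goes here, writing its outputs under $$REANA_WORKSPACE/results
```

Subsequent steps can then read the files that earlier steps left under the same workspace path.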

### OK, challenge time!

With the above hints in mind, please try to write the workflow, either individually or in pairs.

> ## Exercise
>
Binary file added fig/awesome-analysis-serial/fit.png
Binary file added fig/awesome-analysis-serial/m_vis.png
