diff --git a/_episodes/04-higgstotautau-serial.md b/_episodes/04-higgstotautau-serial.md
index 8da7346..c127188 100644
--- a/_episodes/04-higgstotautau-serial.md
+++ b/_episodes/04-higgstotautau-serial.md
@@ -3,9 +3,9 @@ title: "HiggsToTauTau analysis: serial"
 teaching: 5
 exercises: 20
 questions:
-- "Challenge: write the HiggsToTauTau analysis serial workflow and run it on REANA"
+- "Challenge: write the HiggsToTauTau analysis workflow and run it on REANA"
 objectives:
-- "Develop a full HigssToTauTau analysis workflow using serial language"
+- "Develop a full HiggsToTauTau analysis workflow using a simple serial language"
 - "Get acquainted with writing moderately complex REANA examples"
 keypoints:
 - "Writing serial workflows is like chaining shell script commands"
@@ -13,64 +13,91 @@ keypoints:
 
 ## Overview
 
-We have practiced writing and running workflows on REANA using a simple RooFit analysis example.
+In the previous two episodes we have practised writing and running workflows on REANA using a simple
+RooFit analysis example.
 
-In this lesson we shall go back to the HiggsToTauTau analysis used throughout this workshop and we
-shall write a serial workflow to run the analysis on the REANA platform.
+In this episode we shall go back to the HiggsToTauTau analysis example that you used throughout the
+workshop and we shall write a serial workflow to run this analysis on the REANA platform.
 
 ## Recap
 
-The past two days you have containerised HiggsToTauTau analysis by means of two GitLab repositories:
+In the past two days of this workshop you have followed two lessons:
 
-- `awesome-analysis-eventselection` with skimming and histogramming steps;
-- `awesome-analysis-statistics` with the fit.
+- [HSF training on CI/CD](https://hsf-training.github.io/hsf-training-cicd/)
+- [HSF training on Docker](https://hsf-training.github.io/hsf-training-docker/)
 
-You have used GitLab CI to build Docker images for these repositories such as:
+The lessons used a HiggsToTauTau example analysis described in detail here:
 
-- `gitlab-registry.cern.ch/johndoe/awesome-analysis-eventselection`
-- `gitlab-registry.cern.ch/johndoe/awesome-analysis-statistics`
+- [HSF training HiggsToTauTau example analysis](https://hsf-training.github.io/hsf-training-cms-analysis-webpage/)
 
-You have run the containerised analysis "manually" using `docker` commands such as:
+You have containerised this analysis by means of two GitLab repositories:
+
+- `gitlab.cern.ch/johndoe/awesome-analysis-eventselection` containing the skimming and histogramming;
+- `gitlab.cern.ch/johndoe/awesome-analysis-statistics` containing the statistical modelling and
+  fitting.
+
+You have used GitLab CI/CD to build the Docker images for these repositories and published them as:
+
+- `gitlab-registry.cern.ch/johndoe/awesome-analysis-eventselection:master-sha1a`
+- `gitlab-registry.cern.ch/johndoe/awesome-analysis-statistics:master-sha1b`
+
+You have run the containerised HiggsToTauTau analysis "manually" by using `docker` commands for
+various analysis steps such as:
 
 - `bash skim.sh ...`
 - `bash histograms.sh ...`
 - `bash plot.sh ...`
 - `bash fit.sh ...`
 
+And you have produced the plots and the fit:
+
+
+
+
+
+
 ## Objective
 
-Let us now write a serial workflow how the HiggsToTauTau example can be run sequentially on REANA.
+Let us write a serial computational workflow automatising the previously run manual steps and run
+the HiggsToTauTau example on REANA.
 
-### Note: efficiency
+### Note: Computing efficiency
 
 Note that the serial workflow will not be necessarily efficient here, since it will run sequentially
 over various dataset files and not process them in parallel. Do not pay attention to this
-inefficiency here. We shall speed up the example via parallel processing in a forthcoming
-[HiggsToTauTau analysis: parallel](../07-higgstotautau-parallel) episode coming after the coffee
-break.
+inefficiency for now. We shall speed up the serial example via parallel processing in the
+forthcoming [HiggsToTauTau analysis: parallel](../07-higgstotautau-parallel) episode after
+the coffee break.
 
-### Note: container directories and workspace directories
+### Note: Container directories and workspace directories
 
 The `awesome-analysis-eventselection` and `awesome-analysis-statistics` repositories assume that you
-run code from certain absolute directories such as `/analysis/skim`. Note that when REANA starts a
-new workflow run, it creates a certain unique "workspace directory" for sharing read/write files by
-the workflow steps. It is a good practice to have code and data directories _readable_ and
-workflow's workspace _writable_ in a clearly separated manner. In this way, the workflow won't risk
-to write over the inputs or the code provided by the container, which is good both for
-reproducibility purposes (inputs aren't accidentally modified) and security purposes (code is not
-accidentally modified).
+run code from certain absolute directories such as `/analysis/skim`. Recall that when REANA starts
+a new workflow run, it creates a unique "workspace directory" and uses it as the default
+directory for all the analysis steps throughout the workflow, allowing read/write files to be shared
+amongst the steps.
+
+It is good practice to treat the absolute directories in your container images such as
+`/analysis/skim` as read-only and to use the workflow's dynamic workspace for anything that needs
+writing. In this way, we don't risk overwriting any code or configuration files provided by the
+container. This is good for both reproducibility and security.
+
+Moreover, we don't change the size of the running container by writing inside it.
+Writing to the dynamic workspace that is _mounted_ inside the container keeps the container
+size small.
 
 ### Note: REANA_WORKSPACE environment variable
 
 REANA platform uses a convenient set of environment variables that you can use in your scripts. One
-of them is `REANA_WORKSPACE` which points to the workflow's workspace which is unique for each run.
-You can use `$$REANA_WORKSPACE` environment variable in your ``reana.yaml`` recipe to share the
-output of skimming, histogramming, plotting and fitting steps. (Note the use of two leading dollar
-signs to escape the workflow parameter expansion that we have seen previously.)
+of them is `REANA_WORKSPACE`, which points to the workflow's workspace that is uniquely allocated
+for each run. You can use the `$$REANA_WORKSPACE` environment variable in your ``reana.yaml`` recipe
+to share the output of the skimming, histogramming, plotting and fitting steps. (Note the use of two
+leading dollar signs to escape the workflow parameter expansion that you have used in the previous
+episodes.)
 
 ### OK, challenge time!
 
-With the above hits, please try to write workflow either individually or in pairs.
+With the above hints in mind, please try to write the workflow either individually or in pairs.
 
 > ## Exercise
 >
diff --git a/fig/awesome-analysis-serial/fit.png b/fig/awesome-analysis-serial/fit.png
new file mode 100644
index 0000000..439a824
Binary files /dev/null and b/fig/awesome-analysis-serial/fit.png differ
diff --git a/fig/awesome-analysis-serial/m_vis.png b/fig/awesome-analysis-serial/m_vis.png
new file mode 100644
index 0000000..9ed3697
Binary files /dev/null and b/fig/awesome-analysis-serial/m_vis.png differ