LLR Workflow
Contact: [email protected]
The following notes describe how to run the code when the SKIMS are already available. The overall chain is: BigNTuples → SKIMS → Histo files → plots → limits, plus whatever else is needed downstream.
- BigNTuples → the NTuples coming from CERN, containing everything
- SKIMS → small NTuples in which only the information needed for our analysis is saved (this step exists so that we do not have to handle the large size of the BigNTuples)
- Histo files → .root files containing the histograms that are then used for the plotting, limit extraction, etc.
The workflow is explained below in some detail:
- See here
To launch the production of skimmed samples we use:
bash scripts/submit_skims_UL18.sh -t <some_tag>
This calls scripts/makeListOnStorage.py (input files definition) and scripts/skimNtuple.py (skimming job submission) per sample. The configuration file is config/skims_UL18.cfg.
One can add the option -n to avoid producing the input file lists if one is sure they are up-to-date. All options can be inspected with -h. One can comment out some samples in the script to avoid skimming over all the samples.
Two text files are created automatically (unless they would be empty): goodfiles.txt and badfiles.txt. They store the lists of “good” and “bad/corrupted” files, as classified by scripts/check_outputs.py (called within scripts/skimNtuple.py). The list of “good” files is used in subsequent analysis steps. This additional step avoids crashes due to corrupted skimmed files.
Note: Some logic (a simple lock, transparent to the user) was implemented to ensure no race condition happens when jobs write to the same text files.
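The note above does not say how the lock is implemented. As a minimal sketch of the general idea, assuming a POSIX system where an advisory file lock is acceptable, several jobs can append to the same text file without interleaving their writes as follows. The helper name append_locked is hypothetical and is not taken from the repository.

```python
import fcntl
import os


def append_locked(list_path, entry):
    """Append one line to list_path while holding an exclusive advisory lock.

    Hypothetical helper: the actual locking logic in the skimming scripts
    may differ.
    """
    # Append mode creates the file if it does not exist yet.
    with open(list_path, "a") as handle:
        # Block until no other process holds the lock on this file.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            handle.write(entry + "\n")
            # Push the line to disk before releasing the lock.
            handle.flush()
            os.fsync(handle.fileno())
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Advisory locks only protect against writers that also take the lock, and they may behave differently on network file systems, so this should be read as an illustration rather than a drop-in replacement.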
To resubmit the jobs listed in badfiles.txt, simply issue the following command:
bash scripts/submit_skims_UL18.sh -t <some_tag> --resubmit
where the tag must be the same one used in the original submission. New log files are named so as to make it clear that they correspond to resubmitted jobs; old logs are not lost. The user can resubmit the jobs as many times as needed, and the most recent badfiles.txt (with a slightly different name) is picked as input.
When the skims are prepared we have to run the calculation of the systematics on them. The process is exactly the same, but this time we run:
source scripts/submit_syst<year>.sh
We fill the file config/sampleCfg_*.cfg with the paths to the various directories containing the SKIM NTuples.
- The most recent file is config/sampleCfg_UL18.cfg
We fill the selectionCfg_*.cfg with all the selection criteria that have been decided (the weights are also set in this config file). The currently most up-to-date files are:
config/selectionCfg_ETau_UL18_template.cfg
config/selectionCfg_MuTau_UL18_template.cfg
config/selectionCfg_TauTau_UL18_template.cfg
We fill the mainCfg_*.cfg file, in which we have to specify:
- the output folder in which we want the outPlotter_*.root files to be
- the samples to be included (here we have to use the aliases defined inside the sampleCfg_*.cfg file, because the two files work together). The samples to be specified are: data, signal and background
- the variables to be plotted (these must match the variables that we plot inside makeFinalPlots.sh) [e.g. tauH_pt]
- the selections to be plotted (these must match the selections that we plot inside makeFinalPlots.sh/py) [i.e. baseline, s1b1jresolved, s2b0jresolved, sboostedLL]
- in the [pp_QCD] section, the regions to be used for the QCD estimation with the ABCD method
The currently most up-to-date files are:
config/mainCfg_ETau_UL18.cfg
config/mainCfg_MuTau_UL18.cfg
config/mainCfg_TauTau_UL18.cfg
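For readers unfamiliar with the ABCD method mentioned for the [pp_QCD] section, the arithmetic can be sketched as follows. This is the generic textbook form, not necessarily the exact procedure in this code. The sketch assumes that A is the signal region, B and C each invert one of two uncorrelated requirements, and D inverts both. The region labels and the function name are illustrative assumptions.

```python
def abcd_qcd_yield(data, non_qcd_mc):
    """Estimate the QCD yield in signal region A from control regions B, C, D.

    data and non_qcd_mc map a region label ("B", "C", "D") to an event yield.
    Assumed convention: B and C each invert one requirement, D inverts both.
    """
    qcd = {}
    for region in ("B", "C", "D"):
        # QCD in a control region = data minus simulated non-QCD backgrounds,
        # floored at zero to avoid negative yields.
        qcd[region] = max(data[region] - non_qcd_mc[region], 0.0)
    if qcd["D"] == 0.0:
        raise ValueError("region D has no QCD events; the ratio is undefined")
    # N_A = N_B * N_C / N_D, valid if the two requirements are uncorrelated.
    return qcd["B"] * qcd["C"] / qcd["D"]
```

With data yields of 120, 60 and 240 in B, C and D and non-QCD simulation of 20, 10 and 40, the QCD yields are 100, 50 and 200, giving an estimate of 25 events in A.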
To launch the production of the histograms used for the plots, run the following command (example):
python scripts/submitHistoFiller.py --cfg config/mainCfg_MuTau_UL18.cfg --njobs 20 --tag <tag>
The number of jobs is set to 10 by default. The --tag option can take any value; it is used solely for bookkeeping. More jobs means more output files and quicker individual jobs (up to the availability of the cluster); 20 jobs tend to complete in around 10 to 15 minutes. The data used corresponds to what was defined in the sampleCfg_*.cfg file.
The command will create a folder under /data_CMS/cms/${USER}/HHresonant_hist/<tag>/ in which the histograms (in ROOT format), logs and copies of the used configuration files are stored. To alter the output folder use the --outdir option. If no errors are present, all the logs will end with:
@@ ... saving completed, closing output file
... exiting
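Checking the end of every log by hand becomes tedious with many jobs. A minimal sketch of an automated check is given below, assuming the logs are plain-text files matching *.log inside a single folder. The function name and the file pattern are hypothetical, and the two closing lines are matched loosely so that small whitespace differences do not matter.

```python
import glob
import os


def find_incomplete_logs(log_dir, pattern="*.log"):
    """Return the logs in log_dir that do not end with the expected two lines.

    Hypothetical helper: the log file naming is an assumption.
    """
    incomplete = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path, errors="replace") as handle:
            # Keep only non-empty lines, stripped of surrounding whitespace.
            lines = [line.strip() for line in handle if line.strip()]
        tail = lines[-2:]
        finished = (
            len(tail) == 2
            and "saving completed, closing output file" in tail[0]
            and tail[1].endswith("exiting")
        )
        if not finished:
            incomplete.append(path)
    return incomplete
```

An empty return value means that every log found ends with the expected lines; any path returned points to a job worth inspecting.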
Each job produces a ROOT file containing histograms that must be merged (see next step).
To check whether the jobs are still running, are done, or broke for some reason, launch one of the following:
condor_q # built-in solution, see manual
/opt/exp_soft/cms/t3/t3stat # wrapper at LLR
/opt/exp_soft/cms/t3/t3stat -q # gives only the queue
This gives a live view of the machines carrying out the jobs. Statuses: R==running, Q==quitted. A quitted job means that something went wrong and the job was aborted; it is essentially dead at that point. We can remove it by running:
condor_rm <code_name_of_job> # cancels a single job
condor_rm -name llrt3condor <username> # cancels all jobs under username
To merge the output files of the jobs, run:
python scripts/combineFillerOutputs.py --dir <output_folder>
# rm outPlotter_*.root
where --dir should be the folder where the previous step stored the output ROOT files.
This will merge all the outPlotter_*.root files into a single one; the python script then creates the analyzedOutPlotter.root file, in which the histograms have been processed for the post-production of the plots.
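To illustrate what the merging step amounts to, the sketch below builds a command for ROOT's hadd utility over the per-job files. It is not a description of what scripts/combineFillerOutputs.py actually does internally. The merged file name outPlotter.root and the function name are assumptions made for the example.

```python
import glob
import os


def build_hadd_command(hist_dir, merged_name="outPlotter.root"):
    """Build an hadd command merging every outPlotter_*.root in hist_dir.

    Hypothetical helper: merged_name is an assumed output file name.
    """
    inputs = sorted(glob.glob(os.path.join(hist_dir, "outPlotter_*.root")))
    if not inputs:
        raise FileNotFoundError("no outPlotter_*.root files in " + hist_dir)
    target = os.path.join(hist_dir, merged_name)
    # "hadd -f <target> <sources...>" overwrites the target if it exists.
    return ["hadd", "-f", target] + inputs
```

In an environment where ROOT is set up, the returned list could be passed to subprocess.run(command, check=True).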
To make the final plots we have to use makeFinalPlots.sh.
In makeFinalPlots.sh we have to specify:
- the plotter we want to use (e.g. makeFinalPlots_UL2018.py)
- the channel we are considering and want to plot
- the region we are considering
- the kind of selection we are using [i.e. baseline, s1b1jresolved, s2b0jresolved, sboostedLL]
- the call to the plotter with all the correct options for its configuration (the variables to be plotted must match the variables that were specified in the mainCfg_*.cfg and that were produced)
In makeFinalPlots.py we have to take care of:
- the lists of the names of the bkg and sgn histos, which must match the names of the histos inside the analyzedOutPlotter.root file → these must in turn match the names in the lists of sgn and bkg in the mainCfg_*.cfg file (pay particular attention to the ‘others’ bkg, because I have created a specific list for them)
- the getHistos() and retrieveHistos() functions, because we must ensure that they look for the right names in the right lists (pay particular attention to the ‘others’ bkg, because I have created a specific list for them)
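A quick way to catch a name mismatch before plotting is to compare the expected histogram names against those actually present in the file. The sketch below works on plain Python collections and assumes a hypothetical naming pattern <sample>_<selection>_<region>_<variable>. The real pattern used in analyzedOutPlotter.root may differ and should be checked before relying on this.

```python
def missing_histograms(available, samples, variable, selection, region):
    """List the expected histogram names that are absent from available.

    The naming pattern below is an assumption made for illustration only.
    """
    missing = []
    for sample in samples:
        expected = "{}_{}_{}_{}".format(sample, selection, region, variable)
        if expected not in available:
            missing.append(expected)
    return missing
```

The set of available names could be collected from the keys of the ROOT file; an empty result means that every sample in the list has a matching histogram.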
For both of them, check that the directories in which the output is saved are correct and match:
- in the .sh, it is the line calling the plotter
- in the .py, it is the one where we save the file (last lines)
Launch: source scripts/makeFinalPlots.sh and enjoy the final plots.
The limit extraction is obtained via a maximum likelihood fit on the histos produced at step #6. The most up-to-date limit extraction is done with the code in the folder limits. For this, the relevant documentation is the one on the Combine webpage.