LLR Workflow
Contact: [email protected]
The following notes describe how to run the code when the SKIMS are already available. The overall workflow is: BigNTuples → SKIMS → Histo files → plots → limits (and any further results needed)
- BigNTuples → the NTuples coming from CERN, containing everything
- SKIMS → small NTuples in which only the information needed for our analysis is saved (this step exists so that we do not have to handle the much larger BigNTuples)
- Histo files → .root files containing the histograms that are then used for the plotting, limit extraction, etc.
The workflow is outlined below in some detail:
Clone this repository in lxplus. Look into branch 106X_HH_UL, using the NtupleProducer/test/ folder.
- The datasets are listed in NtupleProducer/test/datasets_UL18.txt. This file is picked up by NtupleProducer/test/submitAllDatasetOnCrab_LLR.py.
ssh lxplus;
cd CMSSW_10_6_29/src/;
cmsenv;
cd LLRHiggsTauTau/NtupleProducer/test/;
source /cvmfs/cms.cern.ch/crab3/crab.sh;
python2 submitAllDatasetOnCrab_LLR.py;
- Monitor the progress with Grafana (sign in with your CERN credentials)
- Submission outputs are stored under /dpm/in2p3.fr/home/cms/trivcat/store/user/${USER}/HHNtuples_res/
Note #1: Make sure the isMC flag is the same in NtupleProducer/test/submitAllDatasetOnCrab_LLR.py and NtupleProducer/test/analyzer_LLR.py.
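The consistency check in Note #1 can be automated. The sketch below is only an illustration: it assumes the flag is set through a plain assignment such as `isMC = True` in both files, which may not match how the scripts actually define it, and the helper names `read_is_mc` and `flags_agree` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the flag is set by a plain assignment such as `isMC = True`.
IS_MC_PATTERN = re.compile(r"^\s*isMC\s*=\s*(True|False)\b", re.MULTILINE)

def read_is_mc(text):
    """Return the last value assigned to isMC in the given source text, or None."""
    matches = IS_MC_PATTERN.findall(text)
    return matches[-1] == "True" if matches else None

def flags_agree(submit_text, analyzer_text):
    """True only if both texts define isMC and the two values are equal."""
    a, b = read_is_mc(submit_text), read_is_mc(analyzer_text)
    return a is not None and a == b

if __name__ == "__main__":
    base = Path("NtupleProducer/test")
    files = [base / "submitAllDatasetOnCrab_LLR.py", base / "analyzer_LLR.py"]
    if all(f.is_file() for f in files):
        ok = flags_agree(*(f.read_text() for f in files))
        print("isMC flags agree" if ok else "isMC flags differ or are missing")
```

Run it from the LLRHiggsTauTau folder before submitting; it prints nothing if either file is not found.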
Note #2: Common CRAB commands: crab submit / crab submit -d <folder> / crab status
To launch the production of skimmed samples we use:
bash scripts/submit_skims_UL18.sh -t <some_tag> --user bfontana
This calls scripts/makeListOnStorage.py (definition of the input file lists) and scripts/skimNtuple.py (skimming job submission) for each sample.
The configuration file is config/skims_UL18.cfg (you may need to change some of its parameters).
One can add the option -n to skip the production of the input file lists if one is sure they are up-to-date. All options can be inspected with -h. One can comment out some samples in the script to avoid skimming all of them.
Two text files are created automatically (unless they would be empty): goodfiles.txt and badfiles.txt. They store the lists of “good” and “bad/corrupted” files, as defined by scripts/check_outputs.py (called within scripts/skimNtuple.py). The list of “good” files is used in the subsequent analysis steps. This additional step avoids crashes due to corrupted skimmed files.
Note: Some logic (a simple lock, transparent to the user) was implemented to ensure no race condition happens when jobs write to the same text files.
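For readers curious about what such a lock can look like, here is a minimal sketch using an advisory lock from the Python standard library. It is not the implementation used in the repository, which is not described here; the function name `append_line_locked` is hypothetical, and `fcntl` locks are only available on Unix and may behave differently on some network file systems.

```python
import fcntl

def append_line_locked(path, line):
    """Append one line to `path` while holding an exclusive advisory lock."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until other writers release the lock
        try:
            handle.write(line.rstrip("\n") + "\n")
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Each job would call the function once per output file, so that concurrent jobs never interleave their writes to the same list.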
To resubmit the jobs listed in badfiles.txt, simply issue the following command:
bash scripts/submit_skims_UL18.sh -t <some_tag> --resubmit
where the tag must be the same as the one used in the original submission. New log files are named so as to make it clear that they correspond to resubmitted jobs; old logs are not lost. The jobs can be resubmitted as many times as needed, and the most recent badfiles.txt (with a slightly different name) is picked as input.
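Picking "the most recent" list can be pictured as follows. This is a sketch under assumptions: the glob pattern `badfiles*.txt` and the use of the modification time are illustrative guesses, since the actual naming scheme of the resubmission lists is not spelled out here.

```python
from pathlib import Path

def latest_bad_list(directory, pattern="badfiles*.txt"):
    """Return the most recently modified file matching `pattern`, or None."""
    candidates = list(Path(directory).glob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```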
When the skims are prepared we have to run the calculation of the systematics on them. The process is exactly the same, but this time we run:
source scripts/submit_syst<year>.sh
We fill the file config/sampleCfg_*.cfg with the paths to the various directories containing the SKIM NTuples.
- The most recent file is config/sampleCfg_UL18.cfg
We fill the selectionCfg_*.cfg file with all the agreed selection criteria (the weights are also set in this config file). The most up-to-date files are currently:
config/selectionCfg_ETau_UL18_template.cfg
config/selectionCfg_MuTau_UL18_template.cfg
config/selectionCfg_TauTau_UL18_template.cfg
We fill the mainCfg_*.cfg file, in which we have to specify:
- the samples to be included, using the aliases defined inside the sampleCfg_*.cfg file. The samples to be specified are data, signal and background
- the variables to be plotted (these must match the variables plotted by makeFinalPlots.sh) [e.g. tauH_pt]
- the selections to be plotted (these must match the selections plotted by makeFinalPlots.sh/py) [i.e. baseline, s1b1jresolved, s2b0jresolved, sboostedLL]
- in the [pp_QCD] section, the regions to be used for the QCD estimation with the ABCD method
The currently most up-to-date files are:
config/mainCfg_ETau_UL18.cfg
config/mainCfg_MuTau_UL18.cfg
config/mainCfg_TauTau_UL18.cfg
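As a reminder of what the ABCD method mentioned for the [pp_QCD] section computes: the yield in the signal region A is estimated from three control regions as N_A = N_B * N_C / N_D, assuming the two variables that define the regions are uncorrelated for QCD and that non-QCD contributions have already been subtracted. The sketch below uses generic region labels and made-up numbers; it does not reproduce the region names or the code of the framework.

```python
def abcd_estimate(n_b, n_c, n_d):
    """Estimate the QCD yield in signal region A as N_B * N_C / N_D."""
    if n_d <= 0:
        raise ValueError("region D must have a positive yield")
    return n_b * n_c / n_d

# Illustrative numbers only, not analysis results.
print(abcd_estimate(n_b=120.0, n_c=80.0, n_d=400.0))  # prints 24.0
```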
To launch the production of the histograms for all channels (example):
for i in "E" "Mu" "Tau"; do python scripts/submitHistoFiller.py --cfg config/mainCfg_${i}Tau_UL18.cfg --njobs 20 --tag <tag>; done
The number of jobs is set to 10 by default. The --tag option can take any value; it is used solely for bookkeeping, and it will be used again during the plotting step. More jobs means more output files and quicker individual jobs (subject to the availability of the cluster); 20 jobs tend to complete in around 10 to 15 minutes. The data used corresponds to what was defined in the sampleCfg_*.cfg file.
The command creates a folder under /data_CMS/cms/${USER}/HHresonant_hist/<tag>/ in which the histograms (in ROOT format), the logs and copies of the configuration files used are stored. To change the output folder use the --outdir option. If no errors occurred, all the logs will end with:
@@ ... saving completed, closing output file
... exiting
Each job produces a ROOT file containing histograms that must be merged (see the next step).
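With many jobs it is convenient to check the log endings automatically. The sketch below lists the logs whose last non-empty line is not "... exiting"; the `*.log` pattern and the function name `unfinished_logs` are assumptions for illustration and may need adapting to the actual log file names.

```python
from pathlib import Path

EXPECTED_LAST_LINE = "... exiting"

def unfinished_logs(log_dir, pattern="*.log"):
    """Return the log files whose last non-empty line is not the expected one."""
    bad = []
    for log in sorted(Path(log_dir).glob(pattern)):
        text = log.read_text(errors="replace")
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if not lines or lines[-1] != EXPECTED_LAST_LINE:
            bad.append(log)
    return bad
```

An empty returned list means every matching log ended as expected; any file that is returned is worth inspecting by hand.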
To check whether the jobs are still running, have finished, or have failed, launch one of the following:
condor_q # built-in solution, see manual
/opt/exp_soft/cms/t3/t3stat # wrapper at LLR
/opt/exp_soft/cms/t3/t3stat -q # gives only the queue
This gives a live view of the machines carrying out the jobs. Statuses:
- R: running
- Q: queueing (also known as “Idle” state)
If a job has some issue (for instance, it goes to “Hold” state) you can kill it with:
condor_rm <code_name_of_job> # cancels a single job
condor_rm -name llrt3condor <username> # cancels all jobs under username
To merge the histogram files produced by the jobs, run:
for i in "E" "Mu" "Tau"; do python scripts/combineFillerOutputs.py --cfg mainCfg_${i}Tau_UL18.cfg --tag <tag>; done
# rm outPlotter_*.root
where --tag is the same as the tag used in the previous step. You can use --dir in case the outputs of the previous step were not stored in the default path.
The above command merges all the outPlotter_*.root files into a single file and then creates the analyzedOutPlotter.root file, in which the histograms have been processed for the production of the plots.
To make the final plots we have to use makeFinalPlots.sh:
for i in "E" "Mu" "Tau"; do bash scripts/makeFinalPlots.sh -t <tag> -c ${i}Tau -s baseline --nodata --nosig; done
where -t points again to the same tag as before, and -c (channel) can be “ETau”, “MuTau” or “TauTau”. The options --nodata and --nosig can be added to remove the corresponding contributions from the final plots. Use -h to see all available options.
Some settings are hard-coded, such as which variables to plot (they must match the variables specified in the mainCfg_*.cfg). Many variables are also hard-coded in makeFinalPlots.py.
The plots are copied from their local storage to https://${EOS_USER}.web.cern.ch/${EOS_USER}/HH_Plots/${TAG}/${CHANNEL}/${BASELINE}/. With some minor html/php definitions you will be able to see the plots in your browser.
The limits are extracted via a maximum likelihood fit. This paper gives a good summary of the statistical techniques employed. The combine tool is used; you can find its documentation here.
The legacy limit extraction was done with these scripts. Currently, a single script produces all the results; it runs four steps sequentially, which can also be launched separately if desired (see below). The script was only tested in the scope of the resonant analysis; modifications will be required for the non-resonant one (scattered scripts are stored under KLUBAnalysis/nonResonantLimits).
Inspect the run_limits.py file, in particular all the variables defined at the bottom; if everything looks fine, run the following:
cd ~/CMSSW_11_3_4/src/HiggsAnalysis/CombinedLimit/ # CMSSW release used by combine v9.0.0
cmsenv # pick up all combine commands
cd ~/CMSSW_11_1_9/src/KLUBAnalysis/resonantLimits/ # go back to the KLUB limit extraction folder, in this case the resonant one; do NOT run 'cmsenv'
# adjust the variables defined inside run_limits.py
python3 run_limits.py --dryrun # remove '--dryrun' to actually run the commands
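The idea behind a dry-run option is that every command is printed but only executed when the option is absent. The sketch below illustrates this pattern in isolation; it is not taken from run_limits.py, whose internals are not shown here, and the `run` helper and the placeholder steps are hypothetical.

```python
import subprocess
import sys

def run(args, dryrun=True):
    """Print the command; execute it only when `dryrun` is False."""
    print("[dry-run]" if dryrun else "[run]", " ".join(args))
    if dryrun:
        return 0
    return subprocess.run(args, check=True).returncode

# Placeholder steps standing in for the real limit extraction commands.
steps = [
    [sys.executable, "-c", "print('step 1: write cards')"],
    [sys.executable, "-c", "print('step 2: make workspaces')"],
]
for step in steps:
    run(step, dryrun=True)
```

Checking the printed commands first is a cheap way to catch a wrong tag or path before any job is launched.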
Technical note: The KLUB framework uses release CMSSW_11_1_9, while combine uses a different one, depending on its version. To run the following commands (most of which depend on combine), you have to run cmsenv in the release folder of combine, not in the KLUB folder.
The above runs the following steps:
Run make_res_cards.sh, which calls write_res_card.py: it generates the datacards per channel/category/mass point. ABCD regions for the QCD estimate are generated separately. The --tag argument does not refer to the tag used in the histogram production; it should be the same for all channels. Use the same tag throughout the limit extraction.
bash make_res_cards.sh -d UL18 --channels ETau MuTau TauTau --in_tags 10Feb_ETau_UL18 10Feb_MuTau_UL18 10Feb_TauTau_UL18 --tag <tag> --var DNNoutSM_kl_1
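"Per channel/category/mass point" means one datacard for every combination of the three. The sketch below enumerates these combinations using the channels, selections and masses that appear in the example commands of this section; the file name pattern in `card_name` is purely hypothetical and does not reflect how write_res_card.py names its outputs.

```python
from itertools import product

channels = ["ETau", "MuTau", "TauTau"]
selections = ["s1b1jresolvedInvMcut", "s2b0jresolvedInvMcut", "sboostedLLInvMcut"]
masses = [250, 260]

def card_name(channel, selection, mass, var="DNNoutSM_kl_1"):
    # Hypothetical naming scheme, for illustration only.
    return f"card_{channel}_{selection}_{var}_M{mass}.txt"

cards = [card_name(c, s, m) for c, s, m in product(channels, selections, masses)]
print(len(cards))  # prints 18 (3 channels x 3 selections x 2 masses)
```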
Generates workspaces for all possible combinations separately.
bash make_workspace_res.sh --tag <tag>
Combines datacards from the 4 categories, and generates workspaces for each channel/mass point
# you may want to adjust the variables defined inside
bash combine_res_channels.sh --tag <tag> --masses 250 260 --var DNNoutSM_kl_1 --signal ggFRadion --selections s1b1jresolvedInvMcut s2b0jresolvedInvMcut sboostedLLInvMcut
Combines datacards from all 3 channels (generally ETau, MuTau and TauTau), and generates workspaces for each category/mass point. Creates a directory called cards_<tag>_CombChan/. Supports granular category grouping (useful when defining diHiggs mass categories, for instance) via the --selprefixes option; all categories starting with a specific prefix are grouped together (plus grouping across all channels).
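Grouping by prefix can be pictured as follows. This is a sketch of the general idea only: the category names are invented for the example, and the rule that the first matching prefix wins is an assumption, not a statement about how the --selprefixes option is implemented.

```python
from collections import defaultdict

def group_by_prefix(categories, prefixes):
    """Group category names under the first prefix they start with."""
    groups = defaultdict(list)
    for category in categories:
        for prefix in prefixes:
            if category.startswith(prefix):
                groups[prefix].append(category)
                break  # first matching prefix wins; unmatched names are skipped
    return dict(groups)

# Hypothetical category names, e.g. one per diHiggs mass bin.
categories = ["resolved_m1", "resolved_m2", "boosted_m1", "other"]
print(group_by_prefix(categories, ["resolved", "boosted"]))
```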
Combines all datacards for the period, and generates workspaces for each mass point. Creates a directory called cards_<tag>_All/.
Runs combine for asymptotic limits for each channel/category/mass point separately, and stores the result in a log file for easy limit plotting. Can also group categories and/or channels using the --mode option.
Plots the final limits, taking the log files from the previous step as input. Also supports plot overlays of channels and/or categories via the --mode option, specifically the overlay_channels and overlay_selections flags.