
Choice of topics #8

Open
betatim opened this issue Nov 26, 2015 · 3 comments

betatim commented Nov 26, 2015

This is a meta-issue to curate a list of issues suggesting topics to cover in the second-analysis-steps material, and also what to cover in an intermediate kit. Those two don't have to be the same. (The starterkit as an event uses two repositories of material: analysis-essentials and first-analysis-steps.)

There are already: #7, #6 and #3

In addition, an unordered list of potentially interesting topics:

  • statistics (tools for limit setting and measurement)
  • machine learning tools
  • the material we moved from first-analysis-steps to here
  • hacking the LHCb software (like Brunel or DaVinci)
  • analysis automation (snakemake and friends)
  • using the scientific python ecosystem

Please add more ideas here, or link to the relevant issue. The aim of this issue is to keep on top of the ideas that are out there and, after some discussion, converge on a set of topics.

@pseyfert

I'd like to add "everything that makes working group meetings so frustrating because you discuss the issue for the gazillionth time".
This would be:

  • What is a FoM for my selection? Usually not signal-to-background; sometimes s/sqrt(s+b), sometimes one adds a tagging efficiency: e_s/sqrt(s+b). For RD one wonders (has to wonder) how to normalise s w.r.t. b in s/sqrt(s+b); then there is the Punzi FoM. And what to do when you have several analysis bins?
  • How to deal with multiple candidates? Is it the same as multiple PVs? Does it make sense to pick a random PV? (insert reference to internal note by Patrick here)
  • Is it a problem that the classifier is overtrained?
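As a minimal sketch of the first point, here are two of the common FoM choices side by side; the cut-scan numbers are made up, and the Punzi FoM is written with a default target significance of a = 3 sigma:

```python
import math

def significance_fom(s, b):
    """Naive significance-style FoM: s / sqrt(s + b)."""
    return s / math.sqrt(s + b)

def punzi_fom(eff_sig, b, a=3.0):
    """Punzi FoM: eff_sig / (a/2 + sqrt(b)), where a is the target
    significance in sigma. Unlike s/sqrt(s+b), it does not require
    knowing the (possibly unknown) signal yield, only the efficiency."""
    return eff_sig / (a / 2.0 + math.sqrt(b))

# Hypothetical cut scan: (signal efficiency, expected signal, expected background).
cuts = [(0.90, 90.0, 400.0), (0.70, 70.0, 100.0), (0.50, 50.0, 25.0)]
for eff, s, b in cuts:
    print(eff, round(significance_fom(s, b), 2), round(punzi_fom(eff, b), 4))
```

Note that the two FoMs can prefer different working points, which is part of why the choice keeps coming up in meetings.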

but also things like:

  • Everybody knows how to code hello-world and write a for-loop with two if-statements. That's almost Turing complete, and they can inefficiently write inefficient code where they could use standard libraries (be it the STL in C++ or your_favourite_python_package).
  • How do I check for memory leaks before everything explodes?
  • Something from Manuel's RooFit tutorial (how to implement your PDF in C++ instead of using RooFormula or plugging functions together, and how to add analytic integrals).
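A toy illustration of the standard-library point (hypothetical function names, Python standing in for the C++/STL case):

```python
# Hand-rolled: the "for-loop with two if-statements" style.
def count_in_window_loop(values, lo, hi):
    n = 0
    for v in values:
        if v >= lo:
            if v <= hi:
                n += 1
    return n

# Idiomatic: let the standard library do the counting.
def count_in_window(values, lo, hi):
    return sum(lo <= v <= hi for v in values)

data = [4.9, 5.1, 5.3, 5.6, 4.7]
assert count_in_window_loop(data, 5.0, 5.5) == count_in_window(data, 5.0, 5.5) == 2
```

Both versions are correct; the point is that the second is shorter, harder to get wrong, and communicates intent.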

or (maybe that's going too far for standard lessons, but I just realised that I want to know it myself):

  • I have two different selections, and when changing from one to the other my fit result changes by 0.3 sigma. The problem is: I know the data samples are highly correlated, and thus the 'real' statistical uncertainty on the difference of the fit results is much smaller than what the fitter claims. But by how much?


ibab commented Dec 17, 2015

Regarding your last point, the only method I can think of is to bootstrap your data sample, apply both selections on each iteration, and perform a fit on the output of each.
The 2D distribution of fit results should give you the covariance matrix.
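A minimal numpy sketch of this bootstrap, with toy data and a trivial "fit" (the sample mean) standing in for the real analysis; the selection cuts are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset: one observable that gets fitted, one used by the selections.
x = rng.normal(5.28, 0.02, size=5000)   # "mass", fitted below
q = rng.uniform(0.0, 1.0, size=5000)    # selection variable

def fit(sample):
    # Stand-in for the real fit: just the sample mean of the mass.
    return sample.mean()

# Bootstrap: resample events with replacement, apply BOTH (highly
# overlapping) selections to the same resampled set, fit each output.
results = []
for _ in range(500):
    idx = rng.integers(0, len(x), size=len(x))
    xs, qs = x[idx], q[idx]
    results.append((fit(xs[qs > 0.10]), fit(xs[qs > 0.20])))
results = np.asarray(results)

# 2x2 covariance of the two fit results across bootstrap iterations.
cov = np.cov(results, rowvar=False)
var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
print("sigma(difference) =", np.sqrt(var_diff))
```

Because the two selections share most of their events, the off-diagonal covariance is large, and the uncertainty on the difference comes out much smaller than either individual fit uncertainty suggests.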

@pseyfert

Sounds nice! (Though I didn't expect an immediate answer; I wanted to stay on the topic of topic selection. May I encourage you to … uhm … present it … at … the next statistics meeting?!)
If it's really that 'simple', it would fit an advanced kit along with other helpful stuff like 'x_lw' (http://inspirehep.net/record/374024).
