
Choice of topics #8

Open
betatim opened this issue Nov 26, 2015 · 3 comments

betatim commented Nov 26, 2015

This is a meta-issue to curate a list of issues suggesting topics to cover in the second-analysis-steps material, and also what to cover in an intermediate kit. Those two don't have to be the same. (The starterkit as an event uses two repositories of material: analysis-essentials and first-analysis-steps.)

There are already: #7, #6 and #3

In addition, an unordered list of potentially interesting topics:

  • statistics (tools for limit setting and measurement)
  • machine learning tools
  • the material we moved from first-analysis-steps to here
  • hacking the LHCb software (like Brunel or DaVinci)
  • analysis automation (snakemake and friends)
  • using the scientific python ecosystem

Please add more ideas here, or link to the relevant issue. The aim of this issue is to keep on top of the ideas that are out there and, after some discussion, converge on a set of topics.

@pseyfert

I'd like to add "everything that makes working group meetings so frustrating because you discuss the issue for the gazillionth time".
This would be:

  • What is a FoM for my selection? Usually not signal-to-background; sometimes s/sqrt(s+b), sometimes one adds a tagging efficiency: e_s/sqrt(s+b). For RD one wonders (has to wonder) how to normalise s w.r.t. b in s/sqrt(s+b); then there is the Punzi FoM. And what to do when you have several analysis bins?
  • How to deal with multiple candidates? Is it the same as multiple PVs? Does it make sense to pick a random PV? (insert reference to internal note by Patrick here)
  • Is it a problem that the classifier is overtrained?
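As a minimal sketch of the first point, here are two of the common FoM choices side by side; the cut-scan numbers are made up, and the Punzi FoM is written with a default target significance of a = 3 sigma:

```python
import math

def significance_fom(s, b):
    """Naive significance-style FoM: s / sqrt(s + b)."""
    return s / math.sqrt(s + b)

def punzi_fom(eff_sig, b, a=3.0):
    """Punzi FoM: eff_sig / (a/2 + sqrt(b)), where a is the target
    significance in sigma. Unlike s/sqrt(s+b), it does not require
    knowing the (possibly unknown) signal yield, only the efficiency."""
    return eff_sig / (a / 2.0 + math.sqrt(b))

# Hypothetical cut scan: (signal efficiency, expected signal, expected background).
cuts = [(0.90, 90.0, 400.0), (0.70, 70.0, 100.0), (0.50, 50.0, 25.0)]
for eff, s, b in cuts:
    print(eff, round(significance_fom(s, b), 2), round(punzi_fom(eff, b), 4))
```

Note that the two FoMs can prefer different working points, which is part of why the choice keeps coming up in meetings.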

but also things like:

  • Everybody knows how to code hello-world and write a for-loop with two if-statements. That's almost Turing complete, and they can inefficiently write inefficient code where they could use standard libraries (be it the STL in C++ or your_favourite_python_package).
  • How do I check for memory leaks before everything explodes?
  • Something from Manuel's RooFit tutorial (how to implement your PDF in C++ instead of using RooFormula or plugging functions together, and how to add analytic integrals).
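A toy illustration of the standard-library point (hypothetical function names, Python standing in for the C++/STL case):

```python
# Hand-rolled: the "for-loop with two if-statements" style.
def count_in_window_loop(values, lo, hi):
    n = 0
    for v in values:
        if v >= lo:
            if v <= hi:
                n += 1
    return n

# Idiomatic: let the standard library do the counting.
def count_in_window(values, lo, hi):
    return sum(lo <= v <= hi for v in values)

data = [4.9, 5.1, 5.3, 5.6, 4.7]
assert count_in_window_loop(data, 5.0, 5.5) == count_in_window(data, 5.0, 5.5) == 2
```

Both versions are correct; the point is that the second is shorter, harder to get wrong, and communicates intent.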

or (maybe that's going too far for standard lessons, but I just realised that I want to know it myself):

  • I have two different selections, and when changing from one to the other my fit result changes by 0.3 sigma. The problem is: I know the data samples are highly correlated, and thus the 'real' statistical uncertainty on the difference of the fit results is much smaller than what the fitter claims. But by how much?


ibab commented Dec 17, 2015

Regarding your last point, the only method I can think of is to bootstrap your data sample, apply both selections on each iteration, and perform a fit on the output of each.
The 2D distribution of fit results should give you the covariance matrix.
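A minimal numpy sketch of this bootstrap, with toy data and a trivial "fit" (the sample mean) standing in for the real analysis; the selection cuts are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset: one observable that gets fitted, one used by the selections.
x = rng.normal(5.28, 0.02, size=5000)   # "mass", fitted below
q = rng.uniform(0.0, 1.0, size=5000)    # selection variable

def fit(sample):
    # Stand-in for the real fit: just the sample mean of the mass.
    return sample.mean()

# Bootstrap: resample events with replacement, apply BOTH (highly
# overlapping) selections to the same resampled set, fit each output.
results = []
for _ in range(500):
    idx = rng.integers(0, len(x), size=len(x))
    xs, qs = x[idx], q[idx]
    results.append((fit(xs[qs > 0.10]), fit(xs[qs > 0.20])))
results = np.asarray(results)

# 2x2 covariance of the two fit results across bootstrap iterations.
cov = np.cov(results, rowvar=False)
var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
print("sigma(difference) =", np.sqrt(var_diff))
```

Because the two selections share most of their events, the off-diagonal covariance is large, and the uncertainty on the difference comes out much smaller than either individual fit uncertainty suggests.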

@pseyfert

Sounds nice! (Though I didn't expect an immediate answer; I wanted to stay on the topic of topic selection. May I encourage you to … uhm … present it … at … the next statistics meeting?!)
If it's really that 'simple', it would fit an advanced kit along with other helpful stuff like 'x_lw' (http://inspirehep.net/record/374024).
