DSTs instead of nTuples #7

Open
betatim opened this issue Nov 26, 2015 · 17 comments

@betatim
Member

betatim commented Nov 26, 2015

This could be an interesting topic to cover.

Lucio pointed out during the computing workshop in Paris (Nov 2015) that you could simply write a DST instead of a DecayTreeTuple from your grid jobs. The average size per event should be comparable but you can access "all the information". Once you have your DST you can then remake your nTuple as many times as you wish, potentially even locally without the grid.

I think this is a nice idea to reduce the pain of having huge nTuples with all the branches, re-running things on the grid missing some events, etc.

@kdungs
Contributor

kdungs commented Nov 26, 2015

👍

@saschastahl
Contributor

Yes, that would be interesting. There are two options. The first is to make a simple filter on your stripping line and write out all events that pass it. This is quite easy to configure. The second option is to write out a micro DST which contains only your candidate. This has a much lower event size. I always wanted to learn how to do that but never got around to it. However, this is what is done in the stripping as well, so one can probably learn from that.

@apuignav
Contributor

Yes, we could use that to teach CombineParticles, FilterDesktop and so on... Kind of: apply your preselection on the DST and get a "preselected" DST and then you can do whatever you want.

This reminds me of the classic Ruf presentation:
http://lhcb-reconstruction.web.cern.ch/lhcb-reconstruction/Python/Dst_as_Ntuple.pdf

Dr. Albert Puig Navarro
Laboratoire de Physique des Hautes Energies
Ecole Polytechnique Fédérale de Lausanne (EPFL)
BSP 614.4 (Cubotron UNIL) CH-1015 Lausanne
EPFL Phone: 021 6939808
CERN Phone: 72518

@betatim
Member Author

betatim commented Nov 26, 2015

The Ruf classic is what made me push so hard for the interactive DST exploring lesson.

I think doing your actual analysis on a DST is still a bit tedious though as it doesn't load fast enough. Dumping stuff into a pandas dataframe/numpy array/TTree is more agile/interactive.

So conclusion is we include this in this set of lessons? Let's decide that first before going off into dreamland of what could be.

@apuignav
Contributor

I'm not sure, honestly... As you say, it's more cumbersome than the ntuple format.

@saschastahl
Contributor

Yeah, I think it is a bit "smarter" than dumping everything into a tree and cutting this down.
Though it might not be more efficient.
My wishful thinking is that this workflow might expose people more to the actual event model and the algorithms. And then people would be less afraid of working with the LHCb software and contributing to it. But that is a different discussion.

@alexpearce
Member

I recently started playing around with this. I made a µDST with the output of two Turbo lines, and then ran a DecayTreeTuple over the local output. The DTT step was super fast, and the µDST step was pretty quick as well.

For a few thousand events that contained my signal, the input µDST from the bookkeeping was 4.6 GB and the output µDST was 70 MB.
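As a quick check of what those two sizes imply, here is a minimal arithmetic sketch. It uses only the 4.6 GB and 70 MB figures quoted above; treating 1 GB as 1000 MB is an assumption, and the variable names are purely illustrative.

```python
# Sizes quoted in the comment above.
# Assumption: decimal units, i.e. 1 GB = 1000 MB.
input_size_mb = 4.6 * 1000   # input µDST from the bookkeeping
output_size_mb = 70.0        # filtered output µDST

# How many times smaller the output is, and what fraction of the input survives.
reduction_factor = input_size_mb / output_size_mb
kept_fraction = output_size_mb / input_size_mb

print(f"reduction factor: {reduction_factor:.0f}x")
print(f"fraction of input kept: {kept_fraction:.1%}")
```

So the filtered file is roughly 66 times smaller, which is small enough to keep on a laptop or local disk.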

The filtering step was straightforward for my use case, though I suspect adding things like raw banks (for flavour tagging and stuff) takes a little more care.

@pseyfert

my 2 grumpy cents: i'd be sad to let ntuples go, as TTree::Draw is extremely powerful once you exploit that both the variable you draw and the weight (NB: the second argument is not a cut string, it returns a float!) support bool, int and float operations:
Draw("B0_M*(B0_BKGCAT==0)+B0_P*(B0_BKGCAT>0)","parSigYield_sw*((1<<8)&TCK)")
htemp->GetMean()
not that this particular example is any use…
also RooFit cannot import DST at the moment, can it? (yes, a dst is a root ntuple, but most interfaces fail once you have more complex things than int/float/double on the branches)
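To make the bool/int/float arithmetic concrete, here is a small pure-Python sketch of what such a Draw call computes per candidate. It assumes the drawn expression multiplies each branch by a boolean factor; the branch names are taken from the call above, while the candidate values are made up and the helper names `draw_value` and `draw_weight` are hypothetical. The plain weighted mean stands in for `htemp->GetMean()`, ignoring any binning or axis-range effects.

```python
# Toy candidates: branch names follow the Draw() call above, values are invented.
candidates = [
    {"B0_M": 5280.0, "B0_P": 90000.0, "B0_BKGCAT": 0, "parSigYield_sw": 1.0, "TCK": 0x100},
    {"B0_M": 5100.0, "B0_P": 50000.0, "B0_BKGCAT": 20, "parSigYield_sw": 0.25, "TCK": 0x100},
    {"B0_M": 5300.0, "B0_P": 70000.0, "B0_BKGCAT": 0, "parSigYield_sw": 0.5, "TCK": 0x001},
]


def draw_value(c):
    # Booleans act as 0/1, so this picks B0_M when BKGCAT == 0
    # and B0_P when BKGCAT > 0.
    return c["B0_M"] * (c["B0_BKGCAT"] == 0) + c["B0_P"] * (c["B0_BKGCAT"] > 0)


def draw_weight(c):
    # (1 << 8) & TCK is 256 when bit 8 is set and 0 otherwise, so the
    # result is a scaled sWeight (a number), not a plain pass/fail cut.
    return c["parSigYield_sw"] * ((1 << 8) & c["TCK"])


values = [draw_value(c) for c in candidates]
weights = [draw_weight(c) for c in candidates]

# Weighted mean of the filled values (stand-in for htemp->GetMean()).
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(values, weights, weighted_mean)
```

The third candidate gets weight zero because bit 8 of its TCK is not set, so it drops out of the mean even though it still produces a value.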

@ibab
Contributor

ibab commented Dec 17, 2015

Reading the initial post, I would say that this is meant less as "DSTs instead of nTuples" and more as "DSTs in addition to nTuples".
It allows you to be more economical when it comes to which variables you want to put in the nTuple (e.g. no need to save the signal decay tree as a matrix).
It also allows you to incorporate updates of the LHCb software and changes to your LHCb-side code into your data very quickly.

@alexpearce
Member

@ibab is right, the (µ)DST step is meant as an intermediary to making ntuples as one would normally. The primary use case, at least for me, is rerunning my ntuple creation when I realise I'm missing some variables, or am asked to investigate things I hadn't foreseen (e.g. running DecayTreeFitter). Saving the trimmed µDST means you can rerun over them very quickly, and probably without using the Grid.
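The idea of "filter once, re-tuple many times" can be sketched with a toy model in plain Python. None of this is DaVinci code: the event layout and the functions `write_micro_dst` and `make_tuple` are hypothetical stand-ins, chosen only to show why a forgotten variable costs a rerun of the cheap second step and not of the first.

```python
# Toy stand-in for the workflow; this is NOT DaVinci code.
# Each "event" keeps the full candidate information, as a DST would.
full_dst = [
    {"event": 1, "line_fired": True,
     "candidate": {"mass": 5279.0, "pt": 4200.0, "ip_chi2": 3.1}},
    {"event": 2, "line_fired": False, "candidate": None},
    {"event": 3, "line_fired": True,
     "candidate": {"mass": 5310.0, "pt": 3900.0, "ip_chi2": 7.4}},
]


def write_micro_dst(events):
    """Expensive step, run once: keep only events where the line fired,
    but keep the complete candidate for each of them."""
    return [e for e in events if e["line_fired"]]


def make_tuple(micro_dst, variables):
    """Cheap step, rerun at will: flatten the chosen variables into rows."""
    return [{v: e["candidate"][v] for v in variables} for e in micro_dst]


micro_dst = write_micro_dst(full_dst)

# First pass: only the mass was requested.
first_tuple = make_tuple(micro_dst, ["mass"])

# Later a missing variable is needed: only the cheap step is rerun.
second_tuple = make_tuple(micro_dst, ["mass", "ip_chi2"])
print(first_tuple)
print(second_tuple)
```

Because the trimmed sample still holds the full candidates, the second tuple is produced without going back to the original input.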

@pseyfert

ack.

which reminds me, i think christian (rostock) once told me he'd use µDST as OO ntuple. maybe one can ask for longterm experience.

also realised another advantage: if you use µDST anyhow as ntuple, you don't have code which only works in DaVinci and code which only works on ntuples. (and then you have to translate your ntuple code once you need its output as stripping variable or relatedinfo on µDST).

@apuignav
Contributor

However, working with µDST (which I like in principle) makes things more difficult due to compatibility, the need for the software stack, etc.

I like having only ROOT because I can work on the Mac or offline, but if there was a portable DaVinci (don't bring up CernVM) that I could directly use (with reasonable complication), I wouldn't see the need for ntuples :-)

BTW: this would be a DREAM come true.

@pseyfert

so (back to topic):
we cannot (for now) remove the ntuple lessons. but it seems desirable to add a lesson for the extended workflow:
DST→(DaVinci grid)→µDST→(DaVinci local)→tuple
which is cool itself for potentially quicker turnaround cycles in the second step

and then add a lesson "cool stuff you can do with a µDST instead of writing an ntuple"

@alexpearce
Member

Agreed! 👍

@saschastahl
Contributor

👍

@betatim
Member Author

betatim commented Dec 18, 2015

Not even sure we'd want to have "replace your nTuple with a uDST". Maybe something for the lhcb-magicians-kit

@saschastahl
Contributor

Agreed, the title should be different. Paul summarized very well what I had in mind.
