DSTs instead of nTuples #7

betatim · 2015-11-26T07:29:59Z

This could be an interesting topic to cover.

Lucio pointed out during the computing workshop in Paris (Nov 2015) that you could simply write a DST instead of a DecayTreeTuple from your grid jobs. The average size per event should be comparable but you can access "all the information". Once you have your DST you can then remake your nTuple as many times as you wish, potentially even locally without the grid.

I think this is a nice idea to reduce the pain of having huge nTuples with all the branches, re-running things on the grid missing some events, etc.

kdungs · 2015-11-26T09:50:36Z

👍

saschastahl · 2015-11-26T10:08:27Z

Yes, that would be interesting. There are two options. The first is to make a simple filter on your stripping line and write out all events. This is quite easy to configure. The second option is to write out a micro DST which contains only your candidate. This has a much lower event size. I always wanted to learn how to do that but never got around to it. However, this is what is done in the stripping as well, so one can probably learn from that.

apuignav · 2015-11-26T10:24:24Z

Yes, we could use that to teach CombineParticles, FilterDesktop and so
on... Kind of: apply your preselection on the DST and get a "preselected"
DST and then you can do whatever you want.

This reminds me of the classic Ruf presentation:
http://lhcb-reconstruction.web.cern.ch/lhcb-reconstruction/Python/Dst_as_Ntuple.pdf

betatim · 2015-11-26T11:49:33Z

The Ruf classic is what made me push so hard for the interactive DST exploring lesson.

I think doing your actual analysis on a DST is still a bit tedious though as it doesn't load fast enough. Dumping stuff into a pandas dataframe/numpy array/TTree is more agile/interactive.

So conclusion is we include this in this set of lessons? Let's decide that first before going off into dreamland of what could be.

apuignav · 2015-11-26T13:32:57Z

I'm not sure, honestly... As you say, it's more cumbersome than
the ntuple format.


saschastahl · 2015-11-26T14:07:30Z

Yeah, I think it is a bit "smarter" than dumping everything into a tree and cutting this down.
Though it might not be more efficient.
My wishful thinking is that this workflow might expose people more to the actual event model and the algorithms. And then people would be less afraid of working with the LHCb software and contributing to it. But that is a different discussion.

alexpearce · 2015-11-30T09:02:42Z

I recently started playing around with this. I made a µDST with the output of two Turbo lines, and then ran a DecayTreeTuple over the local output. The DTT step was super fast, and the µDST step was pretty quick as well.

For a few thousand events that contained my signal, the input µDST from the bookkeeping was 4.6 GB and the output µDST was 70 MB.

The filtering step was straight-forward for my use case, though I suspect adding things like raw banks (for flavour tagging and stuff) takes a little more care.

pseyfert · 2015-12-17T20:41:09Z

my 2 grumpy cents: i'd be sad to let ntuples go, as TTree::Draw is extremely powerful once you exploit that both the variable you draw and the weight (NB: the second argument is not a cut string, it returns a float!) support bool, int and float operations:
Draw("B0_M*(B0_BKGCAT==0)+B0_P*(B0_BKGCAT>0)","parSigYield_sw*((1<<8)&TCK)")
htemp->GetMean()
not that this particular example is any use…
also RooFit cannot import DST at the moment, can it? (yes, a dst is a root ntuple, but most interfaces fail once you have more complex things than int/float/double on the branches)
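
To make the point about the weight argument concrete, here is a minimal sketch in plain Python (no ROOT) of what such a Draw call computes per candidate, reading the drawn expression as B0_M*(B0_BKGCAT==0)+B0_P*(B0_BKGCAT>0). The branch names are taken from the example; the event values and the helper names `draw_value`, `draw_weight` and `weighted_mean` are made up for illustration, and the weighted mean only approximates what htemp->GetMean() would report, since histogram range and binning are ignored.

```python
# Toy stand-in for an ntuple: one dict per candidate.
# Branch names follow the Draw() example; all values are invented.
events = [
    {"B0_M": 5280.0, "B0_P": 90000.0, "B0_BKGCAT": 0, "parSigYield_sw": 1.1, "TCK": 0x0100},
    {"B0_M": 5150.0, "B0_P": 60000.0, "B0_BKGCAT": 20, "parSigYield_sw": 0.5, "TCK": 0x0100},
    {"B0_M": 5300.0, "B0_P": 75000.0, "B0_BKGCAT": 0, "parSigYield_sw": 0.9, "TCK": 0x0001},
]


def draw_value(ev):
    """First Draw argument: booleans act as 0/1, so this picks mass or momentum."""
    return ev["B0_M"] * (ev["B0_BKGCAT"] == 0) + ev["B0_P"] * (ev["B0_BKGCAT"] > 0)


def draw_weight(ev):
    """Second Draw argument: a float weight, not a yes/no cut.

    (1 << 8) & TCK is 256 when bit 8 is set and 0 otherwise, so the
    sWeight is scaled by 256 or the candidate is dropped (weight 0).
    """
    return ev["parSigYield_sw"] * ((1 << 8) & ev["TCK"])


def weighted_mean(evts):
    """Weighted mean of the drawn value over all candidates."""
    pairs = [(draw_value(ev), draw_weight(ev)) for ev in evts]
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


print(weighted_mean(events))
```

Note that the constant factor of 256 from the bit mask cancels in the mean, but it would scale the histogram contents.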

ibab · 2015-12-17T20:51:59Z

Reading the initial post, I would say that this is meant less as "DSTs instead of nTuples" and more as "DSTs in addition to nTuples".
It allows you to be more economical when it comes to which variables you want to put in the nTuple (E.g. no need to save the signal decay tree as a matrix).
It also allows you to incorporate updates of the LHCb software and changes to your LHCb-side code into your data very quickly.

alexpearce · 2015-12-18T08:08:44Z

@ibab is right, the (µ)DST step is meant as an intermediary to making ntuples as one would normally. The primary use case, at least for me, is rerunning my ntuple creation when I realise I'm missing some variables, or am asked to investigate things I hadn't foreseen (e.g. running DecayTreeFitter). Saving the trimmed µDST means you can rerun over them very quickly, and probably without using the Grid.

pseyfert · 2015-12-18T08:49:52Z

ack.

which reminds me, i think christian (rostock) once told me he'd use µDST as an OO ntuple. maybe one can ask him about his long-term experience.

also realised another advantage: if you use µDST anyhow as ntuple, you don't have code which only works in DaVinci and code which only works on ntuples. (and then you have to translate your ntuple code once you need its output as stripping variable or relatedinfo on µDST).

apuignav · 2015-12-18T08:53:20Z

However, working with µDST (which I like in principle) makes things more
difficult due to compatibility, the need for the software stack, etc.

I like having only ROOT because I can work on the Mac or offline, but if
there was a portable DaVinci (don't bring up CernVM) that I could directly
use (with reasonable complication), I wouldn't see the need for ntuples :-)

BTW: this would be a DREAM come true.

pseyfert · 2015-12-18T09:15:00Z

so (back to topic):
we cannot (for now) remove the ntuple lessons. but it seems desirable to add a lesson for the extended workflow:
DST→(DaVinci grid)→µDST→(DaVinci local)→tuple
which is cool itself for potentially quicker turnaround cycles in the second step

and then add a lesson "cool stuff you can do with a µDST instead of writing an ntuple"
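
The turnaround argument for the extended workflow can be sketched with a toy model in plain Python. Nothing below is DaVinci or Ganga API: `make_micro_dst`, `make_tuple`, the event dicts and the variable names are all hypothetical, and the only point is that the expensive filtering step runs once while the cheap tuple step can be repeated with a different set of variables.

```python
# Toy model of DST -> (grid) -> microDST -> (local) -> tuple.
# All names and values are hypothetical; events are plain dicts.

def make_micro_dst(dst, passes_selection):
    """Slow step, run once: keep only the events that contain a candidate."""
    return [event for event in dst if passes_selection(event)]


def make_tuple(micro_dst, variables):
    """Fast step, rerun at will: compute the requested variables per event."""
    return [
        {name: func(event) for name, func in variables.items()}
        for event in micro_dst
    ]


dst = [
    {"mass": 5279.0, "pt": 4200.0, "has_candidate": True},
    {"mass": 3100.0, "pt": 800.0, "has_candidate": False},
    {"mass": 5310.0, "pt": 6100.0, "has_candidate": True},
]

# Done once, on the full input.
micro_dst = make_micro_dst(dst, lambda event: event["has_candidate"])

# First pass: only the mass was requested.
first_tuple = make_tuple(micro_dst, {"B0_M": lambda e: e["mass"]})

# Later a variable turns out to be missing: rerun only the fast step.
second_tuple = make_tuple(
    micro_dst,
    {"B0_M": lambda e: e["mass"], "B0_PT": lambda e: e["pt"]},
)

print(len(dst), len(micro_dst), second_tuple)
```

In the real workflow the second step would be a local DaVinci job running DecayTreeTuple over the trimmed µDST, as described earlier in the thread.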

alexpearce · 2015-12-18T09:17:35Z

Agreed! 👍

saschastahl · 2015-12-18T09:18:25Z

👍

betatim · 2015-12-18T09:18:47Z

Not even sure we'd want to have "replace your nTuple with a uDST". Maybe
something for the lhcb-magicians-kit


saschastahl · 2015-12-18T09:27:38Z

Agreed, the title should be different. Paul summarized very well what I had in mind.

betatim mentioned this issue Nov 26, 2015

Choice of topics #8

Open

alexpearce mentioned this issue Jun 1, 2016

Rerun s21 over s20 MC #35

Open
