Multiple storages #893

wiso · 2023-10-24T15:26:56Z

Hello, I guess this discussion belongs with the C++ Boost.Histogram project, but this community seems more active.

I work on ATLAS performance studies, and many of my workflows look like this:

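import hist

axis = hist.axis.Regular(...)
# group the rows by the bin index of var1, then aggregate quantity per bin
result = df.groupby([
  axis.index(df['var1'])
])['quantity'].agg([myfunction1, myfunction2])

result = result.reindex(range(len(axis)))  # to be sure I have all the bins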

It would be very nice to be able to do this with Boost histograms.
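For concreteness, here is a minimal, self-contained sketch of the groupby workflow above. The data values and the axis (4 unit-width bins on [0, 4)) are made up for illustration, `count`, `mean` and `max` stand in for myfunction1 and myfunction2, and `np.digitize` is used only as a stand-in for `axis.index` so that the example needs nothing beyond numpy and pandas.

```python
import numpy as np
import pandas as pd

# Toy data; column names follow the snippet above, values are made up.
df = pd.DataFrame({
    "var1": [0.5, 1.5, 1.7, 3.2],
    "quantity": [10.0, 20.0, 40.0, 5.0],
})

# Assumed axis: 4 unit-width bins on [0, 4).
n_bins, low, high = 4, 0.0, 4.0
edges = np.linspace(low, high, n_bins + 1)

# np.digitize numbers in-range bins from 1, so shift to 0-based indices.
bin_index = np.digitize(df["var1"].to_numpy(), edges) - 1

# One row per populated bin, one column per estimator.
result = df.groupby(bin_index)["quantity"].agg(["count", "mean", "max"])

# Reindex so that empty bins show up as NaN rows.
result = result.reindex(range(n_bins))
print(result)
```

Bin 2 receives no entries in this toy data set, so it only appears after the reindex step.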

Is it possible to have multiple storages? I would like to loop over my data only once and store several estimators of the same quantity (e.g. number of entries, mean, standard deviation, ...).
If myfunction1 is size and myfunction2 is mean, the two are equivalent to building a normal histogram and a profile histogram, which are already supported. What if I want a different estimator (e.g. the max)? What about estimators that need all the data in memory (e.g. quantiles)?
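To make the distinction between the two kinds of estimators concrete, here is a rough pure-Python sketch. It is not an existing boost-histogram or hist API. `BinStats` and `bin_index` are hypothetical names, and the data are made up. Count, mean, standard deviation (via Welford's algorithm) and max can be updated one value at a time, while the median is only available because every value is also kept in a per-bin list.

```python
import math
import statistics
from collections import defaultdict


class BinStats:
    """Hypothetical per-bin accumulator: streaming count, mean, std and max."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0        # running sum of squared deviations from the mean
        self.max = -math.inf
        self.values = []      # kept only because quantiles need every value

    def fill(self, x):
        # Welford's update for the mean and the sum of squared deviations.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.max = max(self.max, x)
        self.values.append(x)

    @property
    def std(self):
        # Population standard deviation; NaN for an empty bin.
        return math.sqrt(self._m2 / self.count) if self.count else math.nan

    def median(self):
        return statistics.median(self.values) if self.values else math.nan


def bin_index(x, low, high, n_bins):
    """Index of the regular bin containing x, or None if x is out of range."""
    if not low <= x < high:
        return None
    return int((x - low) / (high - low) * n_bins)


# Made-up (var1, quantity) pairs, binned in var1 with 4 bins on [0, 4).
data = [(0.5, 10.0), (1.5, 20.0), (1.7, 40.0), (3.2, 5.0)]
bins = defaultdict(BinStats)
for var1, quantity in data:   # the only pass over the data
    i = bin_index(var1, 0.0, 4.0, 4)
    if i is not None:
        bins[i].fill(quantity)

print(bins[1].count, bins[1].mean, bins[1].std, bins[1].max, bins[1].median())
```

The streaming estimators need constant memory per bin, whereas the buffered list grows with the number of entries, which is the cost that any exact quantile estimator would have to pay.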

I imagine something like this:

h = Hist.hist(axis, accumulators={
    'sum': Sum, 'profile': Mean, 'resolution': Std, 'myfunction': LambdaAccumulator(myfunction)
})
h.fill(df['quantity'])
h.sum.plot()
h.myfunction.plot()

Replies: 0 comments

Select a reply

Multiple storages #893

wiso Oct 24, 2023

Replies: 0 comments

wiso
Oct 24, 2023