Replies: 1 comment
-
For now, this is something you have to be aware of. There are ways to accelerate filling of histograms with growing axes in Boost.Histogram at the C++ level, but that requires a complicated patch. Because boost-histogram does not use a dict to store values, it is slower than coffea at filling, but faster at operations performed on the filled histogram. Ideally, you want both to be as efficient as possible, but here one has to decide on a trade-off: coffea made one choice, and we made another.
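To make the trade-off concrete, here is a minimal pure-Python sketch of the two storage strategies. It is a conceptual illustration only, not the actual implementation of either coffea or boost-histogram; the function names `add_dict` and `add_dense`, the flat row-major layout, and the toy data are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

N_BINS = 4  # hypothetical number of bins on the non-category axis


def add_dict(big: Dict[str, List[int]], small: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Dict-backed merge (in place): only the categories in `small` are touched."""
    for cat, counts in small.items():
        target = big.setdefault(cat, [0] * len(counts))
        for i, c in enumerate(counts):
            target[i] += c
    return big


def add_dense(
    big_cats: List[str], big_counts: List[int],
    small_cats: List[str], small_counts: List[int],
    n_bins: int,
) -> Tuple[List[str], List[int]]:
    """Dense merge: build a merged axis, allocate new storage, copy both inputs."""
    merged_cats = list(big_cats)
    seen = set(big_cats)
    for cat in small_cats:
        if cat not in seen:
            merged_cats.append(cat)
            seen.add(cat)
    index = {cat: i for i, cat in enumerate(merged_cats)}

    # Flat row-major storage: counts[category_index * n_bins + bin_index].
    merged = [0] * (len(merged_cats) * n_bins)
    for cats, counts in ((big_cats, big_counts), (small_cats, small_counts)):
        for old_i, cat in enumerate(cats):
            new_i = index[cat]
            for b in range(n_bins):
                merged[new_i * n_bins + b] += counts[old_i * n_bins + b]
    return merged_cats, merged


# Toy data: a "big" accumulated histogram and a "small" chunk result.
big_dict = {"a": [1, 0, 0, 0], "b": [0, 2, 0, 0], "c": [0, 0, 3, 0]}
small_dict = {"c": [0, 0, 1, 0], "d": [5, 0, 0, 0]}
add_dict(big_dict, small_dict)

merged_cats, merged = add_dense(
    ["a", "b", "c"], [1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0],
    ["c", "d"], [0, 0, 1, 0, 5, 0, 0, 0],
    N_BINS,
)
```

In this sketch the dict merge does work proportional to the size of the small histogram, while the dense merge has to revisit every cell of the big one whenever the category sets differ. In exchange, the dense layout keeps all counts in one contiguous block, which is what tends to make operations on an already filled histogram cheap.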
-
I am currently migrating from Coffea histograms to Boost histograms. As I expected, Boost histograms generally turned out to be faster. In one case, however, they were much slower: adding two histograms, one with a lot of categories on its growing category axis and another with a different number of categories. See the script below for an example. For me this is a very common situation, because it is what happens when accumulating histograms. There, the small histogram is the result of one particular chunk that was processed on the cluster, and the big one is the sum of the histograms accumulated so far.
In the example below, Coffea adds the histograms around 30 times faster than boost_histogram. Coffea manages category axes by using dicts to store the counts. Is this something one generally has to be aware of when using growing axes, or could this be improved?
Python version 3.8.6
boost_histogram version 1.3.1
Coffea version 0.7.13