BHEP: FlowArray #504

henryiii · 2021-02-12T15:27:20Z

henryiii
Feb 12, 2021
Maintainer

Boost-Histogram Enhancement Proposal: (the only time you'll see me capitalize the H for the Python version ;) )

I've been working on this idea for a while; it originally came up in a discussion with @jpivarski. This could potentially simplify quite a bit by pulling common logic into a single interface. Since it doesn't actually invalidate any of the current external API, it doesn't have to be added in the 0.x series (which I want to finish very soon). I think this might also help with #498: there's less pressure on to_numpy to be the one and only way to get axis edges that match the .view (which it's not; it returns NumPy edges, not boost-histogram ones).

We introduce a new FlowArray class, a subclass of ndarray, which has the following properties:

1. It behaves exactly like a flow=False ndarray, and returns an ndarray for any operation, except for the points below.
2. It has a __call__ operator with a single optional keyword argument; if you call it with flow=True, it gives you the full ndarray with flow.
3. It supports assignment using the same auto-expansion rules as Histograms for flow (optional, but could potentially allow us to drop even more custom logic from Histogram).
4. We base View on this, since Views always have flow (see point 5).
5. The class itself obviously understands whether flow bins are available in the underlying storage for each dimension, whether you have one-sided flow, etc. This indexing logic is already in Histogram; it would just be moved to FlowArray (and moved from the C++ wrapper to pure Python). Return values from the wrapper become simpler.
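
To make points 1, 2 and 5 concrete, here is a minimal sketch in pure NumPy. It is not the proposed implementation: the constructor signature, the `flow` layout (one (underflow, overflow) pair of booleans per dimension) and the `_full` attribute are all assumptions made for illustration, and indexing, slicing and assignment (point 3) are left out.

```python
import numpy as np


class FlowArray(np.ndarray):
    """Hypothetical sketch: stores the full buffer, presents the flow=False part."""

    def __new__(cls, full, flow):
        # `flow` is an assumed layout: one (underflow, overflow) pair of
        # booleans per dimension, saying which flow bins exist in `full`.
        full = np.asarray(full)
        inner = tuple(
            slice(1 if under else None, -1 if over else None)
            for under, over in flow
        )
        obj = full[inner].view(cls)  # a view, so no data is copied
        obj._full = full
        return obj

    def __array_finalize__(self, obj):
        # Arrays derived by slicing or copying get no full buffer in this sketch.
        self._full = None

    def __call__(self, *, flow=False):
        # flow=True: the full buffer; flow=False: a plain ndarray without flow.
        return self._full if flow else np.asarray(self)

    def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
        # Strip the subclass so every ufunc result is a plain ndarray.
        def strip(x):
            return np.asarray(x) if isinstance(x, FlowArray) else x

        if out is not None:
            kwargs["out"] = tuple(strip(x) for x in out)
        return getattr(ufunc, method)(*(strip(x) for x in inputs), **kwargs)


full = np.arange(5.0)  # underflow, 3 regular bins, overflow
values = FlowArray(full, flow=[(True, True)])
```

Here `values` has shape (3,), `values()` is a plain ndarray of the regular bins, `values(flow=True)` is the 5-element buffer, and `values + 1` comes back as a plain ndarray.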

All of the histogram methods (h.view, h.values, etc.) would become properties that return a FlowArray, such that h.view() continues to work exactly the same way and returns a normal ndarray. All of the axis operations return FlowArrays now too, such that ax.edges continues to work, though the result is a FlowArray rather than a vanilla ndarray. ax.edges() would be a vanilla ndarray. Due to point 1 above, this should have no effect.

This would tie the concept of "flow" to FlowArray, and pull it out of Histogram and the C++ Histogram wrapper, providing better separation of concepts. It also would be unit testable without having to make histograms and test each buffer return value separately. It would allow edges (or any other axis operation) to be explicitly synced with view:

flow = True
edges = h.axes.edges(flow=flow)
values = h.values(flow=flow)
variances = h.variances(flow=flow)

Or you could get the FlowArrays and pick flow later:

edges = h.axes.edges
values = h.values
variances = h.variances

One other benefit is that the following can now be simplified:

h.view()[...] = array
# Now can be written
h.view = array

Due to point 3 above, this would fully support all possible flow=True or flow=False assignments, since the decision is made based on the RHS shape, just like the current Histogram supports it!
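
A rough sketch of what "decided by the RHS shape" could look like, in pure NumPy. The `assign` helper and its `flow` argument are hypothetical, and it only handles exact shape matches; the real Histogram rules (broadcasting, mixed per-axis choices) are not reproduced here.

```python
import numpy as np


def assign(full, rhs, flow):
    """Hypothetical helper: write `rhs` into `full`, with or without flow bins.

    `flow` is an assumed layout: one (underflow, overflow) pair per dimension.
    """
    rhs = np.asarray(rhs)
    inner = tuple(
        slice(1 if under else None, -1 if over else None)
        for under, over in flow
    )
    if rhs.shape == full.shape:
        full[...] = rhs  # RHS includes the flow bins
    elif rhs.shape == full[inner].shape:
        full[inner] = rhs  # RHS covers only the regular bins
    else:
        raise ValueError(f"shape {rhs.shape} matches neither layout")


full = np.zeros(5)  # underflow, 3 regular bins, overflow
assign(full, [1, 2, 3], flow=[(True, True)])  # flow=False style
no_flow_result = full.copy()
assign(full, [9, 1, 2, 3, 9], flow=[(True, True)])  # flow=True style
```

The first call leaves the flow bins untouched; the second overwrites them, with no flow keyword needed at the call site.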

One possible future modification to the above could be to change the FlowArray to allow ufuncs to apply to all values and return a FlowArray. But this could be added later without changing the valid uses.

The hardest part of this is actually implementing it, but we've already paved the way with View, and the logic for this is already there as well, just spread around (mostly in the C++ wrapper code). Like the current system, it holds the memory for the whole array; it just uses indexing to present the flow=False view of it.
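
The "one buffer, indexed view" idea can be seen with plain NumPy, since basic slicing returns a view rather than a copy. This is only an illustration with a made-up 1D buffer, not boost-histogram code.

```python
import numpy as np

full = np.zeros(5)      # full buffer: underflow, 3 regular bins, overflow
no_flow = full[1:-1]    # basic slicing returns a view, not a copy

no_flow[:] = [1, 2, 3]  # writing through the view updates the full buffer
shares = np.shares_memory(full, no_flow)
```

After this, `shares` is True and `full` holds the three written values between two untouched flow bins.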

@HDembinski, thoughts?

HDembinski · 2021-02-13T13:15:01Z

HDembinski
Feb 13, 2021
Maintainer

I think this creates a new ambiguity: should a ufunc operate on the flow bins or not? It is harmless for transformations, where you want them to operate on all bins equally, but for reductions like np.sum, it potentially makes a huge difference whether flow bins are included or not.
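
A small NumPy illustration of that difference, using made-up counts with one underflow and one overflow bin:

```python
import numpy as np

full = np.array([5.0, 1.0, 2.0, 3.0, 7.0])  # made-up counts, flow bins at both ends
no_flow = full[1:-1]                        # regular bins only

# Transformation: the regular bins get the same result either way.
doubled_full = 2 * full
doubled_inner = 2 * no_flow

# Reduction: the answer depends on whether flow bins are included.
total_with_flow = full.sum()
total_without_flow = no_flow.sum()
```

The doubled regular bins agree in both cases, while the two totals differ by the content of the flow bins.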

It is also deprecating the existing interface:

h.axes.edges(flow=flow)
[...]

6 replies

HDembinski Apr 9, 2021
Maintainer

Ok, I didn't get the trick of the __call__ operator on the FlowArray. Yes, this is nice, except for the fact that using __call__ to get the flow stuff is awkward. It may be better to break the interface.

HDembinski Apr 9, 2021
Maintainer

It is fairly disappointing that this good idea did not come up before, when we tried to come up with the design for values, variances, etc.

HDembinski Apr 9, 2021
Maintainer

For 1.x we could add the __call__ but give a FutureWarning that it will disappear in 2.x.

henryiii Apr 9, 2021
Maintainer Author

I think we are too far along to break the interface, and this really does work nicely when used like a function, h.values(flow=True) and h.axes.edges(flow=True). Also, being able to pass flow in as a variable is nice, h.values(flow=flow), while if it was a property, like h.values.withflow, you would then not be able to easily control it from one shared variable. Most users can simply ignore the fact that this is really a FlowArray and not a normal ndarray, which is kind of the point.

Note that uproot uses h.axes.edges(), so this also allows us to be API compatible with uproot histograms.

henryiii Apr 9, 2021
Maintainer Author

We can't break UHI's PlottableHistogram Protocol, which removing __call__ would do.
