refactor!: view should return a consistent object #459

henryiii · 2020-09-26T14:39:09Z

Currently, .view() returns two very different dtypes; "plain" dtypes / simple arrays in the case of simple storages, and structured types / View arrays in the case of accumulator storages. This makes it impossible to ducktype on the view. How about if it always returns a structured dtype? That is, a new View (SimpleView, for example) would be added:

@fields("value")
class SimpleView(View):
    ...

This way, if you want the histogram value, h.view().value works on simple storages too now, and therefore all storages that have the concept of value provide a .value. A single-field structured dtype is identical in memory to a simple dtype.

If you wanted to provide a simple output array of values, a .values() or .value()` method could be added to histogram (see the various cross-library protocol discussions). Or, you could require this as part of the Protocol (not as ideal for other libraries, but possible).

TLDR, values = h.view() is replaced by values = h.view().value, so now that works across all storage types.

@HDembinski, thoughts?

The text was updated successfully, but these errors were encountered:

HDembinski · 2020-09-28T11:34:47Z

I reject this. boost-histogram is supposed to a thin minimal wrapper around the C++ histogram and this is moving it away from that. For simple histograms with basic counters, view should return a simple array, not a structured array. I also do not want a .value() or .values() method on the histogram for reasons that I already explained. This is breaking the orthogonality of Boost.Histograms components.

henryiii · 2020-09-28T13:42:46Z

Then how do we provide a consistent API? Please write a function that takes an arbitrary storage histogram and accesses the values. Assume this is a recipe you have to give to many packages (mplhep, fitting tools., etc), so excessive use of isinstance/dtype checks are discouraged; and other histogram packages (at least @HDembinski's Uproot4) have to be able to produce identical behaviors without depending on boost-histogram. Better yet, assume that some day we add a storage (maybe #378) that has no "values" at all. The latter should correctly break the function that you write, since there are no values to access. We need this in Scikit-HEP to plot, run fitting tools, and more. Having a structured array from .view() is something that would be hard to convince everyone of, but probably not impossible, if you really didn't want to have a .value()/.values(). Also, details of the dtype should be considered mostly private, so we have to depend on attributes.

Can I also remind you that you also don't want the storage type to be queryable from Python (the method is hidden)? But you want to only have an API that changes based on the storage type? You also insisted on only having one Histogram class, rather than one for each storage, but you then want the API to depend the hidden, interior type. Even from a pure Boost-Histogram perspective, shouldn't you be able to take an existing analysis, change the storage type from Double to Weight, and then start using the new information iteratively where you need the variances, instead of having the whole analysis break because a simple array is no longer returned?

Note that Boost.Histogram doesn't live in a larger ecosystem, and C++ doesn't have a strong history of duck typing (slightly more common with templated code now, but still you still always have the underlying type defined and accessible).

Also, C++ does not return a NumPy array. The NumPy array is already providing a view of the raw array that does not exist in C++. dtype=[('value', 'f8')]) is just as valid a NumPy view of the data as is dtype="f8".

I'm not stuck on any particular solution, but I want a solution. You seem to be saying you will not accept any change that solves this key issue?

henryiii · 2020-09-28T13:54:59Z

A nice consequence: h.view().value = ... is valid Python syntax, while h.view() = ... is not.

HDembinski · 2020-09-29T03:45:52Z

I don't want a consistent API for boost-histogram. You can stop suggesting anything, because I will reject all suggestions. Any attempt to offer a uniform API is limiting the flexibility and freedom of Boost.Histogram to offer arbitrary accumulators.

You can check the type of the view and react to that in wrappers.

HDembinski · 2020-09-29T05:07:59Z

To be more specific, it is a terrible idea to try to uniform the view, which is the most direct technical access to the Accumulator. I may support adding a crude lossy generic API to the Histogram class itself. I added some thoughts on generic APIs to the issue on the generic protocol.

henryiii mentioned this issue Sep 28, 2020

Stability of histogram types scikit-hep/aghast#42

Closed

HDembinski closed this as completed Sep 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor!: view should return a consistent object #459

refactor!: view should return a consistent object #459

henryiii commented Sep 26, 2020 •

edited

Loading

HDembinski commented Sep 28, 2020

henryiii commented Sep 28, 2020 •

edited

Loading

henryiii commented Sep 28, 2020

HDembinski commented Sep 29, 2020

HDembinski commented Sep 29, 2020

refactor!: view should return a consistent object #459

refactor!: view should return a consistent object #459

Comments

henryiii commented Sep 26, 2020 • edited Loading

HDembinski commented Sep 28, 2020

henryiii commented Sep 28, 2020 • edited Loading

henryiii commented Sep 28, 2020

HDembinski commented Sep 29, 2020

HDembinski commented Sep 29, 2020

henryiii commented Sep 26, 2020 •

edited

Loading

henryiii commented Sep 28, 2020 •

edited

Loading