-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor!: view should return a consistent object #459
Comments
I reject this. boost-histogram is supposed to a thin minimal wrapper around the C++ histogram and this is moving it away from that. For simple histograms with basic counters, view should return a simple array, not a structured array. I also do not want a |
Then how do we provide a consistent API? Please write a function that takes an arbitrary storage histogram and accesses the values. Assume this is a recipe you have to give to many packages (mplhep, fitting tools., etc), so excessive use of isinstance/dtype checks are discouraged; and other histogram packages (at least @HDembinski's Uproot4) have to be able to produce identical behaviors without depending on boost-histogram. Better yet, assume that some day we add a storage (maybe #378) that has no "values" at all. The latter should correctly break the function that you write, since there are no values to access. We need this in Scikit-HEP to plot, run fitting tools, and more. Having a structured array from Can I also remind you that you also don't want the storage type to be queryable from Python (the method is hidden)? But you want to only have an API that changes based on the storage type? You also insisted on only having one Histogram class, rather than one for each storage, but you then want the API to depend the hidden, interior type. Even from a pure Boost-Histogram perspective, shouldn't you be able to take an existing analysis, change the storage type from Double to Weight, and then start using the new information iteratively where you need the variances, instead of having the whole analysis break because a simple array is no longer returned? Note that Boost.Histogram doesn't live in a larger ecosystem, and C++ doesn't have a strong history of duck typing (slightly more common with templated code now, but still you still always have the underlying type defined and accessible). Also, C++ does not return a NumPy array. The NumPy array is already providing a view of the raw array that does not exist in C++. I'm not stuck on any particular solution, but I want a solution. You seem to be saying you will not accept any change that solves this key issue? |
A nice consequence: |
I don't want a consistent API for boost-histogram. You can stop suggesting anything, because I will reject all suggestions. Any attempt to offer a uniform API is limiting the flexibility and freedom of Boost.Histogram to offer arbitrary accumulators. You can check the type of the view and react to that in wrappers. |
To be more specific, it is a terrible idea to try to uniform the view, which is the most direct technical access to the Accumulator. I may support adding a crude lossy generic API to the Histogram class itself. I added some thoughts on generic APIs to the issue on the generic protocol. |
Currently,
.view()
returns two very different dtypes; "plain" dtypes / simple arrays in the case of simple storages, and structured types / View arrays in the case of accumulator storages. This makes it impossible to ducktype on the view. How about if it always returns a structured dtype? That is, a new View (SimpleView, for example) would be added:This way, if you want the histogram value,
h.view().value
works on simple storages too now, and therefore all storages that have the concept of value provide a.value
. A single-field structured dtype is identical in memory to a simple dtype.If you wanted to provide a simple output array of values, a
.values() or
.value()` method could be added to histogram (see the various cross-library protocol discussions). Or, you could require this as part of the Protocol (not as ideal for other libraries, but possible).TLDR,
values = h.view()
is replaced byvalues = h.view().value
, so now that works across all storage types.@HDembinski, thoughts?
The text was updated successfully, but these errors were encountered: