Adds collector accumulator #378
base: develop
@henryiii To add a new accumulator, I currently have to change the code in many places. Not only do I need to register my accumulator in C++, but I also need to add it explicitly to src/boost_histogram/accumulators.py and src/boost_histogram/cpp/accumulators.py. It would be great to automate this. Adding things should be easy.
@henryiii mypy fails with a false positive. When are we dropping Python 2 support? It is hindering this patch.
You can disable mypy with But "Adding things should be easy." - we need to be careful - the procedure is clear and standard - if we automate too much, either with runtime magic (bad) or generation scripts (better), then that introduces more tooling to maintain, more special things unique to this one library only. Are we really planning for that many additions here? We have to recompile anyway, and we don't have this exposed as a public API for external extension modules, so keeping it a little repetitive but simple should benefit us in the long run. Now if we come up with a way to add custom additions (which should be doable for storages), then we would benefit from a generation tool, that would be a public API and should be designed as such (and then used internally, too).
With Version 1.0, probably mid-Summer. However, it is acceptable to leave off some features as Python 3 only.
I can't follow your reasoning. The accumulators are a customization point, perhaps not for users but for us devs. When I add an accumulator, I don't want to manually change the code in several places. Why not drop Python 2 support now? 1.0 seems arbitrary. Either we drop Python 2 or I have to rewrite my code for this patch.
If you look into the code, you can see how I automated this.
Any repetition in code is bad; we want to be DRY.
It's slow and ugly, but fine for a first run. We could easily add Awkward support later.
Randomly deciding that a feature patch should cause a major Python compatibility change is arbitrary. I have an outline and plan that has been announced and followed for about a year. Only the timing has been thrown off (mostly by COVID-19 creating an extra month of work for me). We need a roughly feature complete version (1.0), and then we can drop Python 2 support. That way, if we are picked up by experiment stacks that are stuck in Python 2, we can still be used, and we can backport fixes if needed. That's why I've put so much work into the Python 2 porting of a variety of features. If you want to wait until 1.0 is ready to merge this patch, though, that's fine with me. I can also help fix it in the near future.
This is not an absolute rule, just a guiding principle. Also, we really aren't talking about code duplication, but rather the equivalent of definitions - it's a little irritating to list items in multiple places, but it provides static code analysis benefits - not just for MyPy, but also for code completion tools, Sphinx (which can't build the C++ code, so relies on the Python files only), and for human readers of the code. For a simplified and rather bad comparison, this is why I'm not against code duplication, but I don't like additions that break static analysis.
@@ -1,15 +1,23 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function

from ._core.accumulators import Sum, Mean, WeightedSum, WeightedMean
This is taking a very simple static list, and doing run time manipulations with a function that has more lines than the code it replaces, breaking static analysis. We are also losing any ability to not follow the specific naming scheme in the future if something different is added.
If we add unit tests for a new type here, that will immediately break if a developer forgets to update this static list.
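A minimal sketch of such a test, using a stand-in namespace in place of the compiled `_core.accumulators` module. The names `fake_core`, `STATIC_EXPORTS`, `missing_exports`, and the `Collector` class are hypothetical and only illustrate the idea; they are not taken from boost-histogram's test suite.

```python
import types

# Stand-in for the compiled extension module (assumption: the real module
# exposes its accumulators as public classes in the same way).
fake_core = types.SimpleNamespace(
    Sum=type("Sum", (), {}),
    Mean=type("Mean", (), {}),
    WeightedSum=type("WeightedSum", (), {}),
    WeightedMean=type("WeightedMean", (), {}),
)

# The explicit, statically analyzable export list.
STATIC_EXPORTS = ("Sum", "Mean", "WeightedSum", "WeightedMean")


def missing_exports(core, exports):
    """Return public class names defined in core but absent from exports."""
    public = {
        name
        for name, obj in vars(core).items()
        if not name.startswith("_") and isinstance(obj, type)
    }
    return sorted(public - set(exports))


# Passes while the static list is in sync with the compiled module.
assert missing_exports(fake_core, STATIC_EXPORTS) == []

# Simulate registering a new accumulator in C++ without updating the list.
fake_core.Collector = type("Collector", (), {})
assert missing_exports(fake_core, STATIC_EXPORTS) == ["Collector"]
```

With a check like this in the test suite, the static list stays readable by mypy, Sphinx, and code completion, while a forgotten entry fails in CI rather than going unnoticed.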
Also, PyBind11 is anything but DRY...
Remember, _core is monkey-patched for documentation, so everything in it should be explicitly imported. It is also ignored for static analysis, so there again, everything should be explicitly imported. Explicit is better than implicit.
This (unrelated to list accumulators) change is also what is breaking Python 2!
I don't know what pybind11 has to do with it. On the contrary, it is a good example of being DRY. Their docs even state that they strongly prefer minimal code to do the work. Minimal code equates to avoiding redundancy.
Your counter arguments make no sense to me. The code is explicit, explicit in the forwarding and transformation rules. I don't have a problem with not being able to do static analysis here.
If you have another solution that allows me to easily add an accumulator without changing the code in several places, then go ahead. For now this is better than it was before.
We have different priorities. I consider static analysis a minor priority, because it is really not that important in this library. A good design is one that requires changes in only one place to add a new accumulator. One of the core principles of boost::histogram is to make it easy to add new storages, axes, accumulators. I want the same to be true for boost-histogram. We have rules for how the Pythonic names relate to the C++ names. These rules can be written in code. Edit: To be precise, I want it to be easy to add accumulators, axes, and storages in C++. The wrapping to Python should work largely automatically, using TMP in C++ and dynamic processing on the Python side.
Why do we need that? Only because you wrote it in a plan?
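As a rough illustration of what "rules written in code" could look like on the Python side, here is a sketch only. It is not the code in this patch. The snake_case to CamelCase rule, the helper names `pythonic_name` and `forward_all`, and the contents of `fake_core` are all assumptions made for the example.

```python
import types


def pythonic_name(cpp_name):
    """Assumed rule: snake_case registered name -> CamelCase Python name."""
    return "".join(part.capitalize() for part in cpp_name.split("_"))


def forward_all(core, namespace):
    """Copy every public class from core into namespace under its Pythonic name."""
    exported = []
    for cpp_name, obj in vars(core).items():
        if cpp_name.startswith("_") or not isinstance(obj, type):
            continue  # skip private names and anything that is not a class
        name = pythonic_name(cpp_name)
        namespace[name] = obj
        exported.append(name)
    return sorted(exported)


# Stand-in for a compiled module (hypothetical contents).
fake_core = types.SimpleNamespace(
    sum=type("sum", (), {}),
    weighted_mean=type("weighted_mean", (), {}),
)

namespace = {}
exported = forward_all(fake_core, namespace)
print(exported)
```

With this approach a newly registered class shows up in Python without further edits. The cost, as raised in the review comments, is that the exported names only exist at run time, so static tools cannot see them.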
I don't know how to wrap this in an awkward array as a view. Without awkward, it would most naturally be represented as
array((<shape of histogram>), dtype=object)
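A small NumPy sketch of that representation, assuming a 2x3 histogram and a plain Python list per cell to hold the collected values. The shape, the variable names, and the use of lists are illustrative assumptions, not the storage layout used in the patch.

```python
import numpy as np

shape = (2, 3)  # hypothetical histogram shape

# Object array with one independent Python list per cell.
cells = np.empty(shape, dtype=object)
for idx in np.ndindex(shape):
    cells[idx] = []

# "Filling" a cell appends the sample to that cell's list.
cells[0, 1].append(1.5)
cells[0, 1].append(2.5)
cells[1, 2].append(4.0)

# Per-cell counts derived from the collected values.
counts = np.array([len(c) for c in cells.flat]).reshape(shape)
print(counts)
```

Each cell is an ordinary Python object, so operations over the cells fall back to Python-level loops. That is the main drawback compared with a contiguous jagged-array view.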