Bug: Histogram2d result does not match that of the numpy function #10
Comments
Thanks for reporting this. I think this is a literal 'edge' case: we must be treating points that lie exactly on the edge differently, and I agree it would be good to make the two consistent.
The relevant line is here: https://github.com/astrofrog/fast-histogram/blob/master/fast_histogram/_histogram_core.c#L186 - specifically, we use < instead of <= for the upper bound, because otherwise the index calculation returns an out-of-bounds index. We could change < to <= and then special-case the situation where the index needs to be reduced by 1, though this would add an extra 'if' statement inside the loop, which could slow things down further. I'll try to investigate early next week, but in the meantime the workaround is to avoid using exactly the same value for xmax/ymax as some of the values in the arrays being binned (for example, one could add a tiny value to each of the upper limits).
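To make the trade-off above concrete, here is a minimal Python sketch of the two variants being discussed. It is not the actual C code from `_histogram_core.c`; the function names `bin_index_half_open` and `bin_index_closed` are made up for illustration, and the index formula is an assumption about how an evenly spaced bin index is typically computed.

```python
def bin_index_half_open(x, xmin, xmax, nbins):
    """Hypothetical sketch: accept xmin <= x < xmax, so x == xmax is dropped."""
    if x >= xmin and x < xmax:
        return int((x - xmin) / (xmax - xmin) * nbins)
    return None  # value is not counted


def bin_index_closed(x, xmin, xmax, nbins):
    """Hypothetical sketch: accept xmin <= x <= xmax, as numpy does for the last bin."""
    if x >= xmin and x <= xmax:
        ix = int((x - xmin) / (xmax - xmin) * nbins)
        # x == xmax gives ix == nbins, which is out of bounds,
        # so this extra branch folds it back into the last bin.
        if ix == nbins:
            ix = nbins - 1
        return ix
    return None  # value is not counted
```

With `xmin=0.0`, `xmax=5.0` and `nbins=5`, a value of exactly `5.0` is dropped by the first function and placed in bin 4 by the second. The extra branch in the second function is the additional 'if' mentioned above.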
Yes, adding a small value to the upper limits for x and y does solve the problem for this example (and many others that I tested). Thank you for your quick response!
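For anyone wanting to try the workaround, one possible way to pick the "small value" is `np.nextafter`, which returns the next representable float above the maximum. This is only a sketch with made-up sample data, and it uses a plain numpy mask to stand in for the strict `<` test rather than calling fast_histogram itself. Whether the smallest possible nudge is always enough for the real index calculation is not something this snippet establishes; a larger offset may be the safer choice.

```python
import numpy as np

# Illustrative data only; the largest value sits exactly on the upper limit.
x = np.array([0.0, 1.5, 3.2, 5.0])

xmin = 0.0
xmax = x.max()
# Smallest float strictly greater than xmax.
xmax_nudged = np.nextafter(xmax, np.inf)

# Stand-in for the strict upper-bound test (x < xmax) discussed above.
inside_before = np.count_nonzero((x >= xmin) & (x < xmax))
inside_after = np.count_nonzero((x >= xmin) & (x < xmax_nudged))

print(inside_before, inside_after)
```

With the original limit, the point at `5.0` fails the strict test; with the nudged limit, all four points pass it.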
Hi, one comment: the discrepancy seems to go away if you add a small value, for example 0.01, to the range that is passed to both np.histogram and fast_histogram. For example, as in:
Here the modified range is shared by both functions. However, I was trying to test without modifying anything on the numpy side, that is, by doing:
The result is not the same, as shown below. Why do we have to pass the modified range to np.histogram as well?
h_fast =
h_np =
Although increasing the range can act as a quick fix, it doesn't really fix the problem. Changing the range will affect the calculated edges (bin intervals), so
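The point about the edges moving can be checked with numpy alone. The sketch below uses made-up numbers (5 bins, a range of 0 to 5, widened by 0.01, and a single sample at 1.001) purely for illustration. It assumes evenly spaced bins, which is what passing `bins` and `range` to `np.histogram` produces.

```python
import numpy as np

bins = 5

# Edges for the original range and for the range widened by 0.01.
edges_original = np.linspace(0.0, 5.0, bins + 1)
edges_widened = np.linspace(0.0, 5.01, bins + 1)

# One illustrative sample that sits just above the original edge at 1.0.
x = np.array([1.001])

h_original, _ = np.histogram(x, bins=bins, range=(0.0, 5.0))
h_widened, _ = np.histogram(x, bins=bins, range=(0.0, 5.01))

print(edges_original)
print(edges_widened)
print(h_original, h_widened)
```

Every interior edge shifts slightly when the upper limit changes, so a sample close to an edge can land in a different bin. That would explain why a histogram computed with the widened range need not match one computed with the original range, even within the same library.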
I believe that this issue can be closed now. We have decided to purposefully break with numpy on this particular point because the additional
I agree! Before we close this I think we might want to add an entry to the FAQ.
Thank you for this package, it is indeed very fast and I am looking forward to using it.
There is a discrepancy between your histogram2d function and the numpy equivalent, which is demonstrated below:
On my machine this outputs
h_fast =
[[ 0. 0. 1. 0. 0.]
[ 2. 1. 1. 1. 0.]
[ 0. 1. 0. 0. 0.]
[ 0. 0. 1. 1. 0.]
[ 0. 0. 0. 1. 1.]]
h_np =
[[ 0. 0. 1. 0. 2.]
[ 2. 1. 1. 1. 0.]
[ 0. 1. 0. 0. 0.]
[ 0. 0. 1. 1. 1.]
[ 0. 0. 0. 1. 2.]]
The fast_histogram result includes only 11 of the 15 points. The 4 missing points are all from the final column.
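The counts quoted above can be double-checked directly from the two printed arrays. The snippet below only re-enters the output shown in this report and subtracts the two results. It does not rerun either histogram function.

```python
import numpy as np

# Output copied from the report above.
h_fast = np.array([
    [0, 0, 1, 0, 0],
    [2, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)

h_np = np.array([
    [0, 0, 1, 0, 2],
    [2, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

# Points counted by numpy but not by fast_histogram, per bin.
missing = h_np - h_fast

print(h_fast.sum(), h_np.sum())
print(missing)
```

The difference is non-zero only in the last column, which is consistent with the dropped points lying exactly on the upper limit of that axis.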