-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
choice of returned histogram datatype #5
Comments
I decided to return doubles for consistency with Numpy, but we could potentially have an option to choose the return type (at least maybe choosing between double and long?). All the other improvements sound good, and the use of Would you be interested in submitting a pull request to implement/fix some of these as well as #4? You mention reduction of scope for variables, but just to let you know, I found that all the variables had to be declared at the top to keep the Windows compilers happy (but maybe you know how to avoid this). |
I sent up the trivial stuff in the PR #6. I have never compiled for Windows. What's the error you get on Windows with these lines:
|
@manodeep - Thanks! I don't have Windows locally, but you can try and open a PR, and there will be an AppVeyor build in which you can check any error messages if the build fails. |
FYI, |
Also, numpy returns int64s: import numpy as np
a,b = np.histogram([1,2,.3])
print(a.dtype)
# prints int64 |
I wonder if this changed in Numpy since I wrote this package. I'd certainly be up for returning integer values from the plain histogram and doubles from the weighted one, though I need to check if that might have any undesired side effects on other packages. |
I think this changed in 1.10, at least the 64 part of the int was added then. This is also when numpy got a high-speed 1D histogram algorithm. I wonder how it compares to fast-histogram now in 1D? numpy/numpy@34b582a#diff-9d44603693b9544b900e135a203b3ae2R180, numpy/numpy#7629 Edit, per this issue numpy/numpy#7845, 2D/ND histograms still return a floating type. |
The current benchmarks shown in the README are against a later version than 1.10, so this is still quite a bit faster than numpy. I'm not sure if you are familiar with the asv tool for benchmarks, but I keep a running benchmark here of how fast fast-histogram is compared to Numpy: https://astrofrog.github.io/fast-histogram/#benchmarks.Histogram1D.time_histogram1d?x-axis=commit |
By the way, the current Numpy code is internally in int64s, and then casts at the end. There's one line that can be dropped to stop the casting. This allows Numpy to count past the point that floating point can no longer add 1. |
Thanks for writing this fast histogram package. A task that we astronomers tend to do often and repeatedly :)
Coming back to the package itself, it seems to me that the return datatype of the histograms are
doubles
. Conceptually, the histogram is simply a count in that bin, and should be of an integral type. Is the return datatype chosen to bedouble
to reflect the input array type? Or is that in keeping with thenumpy
convention?Changing the histogram increment to be an integral type (i.e., the postfix
++
operator) might also lead to better performance. (There are some opportunity for performance improvements -- e.g., pointer operations, reduction of scope for variables, unrolled loops + separate counters)Separately, any particular reason to perform a bitwise and (
&
) rather than the logical and (&&
) here?The text was updated successfully, but these errors were encountered: