choice of returned histogram datatype #5

manodeep · 2017-07-19T07:20:25Z

Thanks for writing this fast histogram package. A task that we astronomers tend to do often and repeatedly :)

Coming back to the package itself, it seems to me that the return datatype of the histograms are doubles. Conceptually, the histogram is simply a count in that bin, and should be of an integral type. Is the return datatype chosen to be double to reflect the input array type? Or is that in keeping with the numpy convention?

Changing the histogram increment to be an integral type (i.e., the postfix ++ operator) might also lead to better performance. (There are some opportunity for performance improvements -- e.g., pointer operations, reduction of scope for variables, unrolled loops + separate counters)

Separately, any particular reason to perform a bitwise and (&) rather than the logical and (&&) here?

The text was updated successfully, but these errors were encountered:

astrofrog · 2017-07-19T07:47:48Z

I decided to return doubles for consistency with Numpy, but we could potentially have an option to choose the return type (at least maybe choosing between double and long?). All the other improvements sound good, and the use of && was a mistake :)

Would you be interested in submitting a pull request to implement/fix some of these as well as #4?

You mention reduction of scope for variables, but just to let you know, I found that all the variables had to be declared at the top to keep the Windows compilers happy (but maybe you know how to avoid this).

manodeep · 2017-07-19T08:37:18Z

I sent up the trivial stuff in the PR #6.

I have never compiled for Windows. What's the error you get on Windows with these lines:

const double postfac = normx * fnx;
for(i = 0; i < n; i++) {
    if (*x >= xmin && *x < xmax) {
        const long ix = (*x - xmin) * postfac;
        count[ix] += 1.;
    }
   x++;
}

astrofrog · 2017-07-19T09:07:26Z

@manodeep - Thanks! I don't have Windows locally, but you can try and open a PR, and there will be an AppVeyor build in which you can check any error messages if the build fails.

henryiii · 2018-11-01T12:28:10Z

FYI, item += 1 stops working if the numbers get large enough for floating point numbers. However, that number is very large for doubles.

henryiii · 2018-11-01T23:57:53Z

Also, numpy returns int64s:

import numpy as np
a,b = np.histogram([1,2,.3])
print(a.dtype)
# prints int64

astrofrog · 2018-11-02T13:46:04Z

I wonder if this changed in Numpy since I wrote this package. I'd certainly be up for returning integer values from the plain histogram and doubles from the weighted one, though I need to check if that might have any undesired side effects on other packages.

henryiii · 2018-11-02T18:56:19Z

I think this changed in 1.10, at least the 64 part of the int was added then. This is also when numpy got a high-speed 1D histogram algorithm. I wonder how it compares to fast-histogram now in 1D?

numpy/numpy@34b582a#diff-9d44603693b9544b900e135a203b3ae2R180, numpy/numpy#7629

Edit, per this issue numpy/numpy#7845, 2D/ND histograms still return a floating type.

astrofrog · 2018-11-02T20:23:20Z

The current benchmarks shown in the README are against a later version than 1.10, so this is still quite a bit faster than numpy. I'm not sure if you are familiar with the asv tool for benchmarks, but I keep a running benchmark here of how fast fast-histogram is compared to Numpy: https://astrofrog.github.io/fast-histogram/#benchmarks.Histogram1D.time_histogram1d?x-axis=commit

henryiii · 2018-11-04T18:42:33Z

By the way, the current Numpy code is internally in int64s, and then casts at the end. There's one line that can be dropped to stop the casting. This allows Numpy to count past the point that floating point can no longer add 1.

astrofrog added enhancement help wanted labels Jul 19, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

choice of returned histogram datatype #5

choice of returned histogram datatype #5

manodeep commented Jul 19, 2017

astrofrog commented Jul 19, 2017

manodeep commented Jul 19, 2017

astrofrog commented Jul 19, 2017

henryiii commented Nov 1, 2018

henryiii commented Nov 1, 2018

astrofrog commented Nov 2, 2018

henryiii commented Nov 2, 2018 •

edited

Loading

astrofrog commented Nov 2, 2018

henryiii commented Nov 4, 2018

choice of returned histogram datatype #5

choice of returned histogram datatype #5

Comments

manodeep commented Jul 19, 2017

astrofrog commented Jul 19, 2017

manodeep commented Jul 19, 2017

astrofrog commented Jul 19, 2017

henryiii commented Nov 1, 2018

henryiii commented Nov 1, 2018

astrofrog commented Nov 2, 2018

henryiii commented Nov 2, 2018 • edited Loading

astrofrog commented Nov 2, 2018

henryiii commented Nov 4, 2018

henryiii commented Nov 2, 2018 •

edited

Loading