Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of histogram with sycl kernel #2027

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

AlexanderKalistratov
Copy link
Collaborator

@AlexanderKalistratov AlexanderKalistratov commented Sep 11, 2024

Implemention of histogram with sycl_kernel.
This PR adds generic histogram kernel which can be used in the future to implement other versions of histogram such as bincount, histogram2d and histogramdd or specialize kernel for special cases like uniform bins.

sycl kernel covers only specific datatype and usm memory types. Unsupported cases are covered by additional copy.

@oleksandr-pavlyk
Copy link
Contributor

Quick validation via independent implementation:

def histogram1d_impl_tensor(data : dpt.usm_ndarray, bins : dpt.usm_ndarray) -> dpt.usm_ndarray:
    assert data.ndim == 1 
    assert bins.ndim == 1
    assert bins.shape[0] > 1
    bin_idx = dpt.searchsorted(bins, data)
    _, c = dpt.unique_counts(dpt.sort(bin_idx))
    return c

In [22]: x = dpnp.random.randn(10**7).get_array()

In [23]: bins = dpnp.asarray([-10, -4, -3, -2, -1, -0.5, -0.25, 0, 0.25, 0.5, 1, 2, 3, 4, 6], dtype=x.dtype).get_array()

In [24]: %time c, _ = dpnp.histogram(dpnp.asarray(x), bins=dpnp.asarray(bins)); print(c)
[    284   13222  214243 1359725 1497540  927356  987069  987352  927886
 1498601 1358597  214704   13114     307]
CPU times: user 10.6 ms, sys: 10.4 ms, total: 21 ms
Wall time: 15.2 ms

In [25]: %time c = histogram1d_impl_tensor(x, bins=bins); print(c)
[    284   13222  214243 1359725 1497540  927356  987069  987352  927886
 1498601 1358597  214704   13114     307]
CPU times: user 711 ms, sys: 581 ms, total: 1.29 s
Wall time: 396 ms

@AlexanderKalistratov
Copy link
Collaborator Author

@antonwolfy

dpnp/dpnp_iface_histograms.py Show resolved Hide resolved
dpnp/dpnp_iface_histograms.py Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/histogram.cpp Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/CMakeLists.txt Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/histogram_common.hpp Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/histogram.cpp Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/histogram.cpp Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/histogram.cpp Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/histogram.cpp Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/histogram.hpp Outdated Show resolved Hide resolved
dpnp/dpnp_iface_histograms.py Outdated Show resolved Hide resolved
@oleksandr-pavlyk
Copy link
Contributor

I think this is a bug:


In [4]: data, edges = dpt.concat([dpt.full(10**7, fill_value=2., dtype="f"), dpt.full(10**7, fill_value=1., dtype="f"), dpt.full(10**7, fill_value=4., dtype="f")]), dpt.asarray([-2,1, 2, 4], dtype="f")

In [5]: dpnp.histogram(data, edges)
Out[5]:
(array([       0, 10000000, 20000000]),
 usm_ndarray([-2.,  1.,  2.,  4.], dtype=float32))

In [6]: dpnp.histogram(data, edges, density=True)
Out[6]:
(array([0.       , 0.3333333, 0.3333333], dtype=float32),
 usm_ndarray([-2.,  1.,  2.,  4.], dtype=float32))

The density should be (0, 1/3, 2/3), instead of (0, 1/3, 1/3).

@AlexanderKalistratov
Copy link
Collaborator Author

@oleksandr-pavlyk it is not a bug. Numpy demonstrates the same behavior:

>>> import numpy
>>> data, edges = numpy.concatenate([numpy.full(10**7, fill_value=2., dtype="f"), numpy.full(10**7, fill_value=1., dtype="f"), numpy.full(10**7, fill_value=4., dtype="f")]), numpy.asarray([-2,1, 2, 4], dtype="f")
>>> numpy.histogram(data, edges)
(array([       0, 10000000, 20000000]), array([-2.,  1.,  2.,  4.], dtype=float32))
>>> numpy.histogram(data, edges, density=True)
(array([0.        , 0.33333333, 0.33333333]), array([-2.,  1.,  2.,  4.], dtype=float32))

dpnp/backend/extensions/sycl_ext/histogram.cpp Outdated Show resolved Hide resolved
dpnp/backend/extensions/sycl_ext/histogram.cpp Outdated Show resolved Hide resolved
uint32_t max_local_copies = local_mem_size / bins_count;
uint32_t local_hist_count = std::max(
std::min(
int(std::ceil((float(4 * local_size) / bins_count))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful to place any comment or named constexpr variables to clarify the meaning of constants?

{
static bool isnan(const T &v)
{
if constexpr (std::is_floating_point<T>::value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if constexpr (std::is_floating_point<T>::value) {
if constexpr (std::is_floating_point_v<T>) {

return arr->get_queue() != exec_q;
});

if (unequal_queue != arrays.cend()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it intentional not to use the utils functions from dpctl here (like dpctl::utils::queues_are_compatible)? (to print parameter name also in case of check failure)

Comment on lines +362 to +350
if hist_dtype == dpnp.complex128:
a_bin_dtype = dpnp.float64
elif hist_dtype == dpnp.float64:
a_bin_dtype = dpnp.complex128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that do the same like in above block 357-360?

has_fp64 = device.has_aspect_fp64
a_bin_dtype = _result_type_for_device(a_dtype, bins_dtype, device)

supported_types = (dpnp.float32, dpnp.int64, numpy.uint64, dpnp.complex64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be helpful to return a mapper function based on ContigFactory? Like it's done for ufunc, and then it will only need to check if we can cast the input arrays to expected matching dtype if any. And it will return the dtype of result histogram array we have to allocate.

if hist_dtype == numpy.uint64:
hist_dtype = dpnp.int64

if (a_bin_dtype in float_types and hist_dtype in float_types) or (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be else if?

Suggested change
if (a_bin_dtype in float_types and hist_dtype in float_types) or (
elif (a_bin_dtype in float_types and hist_dtype in float_types) or (

# host usm memory
n_usm_type = "device" if usm_type == "host" else usm_type

n_casted = dpnp.zeros(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to fill the memory with zeros?

Suggested change
n_casted = dpnp.zeros(
n_casted = dpnp.empty(

a_usm,
bins_usm,
weights_usm,
n_usm,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know that n_casted is dpnp.ndarray, so we can use get_array method:

Suggested change
n_usm,
n_casted.get_array(),


#include "histogram.hpp"

PYBIND11_MODULE(_sycl_ext_impl, m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change the name of the extension (and the folder containing the implementation) into something less generic, like histogram, or bin_counting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning to add other functions as correlate to the same module.

Probably it is bad idea, but i'm not sure that idea to make new extension module for each implemented function (like having separate module just for correlate) is good either.

So I would rely on your and @antonwolfy opinion in this question

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it would be good to rename the extension something like statistics to group by functional block.
Moreover considering that new kernels will have a similar implementation approach.

db = dpnp.diff(bin_edges).astype(dpnp.default_float_type())
db = dpnp.diff(bin_edges).astype(
dpnp.default_float_type(sycl_queue=queue)
)
return n / db / n.sum(), bin_edges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps n.sum() should be replaced with dpnp.sum(n).

@@ -52,7 +52,6 @@ repos:
rev: 24.4.2
hooks:
- id: black
args: ["--check", "--diff", "--color"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, was this change intentional?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants