[REVIEW] NULL support for hash-based group by #140
base: master
…ad/libgdf into fea-ext-null-suport-hash-groupby
…re ignored, unless all of the values for a given key are NULL.
…omicOrs to avoid race conditions.
…rnel to rearrange the validity bits of the aggregation output column after sorting the results.
…ut validity mask during the extraction kernel.
…f the AVG aggregator will get set correctly.
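The commits above describe the NULL semantics. NULL values are skipped during aggregation, and an output row is NULL only when every value for its key was NULL. Output validity bits are set with atomic ORs so that threads writing different bits of the same mask word do not race.

The following is a host-side C++ sketch of those semantics, not the libgdf kernel, and it rests on these assumptions:
- The validity mask is stored in 32-bit words, with bit i set meaning row i is valid.
- `std::atomic<uint32_t>::fetch_or` stands in for CUDA's `atomicOr`.
- A `std::map` replaces the GPU hash table.
- `is_valid`, `set_valid_atomic`, `GroupbySumResult` and `groupby_sum` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Bit i of a mask marks row i as valid (non-NULL). 32-bit words are an
// assumption made for this sketch.
bool is_valid(const std::vector<uint32_t>& mask, std::size_t i) {
  return (mask[i / 32] >> (i % 32)) & 1u;
}

// Set bit i with an atomic OR (stand-in for CUDA's atomicOr), so concurrent
// writers to the same 32-bit word cannot overwrite each other's bits.
void set_valid_atomic(std::vector<std::atomic<uint32_t>>& mask, std::size_t i) {
  mask[i / 32].fetch_or(1u << (i % 32), std::memory_order_relaxed);
}

struct GroupbySumResult {
  std::vector<int> keys;                          // one entry per group
  std::vector<long long> sums;                    // sum of non-NULL values
  std::vector<std::atomic<uint32_t>> valid_mask;  // output validity bits
};

// Hypothetical sum groupby: NULL values are ignored, and a group's output
// stays NULL only if every value for that key was NULL.
GroupbySumResult groupby_sum(const std::vector<int>& keys,
                             const std::vector<int>& values,
                             const std::vector<uint32_t>& value_mask) {
  GroupbySumResult out;
  std::map<int, std::size_t> slot;  // key -> output row, in first-seen order
  std::vector<std::size_t> row_slot(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) {
    auto ins = slot.emplace(keys[i], out.keys.size());
    if (ins.second) out.keys.push_back(keys[i]);
    row_slot[i] = ins.first->second;
  }
  out.sums.assign(out.keys.size(), 0);
  // Value-initialized words are 0, so every output starts out NULL.
  out.valid_mask = std::vector<std::atomic<uint32_t>>((out.keys.size() + 31) / 32);
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (!is_valid(value_mask, i)) continue;  // NULL value: ignored
    out.sums[row_slot[i]] += values[i];
    set_valid_atomic(out.valid_mask, row_slot[i]);  // group saw a valid value
  }
  return out;
}
```

In this sketch, the key 2 in `{1, 2, 1, 3}` with only NULL values keeps its output bit cleared, while key 1 with one valid value becomes valid.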
Are there any tests that should be added here? Or is everything covered by existing tests?
Code LGTM otherwise. Thanks for putting this in @jrhemstad!
… isn't allocated.
…ion output can never be Null. Instead, an aggregation of all NULLs for Count will just return 0.
…f build aggregation table.
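As one of the commits notes, the Count aggregation's output is never NULL: a key whose values are all NULL simply counts 0. The C++ sketch below illustrates this under stated assumptions:
- The value mask uses the same 32-bit-word layout assumed earlier.
- `row_group[i]` is the output row for input row i, assumed to come from the hash-table build step.
- `row_is_valid` and `groupby_count` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bit i of the mask marks row i as valid (non-NULL).
bool row_is_valid(const std::vector<uint32_t>& mask, std::size_t i) {
  return (mask[i / 32] >> (i % 32)) & 1u;
}

// Count of non-NULL values per group. No output validity mask is produced:
// every count is a valid value, and an all-NULL group counts 0.
std::vector<std::size_t> groupby_count(const std::vector<std::size_t>& row_group,
                                       const std::vector<uint32_t>& value_mask,
                                       std::size_t num_groups) {
  std::vector<std::size_t> counts(num_groups, 0);
  for (std::size_t i = 0; i < row_group.size(); ++i) {
    if (row_is_valid(value_mask, i)) ++counts[row_group[i]];
  }
  return counts;
}
```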
This requires changes on the PyGDF side to make sure validity buffers are allocated for all columns, so please coordinate before merging.
… Count value before doing the division.
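The truncated commit message suggests that the AVG path checks the Count value before dividing. The minimal C++ sketch below shows such a guard under these assumptions:
- AVG is computed as the per-group sum divided by the per-group count of non-NULL values.
- A zero count produces a NULL output instead of a division by zero.
- `AvgColumn` and `finalize_avg` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Output column for AVG: valid[g] == false means the result is NULL.
struct AvgColumn {
  std::vector<double> values;
  std::vector<bool> valid;
};

// Hypothetical finalize step: sums[g] and counts[g] are the per-group sum
// and count of non-NULL values accumulated earlier.
AvgColumn finalize_avg(const std::vector<double>& sums,
                       const std::vector<std::size_t>& counts) {
  AvgColumn out{std::vector<double>(sums.size(), 0.0),
                std::vector<bool>(sums.size(), false)};
  for (std::size_t g = 0; g < sums.size(); ++g) {
    // Check the count first: a group whose values were all NULL has count 0,
    // so its average is left NULL instead of computing 0.0 / 0.
    if (counts[g] != 0) {
      out.values[g] = sums[g] / static_cast<double>(counts[g]);
      out.valid[g] = true;
    }
  }
  return out;
}
```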
@kkraus14 is this handled in pygdf now? Can we merge?
@nsakharnykh I believe there are still issues with binaryops if we always allocate a validity mask, so please hold off for the time being.
@kkraus14 ping, are there still issues with binaryops?
@nsakharnykh I just pulled in the latest changes and am running tests now. I know sort-based groupbys don't handle validity masks properly, but I believe binaryops were actually okay. I apologize for being mistaken earlier.
@nsakharnykh This PR needs to be merged first: rapidsai/cudf#246. It will always allocate validity masks for you, so you won't encounter a situation where validity masks aren't allocated and you need to create one.
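The comments above conclude that every column should arrive with a validity mask, so libgdf never has to create one. As an illustration only, an all-valid mask for n rows can be built as shown below. The sketch assumes the following and does not reflect any specific libgdf type:
- The mask is byte-granular, with one bit per row, least-significant bit first.
- Padding bits beyond the last row are kept at 0.
- `make_all_valid_mask` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one bit per row, LSB-first within each byte.
// All n rows are marked valid; padding bits past row n-1 stay 0.
std::vector<uint8_t> make_all_valid_mask(std::size_t n) {
  std::vector<uint8_t> mask((n + 7) / 8, 0xFF);
  if (n % 8 != 0) {
    // Only the low (n % 8) bits of the last byte correspond to real rows.
    mask.back() = static_cast<uint8_t>((1u << (n % 8)) - 1);
  }
  return mask;
}
```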
This PR adds support for NULL values to the hash-based groupby implementation.
The `build_aggregation_table` kernel was updated such that if a row is invalid, it is not inserted into the hash table.
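The change to the build step can be pictured with the host-side C++ sketch below. It is not the CUDA kernel, and it rests on these assumptions:
- `row_mask` is the row's validity bitmask in 32-bit words; how it is derived from multiple key columns is not specified here.
- A `std::unordered_map` stands in for the GPU hash table.
- Sum is used as the example aggregation.
- `bit_is_set` and `build_sum_table` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper: bit i of the mask marks row i as valid.
bool bit_is_set(const std::vector<uint32_t>& mask, std::size_t i) {
  return (mask[i / 32] >> (i % 32)) & 1u;
}

// Hypothetical analogue of the build step: each valid row is inserted into
// (or accumulated into) the hash table; invalid rows are skipped entirely,
// so they never create or update a group.
std::unordered_map<int, long long> build_sum_table(const std::vector<int>& keys,
                                                   const std::vector<uint32_t>& row_mask,
                                                   const std::vector<int>& values) {
  std::unordered_map<int, long long> table;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (!bit_is_set(row_mask, i)) continue;  // invalid row: not inserted
    table[keys[i]] += values[i];
  }
  return table;
}
```

With rows `{7, 8, 7}` and row 1 marked invalid, the key 8 never appears in the table.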