Skip to content
This repository has been archived by the owner on Dec 21, 2018. It is now read-only.

[REVIEW] NULL support for hash-based group by #140

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

jrhemstad
Copy link
Contributor

@jrhemstad jrhemstad commented Sep 15, 2018

This PR adds support for NULL values to the hash-based groupby implementation.

The build_aggregation_table kernel was updated such that if a row is invalid, then it is not inserted into the hash table.

@jrhemstad jrhemstad changed the title [REVIEW] [REVIEW] NULL support for hash-based group by Sep 15, 2018
@kkraus14 kkraus14 added the 3 - Ready for Review Ready for review by team label Sep 17, 2018
@jrhemstad jrhemstad closed this Sep 17, 2018
@jrhemstad jrhemstad reopened this Sep 17, 2018
@scopatz
Copy link

scopatz commented Sep 20, 2018

Are there any tests that should be added here? Or is everything covered by existing tests?

@scopatz
Copy link

scopatz commented Sep 20, 2018

Code LGTM otherwise. Thanks for putting this in @jrhemstad!

@kkraus14
Copy link
Contributor

This requires changes on the PyGDF side in making sure to allocate validity buffers for all columns so please coordinate before merging.

@nsakharnykh
Copy link
Contributor

This requires changes on the PyGDF side in making sure to allocate validity buffers for all columns so please coordinate before merging.

@kkraus14 is this handled in pygdf now? can we merge?

@kkraus14
Copy link
Contributor

@nsakharnykh I believe there's still issues with binaryops if we always always allocate a validity mask, please hold off for the time being.

@nsakharnykh
Copy link
Contributor

@kkraus14 ping, are there still issues with binaryops?

@kkraus14
Copy link
Contributor

@nsakharnykh just pulled in latest changes and running tests now. I know sort based groupbys don't handle validity masks properly, but I believe binaryops were actually okay. I apologize for being mistaken earlier.

@kkraus14
Copy link
Contributor

@nsakharnykh This PR needs to be merged first: rapidsai/cudf#246 Which will always allocate validity masks for you so you won't encounter a situation where validity masks aren't allocated and you need to create one.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants