[WIP] Null support for hash-based groupby #460

Closed


jrhemstad
Contributor

This PR was migrated from the old repo: rapidsai/libgdf#140

It adds support for NULL values to the hash-based groupby implementation.

The build_aggregation_table kernel was updated so that an invalid (NULL) row is skipped rather than inserted into the hash table.
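As a rough illustration of the row-skipping behavior described above, here is a sequential host-side C++ sketch, not the actual CUDA kernel. The names `is_valid` and `build_sum_table`, the SUM aggregation, and the packed one-bit-per-row mask layout are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper: true if bit `row` is set in a packed validity mask.
// Assumed layout: one bit per row, least-significant bit first.
inline bool is_valid(const std::vector<std::uint8_t>& mask, std::size_t row) {
    return ((mask[row / 8] >> (row % 8)) & 1u) != 0;
}

// Sequential stand-in for the table-building step: SUM of values per key,
// skipping every row whose validity bit is unset.
inline std::unordered_map<int, long long> build_sum_table(
    const std::vector<int>& keys,
    const std::vector<int>& values,
    const std::vector<std::uint8_t>& valid_mask) {
    std::unordered_map<int, long long> table;
    for (std::size_t row = 0; row < keys.size(); ++row) {
        if (!is_valid(valid_mask, row)) {
            continue;  // invalid row: never inserted into the hash table
        }
        table[keys[row]] += values[row];
    }
    return table;
}
```

One consequence of this approach is that a key whose rows are all invalid never appears in the table at all.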

…ad/libgdf into fea-ext-null-suport-hash-groupby
…re ignored, unless all of the values for a given key are NULL.
…rnel to rearrange the validity bits of the aggregation output column after sorting the results.
…ut validity mask during the extraction kernel.
…f the AVG aggregator will get set correctly.
…ion output can never be Null. Instead, an aggregation of all NULLs for Count will just return 0.
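The commit titles above are truncated, so the following is only one possible reading of them: NULL values are ignored during aggregation, a group's output is NULL only when every value for its key is NULL, and Count yields 0 in that case rather than NULL. The sketch below is sequential host-side C++, not the PR's code; `GroupResult` and `aggregate` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-group result: `sum_valid == false` stands for a NULL sum.
struct GroupResult {
    long long sum = 0;
    bool sum_valid = false;
    std::size_t count = 0;  // never NULL; stays 0 when all values are NULL
};

inline std::map<int, GroupResult> aggregate(
    const std::vector<int>& keys,
    const std::vector<int>& values,
    const std::vector<bool>& value_valid) {
    std::map<int, GroupResult> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        GroupResult& g = out[keys[i]];  // the key is recorded even if its value is NULL
        if (!value_valid[i]) {
            continue;  // NULL value: ignored by both SUM and COUNT
        }
        g.sum += values[i];
        g.sum_valid = true;  // at least one non-NULL value was seen
        ++g.count;
    }
    return out;
}
```

With keys `{1, 1, 2}`, values `{5, 7, 9}`, and validity `{true, false, false}`, group 1 ends up with a valid sum of 5 and a count of 1, while group 2 has a NULL sum and a count of 0.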
@jrhemstad added the labels "2 - In Progress" (Currently a work in progress) and "libcudf" (Affects libcudf (C++/CUDA) code.) on Dec 6, 2018
@harrism harrism changed the base branch from master to branch-0.5 December 7, 2018 00:00
@jrhemstad jrhemstad self-assigned this Dec 10, 2018
@jrhemstad
Contributor Author

Closing due to discussions with @kkraus14 and @williamBlazing about what the desired behavior is for nulls with groupby. Null support will be re-added once groupby has been refactored for single-pass multi-aggregator.
