fix: BitPackedArray enforces can only be built over non-negative values #1705

Merged: a10y merged 7 commits into develop from aduffy/bitpacked-signed-nonneg on Dec 18, 2024


@a10y a10y commented Dec 17, 2024

Following up from #1699.

In the previous PR we allowed signed arrays to be bit-packed directly, but we did not explicitly reject arrays containing negative values. We need to reject them because it is critical for fast search_sorted over BitPacked data with patches: efficient binary search is only possible when the patches sort to the right-most side of the array.
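To illustrate why the position of the patches matters, here is a minimal sketch under stated assumptions. `ToyPatched`, `from_sorted`, and `search_sorted_left` are hypothetical names for this example and are not the vortex implementation. Values are modeled as `u64`, i.e. already non-negative, and a "patch" is any value too wide for the chosen bit width. In a sorted, non-negative array the patches form one contiguous run at the right end, so a search can pick the packed run or the patch run with a single comparison. A negative value reinterpreted as unsigned would presumably need a patch while sorting to the left, which would break that assumption.

```rust
/// Toy model of a sorted bit-packed array with patches.
/// Hypothetical type for illustration only.
pub struct ToyPatched {
    /// Values that fit in `bit_width` bits, in ascending order.
    packed: Vec<u64>,
    /// Values too wide for `bit_width` bits, in ascending order.
    patches: Vec<u64>,
    bit_width: u32,
}

impl ToyPatched {
    /// Splits an ascending slice of non-negative values into packed values
    /// and patches. Assumes `bit_width < 64`.
    pub fn from_sorted(sorted: &[u64], bit_width: u32) -> Self {
        let limit = 1u64 << bit_width;
        // Sorted and non-negative input means every value that overflows
        // the bit width sits in one contiguous run at the right end.
        let split = sorted.partition_point(|&v| v < limit);
        Self {
            packed: sorted[..split].to_vec(),
            patches: sorted[split..].to_vec(),
            bit_width,
        }
    }

    /// Returns the index of the first element that is `>= target`.
    pub fn search_sorted_left(&self, target: u64) -> usize {
        if target < (1u64 << self.bit_width) {
            // The answer lies in the packed run, or right at its end.
            self.packed.partition_point(|&v| v < target)
        } else {
            // Every packed value is smaller than the target, so only the
            // patch run needs to be searched.
            self.packed.len() + self.patches.partition_point(|&v| v < target)
        }
    }
}
```

With a bit width of 3 and the input `[1, 2, 3, 7, 100, 200]`, the values 100 and 200 become patches, and each lookup runs one binary search over one of the two runs.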

I've added explicit preconditions that values are non-negative, and made the BitPackedArray constructor unsafe to make it clear to callers that they must explicitly check this themselves (the recommended safe way to create a BPA is via the BPA::encode() method, which returns an Error if there are negative values).
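A minimal sketch of the split described here, assuming a toy type rather than the real `BitPackedArray`: `ToyBitPacked`, `new_unchecked`, and the `String` error are hypothetical stand-ins. The unsafe constructor trusts the caller, while the safe `encode` path checks the non-negative precondition and returns an error otherwise.

```rust
/// Toy stand-in for a bit-packed array. Hypothetical type, not the vortex API.
#[derive(Debug)]
pub struct ToyBitPacked {
    /// Values stored as their unsigned reinterpretation.
    values: Vec<u32>,
}

impl ToyBitPacked {
    /// Builds the array without validating the input.
    ///
    /// # Safety
    /// The caller must guarantee that `values` contains no negative value.
    pub unsafe fn new_unchecked(values: &[i32]) -> Self {
        Self {
            values: values.iter().map(|&v| v as u32).collect(),
        }
    }

    /// Safe entry point: rejects negative values before building the array.
    pub fn encode(values: &[i32]) -> Result<Self, String> {
        if let Some(neg) = values.iter().find(|&&v| v < 0) {
            return Err(format!("cannot bit-pack negative value {neg}"));
        }
        // SAFETY: the loop above verified that every value is non-negative.
        Ok(unsafe { Self::new_unchecked(values) })
    }

    /// Read-only view of the stored unsigned values.
    pub fn values(&self) -> &[u32] {
        &self.values
    }
}
```

Callers that cannot prove the precondition themselves would go through `encode` and handle the error.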

Comment on lines 244 to 257
// impl PrimitiveArrayTrait for BitPackedArray {
// fn ptype(&self) -> PType {
// // NOTE: we use the fastlanes::BitPack provided kernels for compute with BitPackedArray,
// // which is only implemented for unsigned integer types.
// // As a precondition of building a new BitPackedArray, we ensure that it may only
// // contain non-negative values. Thus, it is always safe to read the packed data
// // reinterpreted as the unsigned variant of any integer type.
// if let DType::Primitive(ptype, _) = self.dtype() {
// ptype.to_unsigned()
// } else {
// unreachable!()
// }
// }
// }
Contributor Author


This is the alternative to calling array.ptype().to_unsigned() above. It would be a bit less code and potentially less error-prone; I'm just not sure whether we have, or will ever have, code paths that assume ptype() exactly matches the DType.
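For context, a small sketch of the reinterpretation being discussed. `ToyPType` and `reinterpret_as_unsigned` are hypothetical stand-ins for this example; only the `to_unsigned` name comes from the snippet above. It shows why the non-negative precondition matters: a non-negative signed value keeps its numeric value when read as unsigned, while a negative one does not.

```rust
/// Reduced, hypothetical stand-in for a primitive type tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyPType {
    I8,
    I16,
    I32,
    I64,
    U8,
    U16,
    U32,
    U64,
}

impl ToyPType {
    /// Maps each signed type to the unsigned type of the same width.
    /// Unsigned types map to themselves.
    pub fn to_unsigned(self) -> Self {
        match self {
            ToyPType::I8 => ToyPType::U8,
            ToyPType::I16 => ToyPType::U16,
            ToyPType::I32 => ToyPType::U32,
            ToyPType::I64 => ToyPType::U64,
            unsigned => unsigned,
        }
    }
}

/// Reads the bits of an `i32` as a `u32` without changing them.
pub fn reinterpret_as_unsigned(v: i32) -> u32 {
    v as u32
}
```

Here `reinterpret_as_unsigned(42)` is still 42, whereas `reinterpret_as_unsigned(-1)` becomes `u32::MAX`, which is the case the precondition rules out.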

@lwwmanning
Member

I fear this will hit the problem I mentioned: #1699 (comment)

@a10y a10y force-pushed the aduffy/bitpacked-signed-nonneg branch from 22982d2 to 5880b99 Compare December 18, 2024 18:08
@a10y a10y enabled auto-merge (squash) December 18, 2024 18:08
///
/// If `ptype` is signed, `packed` may **not** contain any values that once unpacked would
/// be interpreted as negative.
pub unsafe fn try_new(
Contributor


API-wise, this feels iffy. There are probably other array types that can be put into weird states by some inputs. WDYT about having a try_new_unchecked function plus a checked function that does validation similar to what we have here?

Contributor Author


The annoying thing is that this function really shouldn't be pub, but it has to be so that it's accessible from the sampling compressor.

Otherwise, users of the public API should be going through BitPackedArray::encode.

My goal with the unsafe was to make it clear that there are invariants callers need to check themselves for the args they pass to the function. I don't think we can check those invariants here without fully decompressing the packed buffer.

Contributor


That makes sense. WDYT about just changing it to unchecked to keep the "convention" of try_new as something that errors safely everywhere else?

Contributor Author


yea i can do that

@a10y a10y disabled auto-merge December 18, 2024 19:06
@a10y a10y enabled auto-merge (squash) December 18, 2024 20:51
@a10y a10y merged commit 18babc2 into develop Dec 18, 2024
20 checks passed
@a10y a10y deleted the aduffy/bitpacked-signed-nonneg branch December 18, 2024 21:12
3 participants