fix: BitPackedArray enforces can only be built over non-negative values #1705
```rust
// impl PrimitiveArrayTrait for BitPackedArray {
//     fn ptype(&self) -> PType {
//         // NOTE: we use the fastlanes::BitPack provided kernels for compute with BitPackedArray,
//         // which is only implemented for unsigned integer types.
//         // As a precondition of building a new BitPackedArray, we ensure that it may only
//         // contain non-negative values. Thus, it is always safe to read the packed data
//         // reinterpreted as the unsigned variant of any integer type.
//         if let DType::Primitive(ptype, _) = self.dtype() {
//             ptype.to_unsigned()
//         } else {
//             unreachable!()
//         }
//     }
// }
```
This is the alternative to calling `array.ptype().to_unsigned()` above. It would be a bit less code and potentially less error-prone; I'm just not sure whether we have (or will ever have) code paths that assume this exactly matches the `DType`.
I fear this will hit the problem I mentioned: #1699 (comment)
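The commented-out `ptype()` override relies on one property: a non-negative signed integer has exactly the same bit pattern as its unsigned counterpart, so reading the packed data through the unsigned type is lossless. A minimal standalone sketch of that property follows; the helper `reinterpret_as_unsigned` is hypothetical and not part of Vortex.

```rust
/// Reinterpret an `i32` as the `u32` with the same bit pattern (a same-size `as` cast).
fn reinterpret_as_unsigned(v: i32) -> u32 {
    v as u32
}

fn main() {
    // Non-negative values read back identically through the unsigned view.
    for v in [0i32, 1, 42, i32::MAX] {
        let u = reinterpret_as_unsigned(v);
        assert_eq!(u as i64, v as i64);
    }
    // A negative value does not: -1 has all bits set, so it reads as u32::MAX.
    assert_eq!(reinterpret_as_unsigned(-1), u32::MAX);
    println!("ok");
}
```

This is why the non-negativity precondition matters: without it, the unsigned view would silently turn small negative numbers into very large positive ones.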
Branch updated from 22982d2 to 5880b99.
```rust
///
/// If `ptype` is signed, `packed` may **not** contain any values that once unpacked would
/// be interpreted as negative.
pub unsafe fn try_new(
```
API-wise, this feels iffy. There are probably other array types that can receive inputs that put them in weird states. WDYT about having a `try_new_unchecked` function plus a checked function that does validation similar to what we have here?
The annoying thing is that this function really shouldn't be `pub`, but it has to be so that it's accessible from the sampling compressor. Otherwise, users of the public API should be going through `BitPackedArray::encode`.

My goal with `unsafe` was to make it clear that there are invariants that callers need to check themselves for the arguments they pass to the function. I don't think we can check those invariants here without fully decompressing the packed buffer.
That makes sense. WDYT about just changing it to `unchecked`, to keep the "convention" that `try_new` is something that errors safely everywhere else?
Yeah, I can do that.
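The pattern agreed on in this thread, an `unsafe` unchecked constructor whose caller must uphold the invariant plus a safe entry point that validates first, could look roughly like the sketch below. Everything here (`BitPacked`, `new_unchecked`, `NegativeValueError`) is a hypothetical, heavily simplified stand-in, not the Vortex `BitPackedArray` API; it models only the non-negativity check and the choice of bit width, and does no actual packing.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for a bit-packed array of `i32`.
/// Invariant: every element of `values` is non-negative.
#[derive(Debug)]
pub struct BitPacked {
    values: Vec<i32>,
    bit_width: u32,
}

/// Error returned by the safe constructor when the invariant would be violated.
#[derive(Debug, PartialEq)]
pub struct NegativeValueError {
    pub index: usize,
    pub value: i32,
}

impl fmt::Display for NegativeValueError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "negative value {} at index {}", self.value, self.index)
    }
}

impl std::error::Error for NegativeValueError {}

impl BitPacked {
    /// Builds the array without validating its input.
    ///
    /// # Safety
    /// The caller must guarantee that every element of `values` is non-negative.
    pub unsafe fn new_unchecked(values: Vec<i32>, bit_width: u32) -> Self {
        // Checked only in debug builds; release builds trust the caller.
        debug_assert!(values.iter().all(|&v| v >= 0));
        Self { values, bit_width }
    }

    /// Safe entry point: rejects negative values, then picks the narrowest bit width.
    pub fn encode(values: Vec<i32>) -> Result<Self, NegativeValueError> {
        if let Some(index) = values.iter().position(|&v| v < 0) {
            return Err(NegativeValueError { index, value: values[index] });
        }
        // All values are non-negative, so the cast to u32 preserves them.
        let max = values.iter().copied().max().unwrap_or(0) as u32;
        let bit_width = u32::BITS - max.leading_zeros();
        // SAFETY: the scan above proved that no element is negative.
        Ok(unsafe { Self::new_unchecked(values, bit_width) })
    }

    pub fn values(&self) -> &[i32] {
        &self.values
    }

    pub fn bit_width(&self) -> u32 {
        self.bit_width
    }
}

fn main() {
    let ok = BitPacked::encode(vec![0, 3, 5]).unwrap();
    assert_eq!(ok.bit_width(), 3);
    assert_eq!(ok.values(), &[0, 3, 5]);

    let err = BitPacked::encode(vec![1, -2, 3]).unwrap_err();
    assert_eq!(err, NegativeValueError { index: 1, value: -2 });
    println!("{err}");
}
```

Keeping the full scan in the safe constructor means the unchecked path stays cheap for internal callers, such as a compressor, that already know their input satisfies the invariant.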
Following up on #1699.

In the previous PR we allowed signed arrays to be bit-packed directly. However, we did not explicitly reject arrays containing negative values. We need to, because it is critical for fast `search_sorted` over BitPacked data with patches: efficient binary search is only possible when the patches sort to the right-most side of the array. A negative value would need to be stored as a patch for any bit width narrower than the type (its unsigned reinterpretation has the top bit set), yet it sorts to the left of every non-negative value.

I've added explicit preconditions that values are non-negative, and made the `BitPackedArray` constructor `unsafe` to make it clear to callers that they must check this themselves. The recommended safe way to create a BPA is via the `BPA::encode()` method, which returns an error if there are negative values.
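To illustrate why negative values break the patched `search_sorted` strategy, here is a toy model, assuming each value keeps only its low `bit_width` bits in a packed buffer and any value that does not fit is stored separately as a patch. This is not how Vortex or FastLanes actually lay out data; `ToyBitPacked`, `fits`, and their methods are hypothetical. With sorted, non-negative input the patches form a suffix, so the search can binary-search the clean packed prefix and then the patch values. A negative value instead becomes a patch at the left end, and the same search returns the wrong index.

```rust
/// Hypothetical toy model of bit-packing with patches (not the Vortex/FastLanes layout).
/// Each value keeps only its low `bit_width` bits in `packed`; values that do not fit
/// are recorded as patches together with their original index.
struct ToyBitPacked {
    packed: Vec<u32>,
    patch_indices: Vec<usize>,
    patch_values: Vec<i32>,
}

/// Does `v` fit in `bit_width` bits once reinterpreted as unsigned?
/// A negative `i32` has its top bit set, so it never fits when `bit_width < 32`.
fn fits(v: i32, bit_width: u32) -> bool {
    bit_width >= 32 || (v as u32) < (1u32 << bit_width)
}

impl ToyBitPacked {
    fn encode(values: &[i32], bit_width: u32) -> Self {
        let mask = if bit_width >= 32 { u32::MAX } else { (1u32 << bit_width) - 1 };
        let mut patch_indices = Vec::new();
        let mut patch_values = Vec::new();
        let packed: Vec<u32> = values
            .iter()
            .enumerate()
            .map(|(i, &v)| {
                if !fits(v, bit_width) {
                    patch_indices.push(i);
                    patch_values.push(v);
                }
                // Truncated bits: meaningless for patched slots.
                (v as u32) & mask
            })
            .collect();
        Self { packed, patch_indices, patch_values }
    }

    /// Leftmost index at which `target` could be inserted while keeping order.
    /// Assumes the input was sorted and that the patches form a contiguous suffix,
    /// which holds when every value is non-negative.
    fn search_sorted(&self, target: i32) -> usize {
        // Under the suffix assumption, the first patch marks the end of the clean prefix.
        let boundary = self.patch_indices.first().copied().unwrap_or(self.packed.len());
        // Binary search over packed values that were stored without truncation.
        let i = self.packed[..boundary].partition_point(|&p| i64::from(p) < i64::from(target));
        if i < boundary {
            return i;
        }
        // Every clean value is < target, so continue into the sorted patch values.
        boundary + self.patch_values.partition_point(|&v| v < target)
    }
}

fn main() {
    let sorted = [1, 2, 3, 9, 12];
    let bp = ToyBitPacked::encode(&sorted, 2);
    for target in [0, 2, 3, 10, 100] {
        // Agrees with a plain binary search over the original values.
        assert_eq!(bp.search_sorted(target), sorted.partition_point(|&v| v < target));
    }
    // With a negative value the patch sorts to the LEFT, and the split search is wrong.
    let bad = ToyBitPacked::encode(&[-1, 2, 3], 2);
    assert_eq!(bad.search_sorted(3), 1); // the correct answer would be 2
    println!("ok");
}
```

The split matters because the packed buffer itself stops being sorted once patched slots hold truncated bits (in the first example it is `[1, 2, 3, 1, 0]`), so only the prefix before the first patch can be searched directly.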