-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 1.1 binary reader support for typed nulls #766
Add 1.1 binary reader support for typed nulls #766
Conversation
@@ -48,6 +62,7 @@ impl Opcode { | |||
(0xE, 0x0) => (IonVersionMarker, low_nibble, None), | |||
(0xE, 0x1..=0x3) => (SymbolAddress, low_nibble, Some(IonType::Symbol)), | |||
(0xE, 0xA) => (NullNull, low_nibble, Some(IonType::Null)), | |||
(0xE, 0xB) => (TypedNull, low_nibble, Some(IonType::Null)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ion Type is not known yet.
(0xE, 0xB) => (TypedNull, low_nibble, Some(IonType::Null)), | |
(0xE, 0xB) => (TypedNull, low_nibble, None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little more involved. Right now all of the data needed for the Header
is expected to be available from the first byte. So, any headers created by opcodes without an IonType would cause a panic.
This does make sense, it would just probably be better as a follow-up to allow types to be deferred. I'll create an issue for it.
pub(crate) static ION_1_1_TYPED_NULL_TYPES: &[IonType; 12] = &[ | ||
IonType::Bool, | ||
IonType::Int, | ||
IonType::Float, | ||
IonType::Decimal, | ||
IonType::Timestamp, | ||
IonType::String, | ||
IonType::Symbol, | ||
IonType::Blob, | ||
IonType::Clob, | ||
IonType::List, | ||
IonType::SExp, | ||
IonType::Struct, | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there may be any benefit to using a match
instead of a lookup table? E.g.:
#[inline]
fn get_type_of_null(null_body: u8) -> IonType {
match null_body {
0x0 => IonType::Bool,
0x1 => IonType::Int,
// ...
0xB => IonType::Struct,
_ => unreachable!(),
}
}
If it isn't obvious which one is better and/or there is no consensus in the Rust community, then we should just create an issue to revisit match
vs lookup tables at some later time. No need to go deep into a rabbit hole on this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's pretty complicated to answer without benchmarking.
IonType
is 1 byte large. So the entire LUT easily fits in a cache line (12 bytes total; cache lines are typically 64bytes). Since it is so small there's a higher likelihood that any lookups would hit the same cache line, meaning subsequent lookups are less likely to miss. If it stays in cache, it has a good chance of out-performing multiple branches.
Using a trivial test program, I got rust to generate the match in 3 different ways:
- If the values for
null_body
were the same as the values used inIonType
, then rust is smart enough to just returnnull_body
and not lookup, or cmp anything. This would be awesome to make possible, but the binary encoded values would need to match the definition order ofIonType
(or we'd have to set the values). - If the values do not match, and rust isn't using
-O
, it can generate a jump table. This basically adds extra steps to the LUT, since it looks up offsets to jump to, then returns the IonType, instead of just looking up the IonType. - If the values do not match, and rust is using
-O
, it generates a lookup table, using the same asm as the manual LUT.
I'm guessing with the number of types we have, we're in the 'sweet spot' between "enough to justify a LUT", and "too many to use a LUT". I'd expect with a small number of options, or a large number, we'd probably see branches, but I don't know where those limits are.
It would be interesting to see what kind of benefits we'd see with having the IonType values match the values in the binary encoding for the null types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that was a very thorough answer for my idle curiosity.
If the values for null_body were the same as the values used in IonType, then rust is smart enough to just return null_body and not lookup, or cmp anything. This would be awesome to make possible, but the binary encoded values would need to match the definition order of IonType (or we'd have to set the values).
Nice!
However, we already expect typed nulls to be a relatively cold path, so I guess it's probably not worth putting a lot of effort into optimizing it.
src/lazy/binary/raw/v1_1/value.rs
Outdated
let body = self.value_body()?; | ||
ION_1_1_TYPED_NULL_TYPES[body[0] as usize] | ||
} else { | ||
self.ion_type() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we know that the raw value is null but it's not using the TypedNull
op code in this branch, then I think we could just return IonType::Null
here. Maybe the compiler/optimizer is smart enough to figure that out, but I think that explicitly returning IonType::Null
makes the meaning of the code a little more clear to the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye good call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending Matt's comments
35d0894
to
6b88ad6
Compare
Issue #, if available: #662
Description of changes:
This PR adds support for typed nulls to the binary ion 1.1 reader.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.