Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 1.1 binary reader support for typed nulls #766

Merged
merged 2 commits into from
May 28, 2024

Conversation

nirosys
Copy link
Contributor

@nirosys nirosys commented May 13, 2024

Issue #, if available: #662

Description of changes:
This PR adds support for typed nulls to the binary ion 1.1 reader.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@nirosys nirosys marked this pull request as ready for review May 13, 2024 20:03
@nirosys nirosys requested a review from zslayton May 13, 2024 20:03
@@ -48,6 +62,7 @@ impl Opcode {
(0xE, 0x0) => (IonVersionMarker, low_nibble, None),
(0xE, 0x1..=0x3) => (SymbolAddress, low_nibble, Some(IonType::Symbol)),
(0xE, 0xA) => (NullNull, low_nibble, Some(IonType::Null)),
(0xE, 0xB) => (TypedNull, low_nibble, Some(IonType::Null)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ion Type is not known yet.

Suggested change
(0xE, 0xB) => (TypedNull, low_nibble, Some(IonType::Null)),
(0xE, 0xB) => (TypedNull, low_nibble, None),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little more involved. Right now all of the data needed for the Header is expected to be available from the first byte. So, any headers created by opcodes without an IonType would cause a panic.

This does make sense, it would just probably be better as a follow-up to allow types to be deferred. I'll create an issue for it.

Comment on lines +17 to +30
pub(crate) static ION_1_1_TYPED_NULL_TYPES: &[IonType; 12] = &[
IonType::Bool,
IonType::Int,
IonType::Float,
IonType::Decimal,
IonType::Timestamp,
IonType::String,
IonType::Symbol,
IonType::Blob,
IonType::Clob,
IonType::List,
IonType::SExp,
IonType::Struct,
];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there may be any benefit to using a match instead of a lookup table? E.g.:

#[inline]
fn get_type_of_null(null_body: u8) -> IonType {
    match null_body {
        0x0 => IonType::Bool,
        0x1 => IonType::Int,
        // ...
        0xB => IonType::Struct,
        _ => unreachable!(),
    }
}

If it isn't obvious which one is better and/or there is no consensus in the Rust community, then we should just create an issue to revisit match vs lookup tables at some later time. No need to go deep into a rabbit hole on this now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty complicated to answer without benchmarking.

IonType is 1 byte large. So the entire LUT easily fits in a cache line (12 bytes total; cache lines are typically 64bytes). Since it is so small there's a higher likelihood that any lookups would hit the same cache line, meaning subsequent lookups are less likely to miss. If it stays in cache, it has a good chance of out-performing multiple branches.

Using a trivial test program, I got rust to generate the match in 3 different ways:

  • If the values for null_body were the same as the values used in IonType, then rust is smart enough to just return null_body and not lookup, or cmp anything. This would be awesome to make possible, but the binary encoded values would need to match the definition order of IonType (or we'd have to set the values).
  • If the values do not match, and rust isn't using -O, it can generate a jump table. This basically adds extra steps to the LUT, since it looks up offsets to jump to, then returns the IonType, instead of just looking up the IonType.
  • If the values do not match, and rust is using -O, it generates a lookup table, using the same asm as the manual LUT.

I'm guessing with the number of types we have, we're in the 'sweet spot' between "enough to justify a LUT", and "too many to use a LUT". I'd expect with a small number of options, or a large number, we'd probably see branches, but I don't know where those limits are.

It would be interesting to see what kind of benefits we'd see with having the IonType values match the values in the binary encoding for the null types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that was a very thorough answer for my idle curiosity.

If the values for null_body were the same as the values used in IonType, then rust is smart enough to just return null_body and not lookup, or cmp anything. This would be awesome to make possible, but the binary encoded values would need to match the definition order of IonType (or we'd have to set the values).

Nice!

However, we already expect typed nulls to be a relatively cold path, so I guess it's probably not worth putting a lot of effort into optimizing it.

let body = self.value_body()?;
ION_1_1_TYPED_NULL_TYPES[body[0] as usize]
} else {
self.ion_type()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know that the raw value is null but it's not using the TypedNull op code in this branch, then I think we could just return IonType::Null here. Maybe the compiler/optimizer is smart enough to figure that out, but I think that explicitly returning IonType::Null makes the meaning of the code a little more clear to the reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye good call.

Copy link
Contributor

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending Matt's comments

@nirosys nirosys force-pushed the rgiliam/1_1_binary_reader_nulls branch from 35d0894 to 6b88ad6 Compare May 28, 2024 19:39
@nirosys nirosys requested a review from popematt May 28, 2024 19:42
@nirosys nirosys merged commit caeb562 into amazon-ion:main May 28, 2024
25 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants