-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds additional tooling APIs to enable Ion 1.1 support in the CLI's inspect
command
#802
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR Tour 🧭
let (remaining_data, stream_offset, mut encoding) = match &self.encoding_reader { | ||
Text_1_0(r) => r.stream_data(), | ||
Binary_1_0(r) => r.stream_data(), | ||
Text_1_1(r) => r.stream_data(), | ||
Binary_1_1(r) => r.stream_data(), | ||
let reader_state = match &self.encoding_reader { | ||
Text_1_0(r) => r.save_state(), | ||
Binary_1_0(r) => r.save_state(), | ||
Text_1_1(r) => r.save_state(), | ||
Binary_1_1(r) => r.save_state(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I renamed stream_data
to save_state
to eliminate the noun/verb ambiguity Matt noticed in the feedback for #796.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to find the comment: #796 (comment)
Doc comment would be helpful because it's not immediately obvious whether "stream" is a verb or a noun here.
It's a little ironic because "save state" is nearly as ambiguous as "stream data". Both "save" and "stream" may be adjectives or verbs. "save state" is clearer though, and has the correct implication interpreted either way. I like the change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a noun, consider data_stream
, saved_state
. If it's a verb, consider save_reader_state
maybe?
Or even get_reader_state()
? (Or, if it accepts owned self
, then maybe into_reader_state()
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the name stream_data
was noun/verb ambiguous, I think the actual problem being solved was that the name didn't explain what it did. I think save_state
addresses that, even if it remains noun/verb ambiguous.
@@ -1149,10 +1153,12 @@ impl<'top> LazyContainerPrivate<'top, AnyEncoding> for LazyRawAnyList<'top> { | |||
} | |||
} | |||
|
|||
#[derive(Debug, Copy, Clone)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Lots of structs now implement Copy
et al not only because they can/should, but also because it makes implementing peek()
possible for raw containers/readers.
if total_length > input.len() { | ||
return IonResult::incomplete( | ||
"the stream ended unexpectedly in the middle of a value", | ||
header_offset, | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Somehow the test for incomplete binary values never made the leap from 1.0 to 1.1, so here it is.
@@ -3,9 +3,12 @@ | |||
use std::fmt::Debug; | |||
use std::ops::Range; | |||
|
|||
use num_traits::PrimInt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ cargo fmt
moved this from below.
@@ -823,3 +825,35 @@ impl<'top> LazyRawBinaryValue_1_1<'top> { | |||
Ok(LazyRawBinaryStruct_1_1::from_value(self)) | |||
} | |||
} | |||
|
|||
impl<'top> EncodedBinaryValue<'top, BinaryEncoding_1_1> for &'top LazyRawBinaryValue_1_1<'top> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The definition of this trait is in the next file down in the PR. For now, just notice that these methods are exposing chunks of the input backing various encoding primitives in the value's bytes.
pub struct RawReaderState<'a> { | ||
data: &'a [u8], | ||
offset: usize, | ||
encoding: IonEncoding, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I created a type for this instead of passing around a 3-tuple.
// Unlike a symbol value with identifier syntax, an e-expression identifier cannot be | ||
// the last thing in the stream. | ||
return Err(nom::Err::Incomplete(Needed::Unknown)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Fixing an incompleteness detection bug that occurs when the last thing in the buffer is the e-exp ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember you!
))(self) | ||
} | ||
|
||
/// Like `match_digits_before_dot`, but allows leading zeros. | ||
fn match_digits_after_dot(self) -> IonMatchResult<'top> { | ||
fn match_one_or_more_digits_after_dot(self) -> IonMatchResult<'top> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Fixing an incompleteness detection bug that occurs when the e
in a float
is the last thing in the buffer.
expect_incomplete: [ | ||
"5d", | ||
"-5d", | ||
"5.d", | ||
"-5.d", | ||
"5.D", | ||
"-5.D", | ||
"5.1d", "-5.1d", | ||
"5.1D", "-5.1D", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ All of these could be valid decimals if the buffer ends up getting at least one more digit.
let buffer_tail = &input.bytes()[input.len() - tail_bytes_to_take..]; | ||
let tail = format!("{:X?}", buffer_tail); | ||
(head, tail) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looks good!
src/lazy/binary/raw/v1_1/value.rs
Outdated
// Nonsense values for now | ||
ion_type_code: OpcodeType::Nop, | ||
low_nibble: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now? Until when?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intent (not communicated in this comment) was to modify the EncodedValue
type such that it wouldn't require a Header
anymore. Matt also suggested this independently; I'm going to do it as part of #805.
/// Returns `true` if this expression was produced by evaluating a macro. Otherwise, returns `false`. | ||
pub fn is_ephemeral(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚲 🏠 is_ephemeral
doesn't say "is the result of evaluating a macro" to me, but I am ignorant of the standard terminology here. Is this phrasing used elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using it in ion-rust
and ion inspect
to describe values that don't appear anywhere in the input stream. They exist only during macro evaluation. I'm open to other ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand on "don't appear anywhere in the input stream"? An example, perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
Here's an example in which the stream contains:
- An encoding directive (
$ion_encoding::(...)
) that defines a macro calledtriple
- An e-expression with a string argument (
(:triple "hello")
)
When a Reader
traverses the stream it will hand out three LazyValue
s, one for each "hello"
in the stream's expansion. However, if you were to ask any of those LazyValue
s for its Span
(backing bytes), you would get None
--the values aren't encoded as values the stream itself, they're the ephemeral result of evaluating the e-expression.
Does that help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the example :) Yes that does help. If the triple
macro produced only one value then you could say that that is encoded where the macro is invoked (as you describe in the comment below about "singleton macros".
Hmm, I could be inclined to think of those "hello"
values as occupying the same 0-byte span between bytes... 73 and 74? But then I'd have to wonder- how do I distinguish the position of the second from the first? If I'm able to distinguish the difference between multiple values in the same place then I might as well say all three values are "in" the macro invocation. Is the answer then that we need another dimension to locate values, that all three "hello"
are located between bytes 67-74 at different ordinals? E.g. ([67-74], 0)
, ([67-74], 1)
, ([67-74], 2)
?
Are these values ephemeral? Virtual? Implicit? Derived? Computed?
Lots of questions. No suggestions :) Works for me.
src/lazy/expanded/macro_evaluator.rs
Outdated
_ => unreachable!("e-expression-backed lazy values must yield a single value literal"), | ||
_ => ice!(IonResult::decoding_error(format!( | ||
"expansion of {self:?} was required to produce exactly one value", | ||
))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happening here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This path is an optimization available to macros that are statically determined to expand to a single non-system value at template-compile time. Consider this macro definition:
(macro Point2D ($x $y)
{
x: $x,
y: $y,
}
)
When the TemplateCompiler
processes this, it notices that the body of the macro is an unannotated struct. This means that it will always evaluate to exactly one value and that value cannot be a symbol table or encoding directive (because those require specific annotations).
The codebase refers to this category of macros as "singleton macros." They offer a few benefits:
- Because it will definitely be exactly one value, the reader can hand out a
LazyValue
holding the e-expression as its backing data. Other macros cannot do this because if you're holding aLazyValue
and the macro evaluates to nothing or 100 things, there's not a way for the application to handle those outcomes. - At the top level, singleton macros can be skipped over without evaluation because we know they won't produce a system value that could modify the encoding context.
- Expanding a singleton macro doesn't require an evaluator with a stack because as soon as you've gotten a value, you're done--no need to
pop()
and preserve state.
Now that we're looking at this more closely though, the decoding error should actually be a panic
--if it ever happens, it's a bug in the library and not a problem with the data. While I'm in there I'll also add some of the description above to the ExpansionSingleton
type.
src/lazy/system_reader.rs
Outdated
new_macro_table.macro_with_id(3), | ||
new_macro_table.macro_with_id(4), | ||
new_macro_table.macro_with_name("seventeen") | ||
); | ||
assert_eq!( | ||
new_macro_table.macro_with_id(4), | ||
new_macro_table.macro_with_id(5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did these change because of the addition of annotate
? I see the assertion on new line 1097 doesn't change. The comment on new line 1095 is now out of date though, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did these change because of the addition of annotate?
Yep. Matt rightly pointed out that I should use an offset from NUM_SYSTEM_MACROS
to prevent future breakage. I'm going to do that for #807.
The comment on new line 1095 is now out of date though, yes?
Yeah, good catch.
// Unlike a symbol value with identifier syntax, an e-expression identifier cannot be | ||
// the last thing in the stream. | ||
return Err(nom::Err::Incomplete(Needed::Unknown)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember you!
// Note: ^-- We use this `pair(satisfy(...), complete_digit0)` to guarantee a subtle | ||
// behavior. At the end of the buffer, an empty input to this parser must be | ||
// considered 'incomplete' instead of 'invalid'. In contrast, an input of a single | ||
// digit would be considered complete even though the buffer could get more data later. | ||
// (If the buffer gets more data, it's the StreamingRawReader's responsibility to | ||
// discard the `1.1` and try again.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much appreciate comments like these, thanks!
let buffer_tail = &input.bytes()[input.len() - tail_bytes_to_take..]; | ||
let tail = format!("{:X?}", buffer_tail); | ||
(head, tail) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looks good!
This PR:
EncodedBinaryValue
, with methods for exposingSpan
s representing the various components of a value's encoding. This trait is implemented by the 1.0 and 1.1 binary value types. These methods allowion inspect
to handle both versions' values uniformly.peek
methods to raw container iterators andMacroExpansion
, allowing tools to see the next value without advancing state.ExpandedStreamItem
enum that surfaces inputEExp
s in addition to their expansion. This allows tools (likeinspect
to investigate theEExp
and then investigate its output values.ExpandedValueSource::EExp
toSingletonEExp
. If aLazyValue
is backed by anEExp
, it's because it was found to always expand to a single value at compile time--that makes it possible to hand out aLazyValue
knowing that when it is eventually read there will be exactly one value. Other values can come from anEExp
, but are not backed by it. The renaming hopes to make that less fuzzy.is_ephemeral
,is_parameter
, etc.It also fixes three bugs for detecting incomplete input:
EExp
IDs were being parsed using theidentifier
symbol parser. Unlike a symbol, anEExp
's ID cannot be the last thing in a stream.float
parser correctly rejected input like123e
(no digits aftere
), but it did it by declaring itinvalid
rather thanincomplete
. If the buffer received more data, it's possible retrying would succeed.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.