-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drops closure-based container writers #741
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour
0, 0, 0, 0, 0, 0, 0, 0, 0b1000_0000 | ||
// ^-- Set the 'end' flag of the final byte to 1 | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0b1000_0000 | ||
// ^-- Set the 'end' flag of the final byte to 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The buffer here was only 9 bytes, which meant that the largest u64
values couldn't be successfully encoded as VarUInt
s. Fixes #689.
#[test] | ||
fn test_write_var_uint_for_u64_max() -> IonResult<()> { | ||
var_uint_encoding_test( | ||
u64::MAX, | ||
&[0x01, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F, 0xFF], | ||
)?; | ||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Added to show that #689 is fixed.
fn as_ref(&self) -> &[Symbol] { | ||
self.symbols.as_slice() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Allows an Element
's annotations to be viewed as a slice of symbols, saving an allocation during serialization.
child_values_buffer: BumpVec<'top, u8>, | ||
parent_buffer: &'value mut BumpVec<'top, u8>, | ||
// In binary Ion 1.0, only symbol IDs can be used as annotations. | ||
annotations: Option<BumpVec<'top, SymbolId>>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Prior to this patch, the container writer types existed solely to act as the argument to a closure. As such, they didn't need to own any resources; they could just hold handles to wherever the bytes needed to go.
Now that the container types are returned to the user, they need to be self-contained.
Also note that because Ion 1.0 annotations wrap the entire value, they must be carried around in the container writer until end()
is called so they can have access to all of the encoding information necessary to write the wrapper.
type MacroArgsWriter = Never; | ||
type EExpWriter = Never; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I renamed this associated type for consistency/precision.
encoder: ContainerEncodingKind<'value, 'top>, | ||
} | ||
|
||
enum ContainerEncodingKind<'value, 'top> { | ||
Delimited(DelimitedEncoder<'value, 'top>), | ||
LengthPrefixed(LengthPrefixedEncoder<'value, 'top>), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ To enable containers to be either delimited or length prefixed, BinaryContainerWriter_1_1
now holds an enum of the required encoding resources instead of a pointer to a buffer.
impl<'value, 'top> FieldEncoder for BinaryStructWriter_1_1<'value, 'top> { | ||
fn encode_field_name(&mut self, name: impl AsRawSymbolTokenRef) -> IonResult<()> { | ||
use crate::RawSymbolTokenRef::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ StructWriter
impls now must also define the (private) method encode_field_name
; they can then defer value serialization to make_value_writer()
.
@@ -114,6 +104,7 @@ struct TextContainerWriter_1_0<'a, W: Write> { | |||
// The Ion type of the container using this TextContainerWriter_1_0. This value is only | |||
// used for more informative error messages. | |||
ion_type: IonType, | |||
value_delimiter: &'static str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ When StuctWriter
s go to serialize a field, they use FieldEncoder::encode_field_name(...)
and a ValueWriter
to encode it. In text, this means that the ValueWriter
is responsible for writing its trailing delimiter where applicable.
I considered adding a write_delimiter
method to StructWriter
/FieldEncoder
, but didn't like that it wouldn't be applicable to the binary side. Whether this is better though is subjective. 🤷
let (full_bytes, remaining_bits) = bits_used.div_rem(&BITS_PER_ENCODED_BYTE); | ||
match (full_bytes, remaining_bits) { | ||
(0, 0) => 1, | ||
(_, 0) => full_bytes, | ||
_ => full_bytes + 1, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could also work.
let (full_bytes, remaining_bits) = bits_used.div_rem(&BITS_PER_ENCODED_BYTE); | |
match (full_bytes, remaining_bits) { | |
(0, 0) => 1, | |
(_, 0) => full_bytes, | |
_ => full_bytes + 1, | |
} | |
(bits_used - 1) / &BITS_PER_ENCODED_BYTE + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's certainly more concise, but I have to think a bit harder to be sure it's always correct. If it's alright with you, I'd like to hold off on changing it until we know whether this needs optimization.
@@ -67,7 +64,7 @@ impl<'value, 'top> BinaryValueWriter_1_0<'value, 'top> { | |||
self.encoding_buffer.as_slice() | |||
} | |||
|
|||
pub fn write_symbol_id(&mut self, symbol_id: SymbolId) -> IonResult<()> { | |||
pub fn write_symbol_id(mut self, symbol_id: SymbolId) -> IonResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many of these actually need to move self
? Is there any benefit to leaving most of them (i.e. scalar writing functions) as mutable borrows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the methods in the ValueWriter
trait consume self
to prevent users from misusing it when they go to implement WriteAsIon
. In particular, the APIs for writing annotated values and struct fields where there's metadata attached to the single value being produced.
impl WriteAsIon for MyType {
fn write_as_ion<V: ValueWriter>(&self, writer: V) -> IonResult<()> {
writer.with_annotations(&["foo", "bar])?
.write(1)?
.write(2) // Wait, what?
}
}
We could define behaviors for this type of thing, but I like that the API statically enforces the current behavior.
Note that SequenceWriter
and StructWriter
implementations take &mut self
to allow writing many nested values.
/// Closes out the sequence being written. Delimited writers can use this opportunity to emit | ||
/// a sentinel value, and length-prefixed writers can flush any buffered data to the output | ||
/// buffer. | ||
fn close(self) -> IonResult<Self::Resources>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Recent work on #732 surfaced lifetime issues in the closure-centric writer APIs for containers and e-expressions. In particular, the workarounds for the HRTBs-are-'static GATS impl limitation ended up making it impossible to write nested containers because the inner closure's lifetime was unsatisfiable.
This patch makes several simplifications to the writer API.
Writing containers and e-expressions no longer uses closures
Methods have been added to the
ValueWriter
trait that provide writers for the corresponding container/expression.For example, what was previously achieved via
write_list
:would now be done via
list_writer
:or more succinctly:
or even more succinctly, when all values are the same type:
Dropping a container without calling
end()
is illegal as it can corrupt the stream, but it is not currently enforced viapanic!
. I've opened #740 to track that.There is no longer an
AnnotatableValueWriter
traitThe
AnnotatableValueWriter
trait provided awith_annotations()
method that allowed the user to get a separateValueWriter
implementation that could handle annotations and from which it was statically impossible to try and set annotations again.ValueWriter
implementations now have awith_annotations(...)
method that returns anotherValueWriter
prepared to encode the provided symbols. Callingwith_annotations()
on aValueWriter
that already has annotations returns a newValueWriter
with the annotations replaced (it does not e.g. concatenate the annotation sequences).Accepting this behavioral change allowed the indirection and code introduced by
AnnotatableValueWriter
to be removed. The resulting API is the same for the annotated case:but the traits/types involved are simpler;
AnnotatableValueWriter
's functionality has been folded intoValueWriter
.The
WriteAsIon
andWriteAsIonValue
traits have been consolidated intoWriteAsIon
The elimination of the
AnnotatableValueWriter
trait made it possible to merge theWriteAsIonValue
trait (designed to serialize types that don't use annotations) with theWriteAsIon
trait (designed to serialize types that might use annotations).Delimited container settings are propagated to all child containers
In the Ion 1.1 binary writer API, calling
with_delimited_containers()
on aValueWriter
causes all nested containers to also use delimited containers by default. For example:This allows users to set the desired encoding mode once at the top with the expectation that that will often be what they would like to use throughout that value. This is easily overridden:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.