Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drops closure-based container writers #741

Merged
merged 9 commits into from
Apr 18, 2024
Merged

Drops closure-based container writers #741

merged 9 commits into from
Apr 18, 2024

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Apr 10, 2024

Recent work on #732 surfaced lifetime issues in the closure-centric writer APIs for containers and e-expressions. In particular, the workarounds for the HRTBs-are-'static GATS impl limitation ended up making it impossible to write nested containers because the inner closure's lifetime was unsatisfiable.

This patch makes several simplifications to the writer API.


Writing containers and e-expressions no longer uses closures

Methods have been added to the ValueWriter trait that provide writers for the corresponding container/expression.

trait ValueWriter: ... {
  // ...

  fn list_writer(&mut self) -> IonResult<Self::ListWriter>;
  fn sexp_writer(&mut self) -> IonResult<Self::SExpWriter>;
  fn struct_writer(&mut self) -> IonResult<Self::StructWriter>;
  fn eexp_writer(&mut self) -> IonResult<Self::EExpWriter>;
}

For example, what was previously achieved via write_list:

writer.write_list(|list| {
  list
    .write(1)?
    .write(2)?
    .write(3)?;
  Ok(())
})

would now be done via list_writer:

let mut list = writer.list_writer()?;
list
  .write(1)?
  .write(2)?
  .write(3)?;
list.end()

or more succinctly:

let mut list = writer.list_writer()?;
list.write_all(&[1, 2, 3])?;
list.end()

or even more succinctly, when all values are the same type:

writer.write_list([1, 2, 3])?;

Dropping a container without calling end() is illegal as it can corrupt the stream, but it is not currently enforced via panic!. I've opened #740 to track that.


There is no longer an AnnotatableValueWriter trait

The AnnotatableValueWriter trait provided a with_annotations() method that allowed the user to get a separate ValueWriter implementation that could handle annotations and from which it was statically impossible to try and set annotations again.

ValueWriter implementations now have a with_annotations(...) method that returns another ValueWriter prepared to encode the provided symbols. Calling with_annotations() on a ValueWriter that already has annotations returns a new ValueWriter with the annotations replaced (it does not e.g. concatenate the annotation sequences).

Accepting this behavioral change allowed the indirection and code introduced by AnnotatableValueWriter to be removed. The resulting API is the same for the annotated case:

writer
  .value_writer()
  .with_annotations(&["foo", "bar", "baz"])
  .write_bool(true)?;

but the traits/types involved are simpler; AnnotatableValueWriter's functionality has been folded into ValueWriter.


The WriteAsIon and WriteAsIonValue traits have been consolidated into WriteAsIon

The elimination of the AnnotatableValueWriter trait made it possible to merge the WriteAsIonValue trait (designed to serialize types that don't use annotations) with the WriteAsIon trait (designed to serialize types that might use annotations).


Delimited container settings are propagated to all child containers

In the Ion 1.1 binary writer API, calling with_delimited_containers() on a ValueWriter causes all nested containers to also use delimited containers by default. For example:

// Create a delimited StructWriter
let mut struct_ = writer
  .value_writer()
  .with_delimited_containers()
  .struct_writer()?;

// Write a nested delimited list
struct_
  .write("foo", &["bar", "baz"])?; // This list will be delimited

struct_ .end()

This allows users to set the desired encoding mode once at the top with the expectation that that will often be what they would like to use throughout that value. This is easily overridden:

// Create a delimited StructWriter
let mut struct_ = writer
  .value_writer()
  .with_delimited_containers()
  .struct_writer()?;

// Write a nested length-prefixed list
struct_
  field_writer()
    .with_length_prefixed_containers()
    .write("foo", &["bar", "baz"])? // This list will be length-prefixed
  .end()

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour

Comment on lines -45 to +59
0, 0, 0, 0, 0, 0, 0, 0, 0b1000_0000
// ^-- Set the 'end' flag of the final byte to 1
0, 0, 0, 0, 0, 0, 0, 0, 0, 0b1000_0000
// ^-- Set the 'end' flag of the final byte to 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The buffer here was only 9 bytes, which meant that the largest u64 values couldn't be successfully encoded as VarUInts. Fixes #689.

Comment on lines +140 to +147
#[test]
fn test_write_var_uint_for_u64_max() -> IonResult<()> {
var_uint_encoding_test(
u64::MAX,
&[0x01, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F, 0x7F, 0xFF],
)?;
Ok(())
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Added to show that #689 is fixed.

fn as_ref(&self) -> &[Symbol] {
self.symbols.as_slice()
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Allows an Element's annotations to be viewed as a slice of symbols, saving an allocation during serialization.

child_values_buffer: BumpVec<'top, u8>,
parent_buffer: &'value mut BumpVec<'top, u8>,
// In binary Ion 1.0, only symbol IDs can be used as annotations.
annotations: Option<BumpVec<'top, SymbolId>>,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Prior to this patch, the container writer types existed solely to act as the argument to a closure. As such, they didn't need to own any resources; they could just hold handles to wherever the bytes needed to go.

Now that the container types are returned to the user, they need to be self-contained.

Also note that because Ion 1.0 annotations wrap the entire value, they must be carried around in the container writer until end() is called so they can have access to all of the encoding information necessary to write the wrapper.

Comment on lines -380 to +283
type MacroArgsWriter = Never;
type EExpWriter = Never;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ I renamed this associated type for consistency/precision.

src/lazy/encoder/binary/v1_0/value_writer.rs Outdated Show resolved Hide resolved
Comment on lines +24 to +30
encoder: ContainerEncodingKind<'value, 'top>,
}

enum ContainerEncodingKind<'value, 'top> {
Delimited(DelimitedEncoder<'value, 'top>),
LengthPrefixed(LengthPrefixedEncoder<'value, 'top>),
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ To enable containers to be either delimited or length prefixed, BinaryContainerWriter_1_1 now holds an enum of the required encoding resources instead of a pointer to a buffer.

Comment on lines +314 to +316
impl<'value, 'top> FieldEncoder for BinaryStructWriter_1_1<'value, 'top> {
fn encode_field_name(&mut self, name: impl AsRawSymbolTokenRef) -> IonResult<()> {
use crate::RawSymbolTokenRef::*;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ StructWriter impls now must also define the (private) method encode_field_name; they can then defer value serialization to make_value_writer().

@@ -114,6 +104,7 @@ struct TextContainerWriter_1_0<'a, W: Write> {
// The Ion type of the container using this TextContainerWriter_1_0. This value is only
// used for more informative error messages.
ion_type: IonType,
value_delimiter: &'static str,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ When StuctWriters go to serialize a field, they use FieldEncoder::encode_field_name(...) and a ValueWriter to encode it. In text, this means that the ValueWriter is responsible for writing its trailing delimiter where applicable.

I considered adding a write_delimiter method to StructWriter/FieldEncoder, but didn't like that it wouldn't be applicable to the binary side. Whether this is better though is subjective. 🤷

@zslayton zslayton marked this pull request as ready for review April 10, 2024 17:27
@zslayton zslayton requested review from popematt and nirosys April 10, 2024 17:27
Comment on lines +39 to +44
let (full_bytes, remaining_bits) = bits_used.div_rem(&BITS_PER_ENCODED_BYTE);
match (full_bytes, remaining_bits) {
(0, 0) => 1,
(_, 0) => full_bytes,
_ => full_bytes + 1,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could also work.

Suggested change
let (full_bytes, remaining_bits) = bits_used.div_rem(&BITS_PER_ENCODED_BYTE);
match (full_bytes, remaining_bits) {
(0, 0) => 1,
(_, 0) => full_bytes,
_ => full_bytes + 1,
}
(bits_used - 1) / &BITS_PER_ENCODED_BYTE + 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's certainly more concise, but I have to think a bit harder to be sure it's always correct. If it's alright with you, I'd like to hold off on changing it until we know whether this needs optimization.

@@ -67,7 +64,7 @@ impl<'value, 'top> BinaryValueWriter_1_0<'value, 'top> {
self.encoding_buffer.as_slice()
}

pub fn write_symbol_id(&mut self, symbol_id: SymbolId) -> IonResult<()> {
pub fn write_symbol_id(mut self, symbol_id: SymbolId) -> IonResult<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many of these actually need to move self? Is there any benefit to leaving most of them (i.e. scalar writing functions) as mutable borrows?

Copy link
Contributor Author

@zslayton zslayton Apr 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the methods in the ValueWriter trait consume self to prevent users from misusing it when they go to implement WriteAsIon. In particular, the APIs for writing annotated values and struct fields where there's metadata attached to the single value being produced.

impl WriteAsIon for MyType {
    fn write_as_ion<V: ValueWriter>(&self, writer: V) -> IonResult<()> {
       writer.with_annotations(&["foo", "bar])?
          .write(1)?
          .write(2) // Wait, what?
    }
}

We could define behaviors for this type of thing, but I like that the API statically enforces the current behavior.

Note that SequenceWriter and StructWriter implementations take &mut self to allow writing many nested values.

/// Closes out the sequence being written. Delimited writers can use this opportunity to emit
/// a sentinel value, and length-prefixed writers can flush any buffered data to the output
/// buffer.
fn close(self) -> IonResult<Self::Resources>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@zslayton zslayton requested a review from popematt April 16, 2024 14:56
@zslayton zslayton merged commit de545f9 into main Apr 18, 2024
32 checks passed
@zslayton zslayton deleted the rm-writer-closures branch December 5, 2024 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants