Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifies the serde serializer to use the new writer #747

Merged
merged 2 commits into from
Apr 23, 2024

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Apr 22, 2024

This PR replaces usages of the old UserWriter with the new ApplicationWriter introduced in #745 (currently under review).

It also modifies the encoding of various serde data types to include type annotations:

Serde type Example Rust Old serialization New serialization
Unit struct
struct Foo;
"Foo"
Foo
Newtype
struct Foo(i32);
1
Foo::1
Tuple struct
struct Foo(i32, i32);
[1, 2]
Foo::[1, 2]
Struct
struct Foo {
  x: i32,
  y: i32                    
}
{
  x: 1,
  y: 2,
}
Foo::{
  x: 1,
  y: 2,
}

Similarly, enum variant kinds now also include the name of the enum type:

Serde type Example Rust Old serialization New serialization
Unit struct variant
enum Foo {
  Bar // No associated data        
}
"Bar"
Foo::Bar
Newtype variant
enum Foo {
  Bar(i32)                          
}
Bar::1
Foo::Bar::1
Tuple variant
enum Foo {
  Bar(i32, i32)                          
}
Bar::[1, 2]
Foo::Bar::[1, 2]
Struct variant
enum Foo {
  Bar {
    x: i32,
    y: i32       
  }                   
}
Bar::{
  x: 1,
  y: 2,
}
Foo::Bar::{
  x: 1,
  y: 2,
}

This clarity comes at the expense of a bit of data size, but macros make it possible to reduce the size again if you're so motivated. The type names are not strictly needed as serde can determine the expected type from context. However, it may be helpful for humans.

Feedback on this tradeoff is welcome!

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@popematt
Copy link
Contributor

The biggest concern I have with the annotations is how it might affect compatibility with other object mappers, such as Jackson. In general, I think the (language-specific) name of the type should be considered an implementation detail rather than part of the data. I think the extra annotations are okay as long as they are optional.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour

@@ -13,7 +13,7 @@ use crate::ion_writer::IonWriter;
use crate::raw_symbol_token_ref::{AsRawSymbolTokenRef, RawSymbolTokenRef};
use crate::result::{IonFailure, IonResult};
use crate::types::integer::IntData;
use crate::types::ContainerType;
use crate::types::ParentType;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The ContainerType enum defined in types/mod.rs was renamed to ParentType because it also includes a TopLevel variant. Another ContainerType was also created that is limited to only containers:

enum ContainerType {
  List,
  SExp,
  Struct, 
}

enum ParentType {
  TopLevel,
  List,
  SExp,
  Struct,
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's IonDatagram all over again! 😉

self.output,
"{}",
self.whitespace_config.space_between_top_level_values
)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This ended up being redundant with whitespace printing handled by the TextValueWriter_1_0.

Comment on lines +84 to +94
WriteConfigKind::Text(text_config) => {
let whitespace_config = match text_config.text_kind {
TextKind::Compact => &COMPACT_WHITESPACE_CONFIG,
TextKind::Lines => &LINES_WHITESPACE_CONFIG,
TextKind::Pretty => &PRETTY_WHITESPACE_CONFIG,
};
Ok(LazyRawTextWriter_1_0 {
output,
whitespace_config,
})
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The LazyRawTextWriter_1_0 now takes the configuration's specified TextKind into account.

Comment on lines -128 to +150
ion_type: IonType,
container_type: ContainerType,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ We can now be more precise about the Ion types that this TextContainerWriter_1_0 may be encoding.

has_been_closed: false,
value_delimiter,
trailing_delimiter,
})
}

/// Writes the `indentation` string set in the whitespace config to output `depth` times.
fn write_indentation(&mut self) -> IonResult<()> {
fn write_indentation(&mut self, depth: usize) -> IonResult<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Unlike the TextValueWriter_1_0 implementation of write_indentation, the TextContainerWriter_1_0's implementation takes depth as a parameter so it can pass either self.depth (for the container itself) or self.depth + 1 (for its child values).

Comment on lines +312 to +314
pub struct SeqWriter<V: ValueWriter> {
seq_writer: V::ListWriter,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This type and MapWriter below exist because it's not possible to implement a trait (in this case: ser::SerializeSeq and ser::SerializeMap) for a generic like V::ListWriter because it violates the orphan rules.

These shim types are locally defined, which means that we're not also implementing the trait for other V: ValueWriter types we don't own. (Though ValueWriter is effectively sealed, sealed traits aren't an official feature and so can't be considered when resolving the orphan rules.)

src/text/raw_text_writer.rs Show resolved Hide resolved
SExpression => "",
SExp => "",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This variant was renamed for consistency with IonType::SExp.

@@ -679,10 +679,10 @@ impl<W: Write> IonWriter for RawTextWriter<W> {

fn parent_type(&self) -> Option<IonType> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The RawTextWriter will be removed in a near-future PR, so I'm not worried about changing this Option<IonType> to a ParentType.

Comment on lines +143 to +156
#[derive(Debug, PartialEq, Default, Copy, Clone)]
pub(crate) enum ScalarType {
#[default]
Null,
Bool,
Int,
Float,
Decimal,
Timestamp,
Symbol,
String,
Clob,
Blob,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This type was included for completeness, but does not currently have a use case.

@zslayton zslayton requested review from desaikd and popematt April 22, 2024 15:06
@zslayton
Copy link
Contributor Author

zslayton commented Apr 22, 2024

The biggest concern I have with the annotations is how it might affect compatibility with other object mappers, such as Jackson. In general, I think the (language-specific) name of the type should be considered an implementation detail rather than part of the data. I think the extra annotations are okay as long as they are optional.

That's a great point. Looking through the serde configuration attributes, it looks like it's easy to change the type name to something else but it isn't possible to omit it altogether. We could modify the ValueSerializer to take a write_type_name: bool as a configuration option, but maybe it isn't worth the effort.

The enum variant names (the Bar in Foo::Bar::1) must always be included, but users can leverage existing serde renaming features to change them as desired.

@desaikd, @nirosys -- what do you think?

@zslayton zslayton marked this pull request as ready for review April 22, 2024 15:11
@zslayton zslayton requested a review from nirosys April 22, 2024 15:11
@popematt
Copy link
Contributor

The enum variant names (the Bar in Foo::Bar::1) must always be included, but users can leverage existing serde renaming features to change them as desired.

Yes. The variant name is unavoidable, but it's also expected by the users.

@@ -13,7 +13,7 @@ use crate::ion_writer::IonWriter;
use crate::raw_symbol_token_ref::{AsRawSymbolTokenRef, RawSymbolTokenRef};
use crate::result::{IonFailure, IonResult};
use crate::types::integer::IntData;
use crate::types::ContainerType;
use crate::types::ParentType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's IonDatagram all over again! 😉

Comment on lines -53 to +51
self.whitespace_config.space_between_top_level_values,
"", // No delimiter between values at the top level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing something here. I don't see how the other changes allow us to have no delimiter between top level values here. I also don't know why the parent type here is TopLevel. I thought this is the top level writer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing something here. I don't see how the other changes allow us to have no delimiter between top level values here.

This code was adding a single space between top level values, but the parameter was actually expecting the caller to provide a delimiter between values -- , for list/struct, or the empty string for sexp/top-level. Doing both resulted in too much whitespace in the output; not invalid, but not we wanted.

I also don't know why the parent type here is TopLevel.

Each parent type (top level, list, sexp, struct) can construct a *ValueWriter that will encode the provided value into that context. Because this value writer is being created by the LazyRawTextWriter_1_0 (and not by a container writer), it is going to write the value it encodes into the top level. Thus, the value's parent is the top level.

let mut annotations = self.value.annotations();
let first_annotation = annotations.next().transpose()?;
let second_annotation = annotations.next().transpose()?;
match (first_annotation, second_annotation) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, it will fail if you include only the enum variant annotation. While I could be convinced that it's okay to always write the type annotation, I don't think we should require a type annotation on read because it makes this harder to use it with object mappers in other language ecosystems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll take your original suggestion and remove the type annotations that I added in this PR. I'm going to do that after my next PR lands though.

src/text/raw_text_writer.rs Show resolved Hide resolved
@zslayton
Copy link
Contributor Author

Merging this with the intent to roll back the type annotations after #748 lands.

@zslayton zslayton merged commit d22aa20 into app-writer Apr 23, 2024
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants