Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add provisional metadata ser in named field pos #544

Merged
merged 20 commits into from
Sep 25, 2024
Merged

Conversation

voidentente
Copy link
Contributor

Motive

It would be useful to be able to serialize additional metadata, i.e. data that is not scoped to the data representation itself (specifically, comments/documentation regarding the implications of a field/its possible values/valid range/etc).

Combined with crates like documented, this would (for example) allow for auto-documented configuration files.

This PR aims to provide a very lightweight (provisional) support for serializing metadata in named field position.

Serialization

Metadata should be provided to the serializer through PrettyConfig, since it's targeted at human readability.

The only relevant serialization call is <Compound as ser::SerializeStruct>::serialize_struct.

Why not in type position?

Distinguishing field position metadata from type position metadata is awkward. It's possible, but awkward:

/// type position
(
    /// field position
    ident: 
    /// type position??
    (
        /// field position
        a: (),
        b: (),
    )
)

Why not in unit structs?

Unit structs are values:

(
    /// field position
    unit: 
    /// type position?
    (
        // no fields, since it's unit..
    ),
)

Without type position, since there's no fields, there's nothing to do.

Why not in tuple structs?

Tuple structs have unnamed fields, which would make it appear as if values had metadata:

(
    /// field position
    tuple: (
        /// field position..?
        0,
        "",
    )
)

Why not in newtype structs?

Same as with tuple structs.

Why not in variants?

Variants are values, and suffer from the same problem as type position:

(
    /// field position
    variant: 
    /// type position?
    /// variant position??
    Some(13)
)

Deserialization

Metadata is skipped during deserialization. This is given by using the comment syntax.

Open Questions

  • Where should the metadata live during serialization? In Serializer? PrettyConfig? Should it be skipped when serializing PrettyConfig?

  • What to use to mark metadata? To keep backwards compatibility, either // or /// are suitable, with /// additionally offering forwards compatibility to distinguish metadata from plain comments.

  • How to avoid overlaps? If two structs have a field that share a name, there’s no way to differentiate them. Additional data would have to be tracked to distinguish these namespaces. Because this might be surprising, I'd say this should stay a draft until this has a solution.

    Example of an overlap:

    mod inner {
        pub struct Definition {
            a: usize,
        }
    }
    
    struct Definition {
        a: usize,
        inner: inner::Definition,
    }

Example Usage

use ron::ser::PrettyConfig;
use serde_derive::{Deserialize, Serialize};

fn main() {
    #[derive(Serialize, Deserialize)]
    pub enum C {
        Variant,
    }

    #[derive(Serialize, Deserialize)]
    pub struct B {
        pub c: C,
    }

    #[derive(Serialize, Deserialize)]
    struct A {
        a: usize,
        b: B,
    }

    let value = A {
        a: 0,
        b: B { c: C::Variant },
    };

    let mut config = PrettyConfig::default();

    config.meta.insert("a", "this is paired with a");
    config.meta.insert("b", "this is paired with b");
    config.meta.insert("c", "this is paired with c");

    let s = ron::ser::to_string_pretty(&value, config).unwrap();

    println!("{s}");

    assert!(ron::de::from_str::<A>(&s).is_ok());
}
(
    /// this is paired with a
    a: 0,
    /// this is paired with b
    b: (
        /// this is paired with c
        c: Variant,
    ),
)

@voidentente
Copy link
Contributor Author

Regarding the collision problem: Binding the meta to the struct name as well and not just field name significantly reduces the risk of collision, but does not eliminate it. A { a } would still be indistinguishable from inner::A { a }.

Since the type isn't accessible from serialization however, the only way to fully eliminate the collision problem is by not relying on type binding in the first place. One alternative might be to supply the meta as a structured hierarchy.

This requires the serializer to keep track of which field(s) are currently entered.
My API currently looks something like this:

fn main() {
    #[derive(Serialize)]
    struct A {
        a: usize,
        b: inner::A,
        c: usize,
    }

    mod inner {
        use serde_derive::Serialize;
        #[derive(Serialize)]
        pub struct A {
            pub a: usize,
        }
    }

    let mut config = PrettyConfig::default();

    {
        let meta = &mut config.meta;

        {
            let field = meta.field_mut_or_default("a");
            field.set_meta("field a");
        }

        {
            let field = meta.field_mut_or_default("b");
            field.set_meta("field b");

            field.set_inner({
                let mut fields = Fields::new();
                fields.field_mut_or_default("a").set_meta("inner field a");
                fields
            });
        }

        {
            let field = meta.field_mut_or_default("c");
            field.set_meta("field c");
        }
    }

    let value = A {
        a: 0,
        b: inner::A { a: 0 },
        c: 3,
    };

    let s = ron::ser::to_string_pretty(&value, config).unwrap();

    println!("{s}");
}
(
    /// field a
    a: 0,
    /// field b
    b: (
        /// inner field a
        a: 0,
    ),
    /// field c
    c: 3,
)

@voidentente
Copy link
Contributor Author

Alright, I think that should clear up the collision problem. Support for more positions can be added later seamlessly. Going with PrettyConfig and /// seems like a reasonable choice to me. I'll un-draft this because the impact of this PR on the crate should be rather minimal

@voidentente voidentente marked this pull request as ready for review August 6, 2024 01:43
@juntyr
Copy link
Member

juntyr commented Aug 6, 2024

Thank you @voidentente for your PR and the extensive motivation!

I absolutely agree with the motivation for adding attributes in general and doc comments as a first step towards them.

What I'm unsure about is the focus on type-based docs instead of value-based docs. As you very thoroughly lay out, there are almost no uniquely type-based places for docs to appear, with struct-like fields being the exception, most other places feel more like they would be used for value-docs.

Furthermore, the API for type-based docs feels a bit brittle as there is a possibility for name clashes or the structure of the type names needs to be separately encoded again.

What I would prefer is a value-based docs (later generalised to attributes) API that is more general and can be utilised to then build a type-based API on top of it (but probably not inside RON but another crate). In particular, I would suggest that stylistically, doc comments generally go on the line above the value and we add a special case for fields so that they are serialised as follows:

(
    /// my value comment
    a: 42,
)

What I think might work well is an API similar to serde_spanned (which I've wanted to support in RON for a while but haven't gotten to), that is:

  • a new separate serde_meta or serde_value_attrs crate (I don't think one exists already) that publicly exports a type pub struct Meta<'a, T> { meta: Cow<'a, [(Cow<'a, str>, Cow<'a, str>)]>, value: T } and hidden-exports some helper functions to identify when this type is being serialised and deserialised (taking serde_spanned for heavy inspiration)
  • add serialising support such that ("doc", "my-doc") attributes are serialised as /// my-doc (and error for any other attribute keys right now)
  • add deserialising support such that doc comments are parsed, usually ignored, but provided when a struct that matches Meta is deserialised

Types that would always like to serialise the same doc comments (type-based) could then update their serialisation code to always serialise the same comment. Your proposed API of taking an existing data structure and serialising it with separate type comments could be built by wrapping the serialiser and injecting metadata structs whenever a matching struct is serialised.

What are your thoughts?

@voidentente
Copy link
Contributor Author

Well, I expressly wanted to keep the extent of this PR as scoped as possible. It's based entirely on top of the comment syntax, because I'm unsure of the idiomacy of deserialization of metadata. If metadata is capable of deserialization, it wouldn't be just for human readability. At that point, the difference between metadata and normal data becomes muddy.

You bring up other attribute kinds, which would require a new syntax and thus support by the deserializer. I'm hesitant about this, because it would be a breaking change (unless some piggybacking happens, like //kind/ value or similar).

The primary motivation (at least for me here) is serializing documentation. There's no need to be capable of deserializing metadata in order to update it; the user should be able to dictate what metadata to put where regardless of the previous state of the document.

Since this PR should be entirely non-breaking, it'd be a first good step to adding support. It wouldn't be forwards-compatible with an extended format, but that's fine if this part of the API is marked as unstable. I'm not knowledgeable enough about the serde ecosystem to transform this into a fully-featured, integrated, stable API from the get-go.

What I'm unsure about is the focus on type-based docs instead of value-based docs. As you very thoroughly lay out, there are almost no uniquely type-based places for docs to appear, with struct-like fields being the exception, most other places feel more like they would be used for value-docs.

Furthermore, the API for type-based docs feels a bit brittle as there is a possibility for name clashes or the structure of the type names needs to be separately encoded again.

Type position is usually a subset of field position, which is why I chose to neglect it. Serializing in a way that differentiates it between field position metadata would hurt readability.

/// type position
Type (
    /// field position
    a: 
    /// type position
    0,
    /// field position
    opt: 
    /// type position
    Some(13),
    /// field position
    inner: 
    /// type position
    Other {
        /// field position
        value: 
        /// type position
        (),
    }
)

The collision problem should be entirely resolved by using hierarchy. The serializer internally keeps track of which field is entered, allowing to differentiate fields with the same name based on context.

The (field position) metadata for the above RON is currently expressed like this:

["a"]              = "field position"
["opt"]            = "field position"
["inner"]          = ..
["inner"]["value"] = ..

@juntyr
Copy link
Member

juntyr commented Aug 7, 2024

Well, I expressly wanted to keep the extent of this PR as scoped as possible. It's based entirely on top of the comment syntax, because I'm unsure of the idiomacy of deserialization of metadata. If metadata is capable of deserialization, it wouldn't be just for human readability. At that point, the difference between metadata and normal data becomes muddy.

I appreciate the minimal initial approach and the focus on doc comments as a first step! And it’s true that treating metadata as data in the serde model might not be suited for all use cases, where your proposed more ad-hoc API would work very well.

Your proposed API is thus growing on me and I think it could be supported long-term, even if it’s internal implementation might at some point switch to a proper attribute system.

You bring up other attribute kinds, which would require a new syntax and thus support by the deserializer. I'm hesitant about this, because it would be a breaking change (unless some piggybacking happens, like //kind/ value or similar).

Ron already has attributes, though they’re currently only used to enable features and are only allowed at the top of the document. This could simply be expanded, and just like in Rust, doc comments would be supported as /// and #[doc = “”].

@juntyr
Copy link
Member

juntyr commented Aug 7, 2024

What this PR would still need is tests to ensure full coverage and to test how field docs can be added e.g. to a struct nested in an enum inside a vec

@voidentente
Copy link
Contributor Author

Ron already has attributes, though they’re currently only used to enable features and are only allowed at the top of the document.

Oh, I didn't know about that! I'll stick with the current plan now, but extending the deserializer later might be worth looking into.

I'll add some tests, let me know of any API nitpicks if you have them :)

@voidentente
Copy link
Contributor Author

voidentente commented Aug 7, 2024

how field docs can be added e.g. to a struct nested in an enum inside a vec

TL;DR: The hierarchy is not absolute, but relative to entered named fields. The test file asserts that addressing fields of a struct nested in an enum inside a vec is structurally equivalent to addressing a struct.

@juntyr
Copy link
Member

juntyr commented Aug 8, 2024

API-wise, I think the meta field on PrettyConfig should be of an new Meta type, which then exposes a fields() method, so that we could add support for different accessors later.

Could there be a test for a tuple (Person, Pet) and how the name clashes are handled there?

@voidentente
Copy link
Contributor Author

@juntyr Any more thoughts on this?

src/ser/mod.rs Outdated Show resolved Hide resolved
src/meta.rs Outdated Show resolved Hide resolved
src/meta.rs Outdated Show resolved Hide resolved
@juntyr
Copy link
Member

juntyr commented Sep 22, 2024

Sorry for the delay in the review, things have been hard

@voidentente
Copy link
Contributor Author

The path traversal should now be constant time. I'm not super happy with the public API, but it's prone to change anyway.

src/ser/mod.rs Outdated Show resolved Hide resolved
@voidentente
Copy link
Contributor Author

I'll also throw in that Fields currently uses SipHash, which might be overkill. We could get a minor performance increase with AHash.

src/meta.rs Outdated Show resolved Hide resolved
src/ser/mod.rs Outdated Show resolved Hide resolved
src/ser/mod.rs Outdated Show resolved Hide resolved
@juntyr
Copy link
Member

juntyr commented Sep 24, 2024

Thank you @voidentente for your continuous work on this PR - just a few more nits and I'm happy to land this

@voidentente
Copy link
Contributor Author

I moved the PR to the ser module just to make it more obvious that it only performs serialization, and renamed it and the test file to path_meta. PrettyConfig now contains it directly, which should be more intuitive.

Copy link
Member

@juntyr juntyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have two minor nits, but everything else looks great :)

src/ser/mod.rs Show resolved Hide resolved
src/ser/mod.rs Outdated Show resolved Hide resolved
Copy link
Member

@juntyr juntyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CHANGELOG.md Outdated Show resolved Hide resolved
src/ser/mod.rs Outdated Show resolved Hide resolved
@juntyr juntyr merged commit ea6b406 into ron-rs:master Sep 25, 2024
9 checks passed
@juntyr
Copy link
Member

juntyr commented Sep 25, 2024

Thank you @voidentente for your work on this feature!

@voidentente voidentente deleted the meta branch September 25, 2024 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants