Add provisional metadata ser in named field pos #544

voidentente · 2024-08-05T20:27:53Z

Motive

It would be useful to be able to serialize additional metadata, i.e. data that is not scoped to the data representation itself (specifically, comments/documentation regarding the implications of a field/its possible values/valid range/etc).

Combined with crates like documented, this would (for example) allow for auto-documented configuration files.

This PR aims to provide very lightweight (provisional) support for serializing metadata in named field position.

Serialization

Metadata should be provided to the serializer through PrettyConfig, since it's targeted at human readability.

The only relevant serialization call is <Compound as ser::SerializeStruct>::serialize_field.
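As a rough illustration of what emitting metadata in named field position could look like, here is a std-only sketch. It does not use ron or serde: `write_field`, `render_struct`, and the flat `HashMap` are hypothetical names for this example, standing in for the serializer's per-field step and for the metadata carried by PrettyConfig.

```rust
use std::collections::HashMap;
use std::fmt::Write;

/// Hypothetical helper: writes one named field, preceded by its doc lines if any.
fn write_field(
    out: &mut String,
    indent: &str,
    name: &str,
    value: &str,
    meta: &HashMap<&str, &str>,
) {
    if let Some(doc) = meta.get(name) {
        // Each line of the metadata becomes its own `///` comment line.
        for line in doc.lines() {
            writeln!(out, "{indent}/// {line}").unwrap();
        }
    }
    writeln!(out, "{indent}{name}: {value},").unwrap();
}

/// Renders a flat struct in a RON-like layout (values are pre-rendered strings).
fn render_struct(fields: &[(&str, &str)], meta: &HashMap<&str, &str>) -> String {
    let mut out = String::from("(\n");
    for (name, value) in fields {
        write_field(&mut out, "    ", name, value, meta);
    }
    out.push(')');
    out
}
```

Fields without an entry in the map are written unchanged, so the output stays identical to the plain pretty format when no metadata is supplied.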

Why not in type position?

Distinguishing field position metadata from type position metadata is awkward. It's possible, but awkward:

/// type position
(
    /// field position
    ident: 
    /// type position??
    (
        /// field position
        a: (),
        b: (),
    )
)

Why not in unit structs?

Unit structs are values:

(
    /// field position
    unit: 
    /// type position?
    (
        // no fields, since it's unit..
    ),
)

Without type position, and since there are no fields, there's nothing to do.

Why not in tuple structs?

Tuple structs have unnamed fields, which would make it appear as if values had metadata:

(
    /// field position
    tuple: (
        /// field position..?
        0,
        "",
    )
)

Why not in newtype structs?

Same as with tuple structs.

Why not in variants?

Variants are values, and suffer from the same problem as type position:

(
    /// field position
    variant: 
    /// type position?
    /// variant position??
    Some(13)
)

Deserialization

Metadata is skipped during deserialization. This follows directly from using the comment syntax, which the parser already ignores.
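To show why comment syntax makes the metadata invisible to the deserializer, here is a minimal std-only sketch of a whitespace-and-comment skipper. `skip_ws_and_line_comments` is a hypothetical name, not RON's actual parser code, and the sketch only handles `//` line comments (block comments are left out for brevity).

```rust
/// Skips leading whitespace and `//` line comments; `///` is just a special case.
fn skip_ws_and_line_comments(mut input: &str) -> &str {
    loop {
        input = input.trim_start();
        if input.starts_with("//") {
            // Drop everything up to and including the end of the line.
            input = match input.find('\n') {
                Some(end) => &input[end + 1..],
                None => "",
            };
        } else {
            return input;
        }
    }
}
```

Because `///` starts with `//`, a parser that already skips line comments needs no change to accept documents containing the serialized metadata.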

Open Questions

Where should the metadata live during serialization? In Serializer? PrettyConfig? Should it be skipped when serializing PrettyConfig?
What to use to mark metadata? To keep backwards compatibility, either // or /// are suitable, with /// additionally offering forwards compatibility to distinguish metadata from plain comments.
How to avoid overlaps? If two structs have fields that share a name, there’s no way to differentiate them. Additional data would have to be tracked to distinguish these namespaces. Because this might be surprising, I'd say this should stay a draft until there is a solution.

Example of an overlap:
```
mod inner {
    pub struct Definition {
        a: usize,
    }
}

struct Definition {
    a: usize,
    inner: inner::Definition,
}
```
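The overlap can be reproduced with a plain map keyed by field name only. This is a std-only sketch; `flat_meta` and the doc strings are made up for illustration and are not part of the PR.

```rust
use std::collections::HashMap;

/// Hypothetical flat store: metadata keyed by field name alone.
fn flat_meta() -> HashMap<&'static str, &'static str> {
    let mut meta = HashMap::new();
    meta.insert("a", "doc for Definition::a");
    // Same key as above, so this silently replaces the previous entry.
    meta.insert("a", "doc for inner::Definition::a");
    meta
}
```

Only one entry survives, so both `a` fields would be serialized with the same comment.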

Example Usage

use ron::ser::PrettyConfig;
use serde_derive::{Deserialize, Serialize};

fn main() {
    #[derive(Serialize, Deserialize)]
    pub enum C {
        Variant,
    }

    #[derive(Serialize, Deserialize)]
    pub struct B {
        pub c: C,
    }

    #[derive(Serialize, Deserialize)]
    struct A {
        a: usize,
        b: B,
    }

    let value = A {
        a: 0,
        b: B { c: C::Variant },
    };

    let mut config = PrettyConfig::default();

    config.meta.insert("a", "this is paired with a");
    config.meta.insert("b", "this is paired with b");
    config.meta.insert("c", "this is paired with c");

    let s = ron::ser::to_string_pretty(&value, config).unwrap();

    println!("{s}");

    assert!(ron::de::from_str::<A>(&s).is_ok());
}

(
    /// this is paired with a
    a: 0,
    /// this is paired with b
    b: (
        /// this is paired with c
        c: Variant,
    ),
)

voidentente · 2024-08-05T23:15:46Z

Regarding the collision problem: Binding the meta to the struct name as well and not just field name significantly reduces the risk of collision, but does not eliminate it. A { a } would still be indistinguishable from inner::A { a }.

Since the type isn't accessible during serialization, however, the only way to fully eliminate the collision problem is to not rely on type binding in the first place. One alternative might be to supply the meta as a structured hierarchy.

This requires the serializer to keep track of which field(s) are currently entered.
My API currently looks something like this:

fn main() {
    #[derive(Serialize)]
    struct A {
        a: usize,
        b: inner::A,
        c: usize,
    }

    mod inner {
        use serde_derive::Serialize;
        #[derive(Serialize)]
        pub struct A {
            pub a: usize,
        }
    }

    let mut config = PrettyConfig::default();

    {
        let meta = &mut config.meta;

        {
            let field = meta.field_mut_or_default("a");
            field.set_meta("field a");
        }

        {
            let field = meta.field_mut_or_default("b");
            field.set_meta("field b");

            field.set_inner({
                let mut fields = Fields::new();
                fields.field_mut_or_default("a").set_meta("inner field a");
                fields
            });
        }

        {
            let field = meta.field_mut_or_default("c");
            field.set_meta("field c");
        }
    }

    let value = A {
        a: 0,
        b: inner::A { a: 0 },
        c: 3,
    };

    let s = ron::ser::to_string_pretty(&value, config).unwrap();

    println!("{s}");
}

(
    /// field a
    a: 0,
    /// field b
    b: (
        /// inner field a
        a: 0,
    ),
    /// field c
    c: 3,
)
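The hierarchical idea can be sketched without ron as a small tree in which each node holds an optional doc string and its child fields. `Node`, `child_mut`, and `doc_at` are hypothetical names for this illustration and do not mirror the PR's actual `Fields` type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the hierarchy; not the crate's actual types.
#[derive(Default)]
struct Node {
    doc: Option<String>,
    children: HashMap<String, Node>,
}

impl Node {
    /// Returns the child for `name`, creating an empty one if needed.
    fn child_mut(&mut self, name: &str) -> &mut Node {
        self.children.entry(name.to_owned()).or_default()
    }

    /// Looks up the doc attached to the field at the end of `path`.
    fn doc_at(&self, path: &[&str]) -> Option<&str> {
        let mut node = self;
        for name in path {
            node = node.children.get(*name)?;
        }
        node.doc.as_deref()
    }
}
```

With this shape, `["a"]` and `["b", "a"]` are distinct keys, so two fields named `a` no longer collide.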

voidentente · 2024-08-06T01:43:05Z

Alright, I think that should clear up the collision problem. Support for more positions can be added later seamlessly. Going with PrettyConfig and /// seems like a reasonable choice to me. I'll un-draft this because the impact of this PR on the crate should be rather minimal.

juntyr · 2024-08-06T06:26:12Z

Thank you @voidentente for your PR and the extensive motivation!

I absolutely agree with the motivation for adding attributes in general and doc comments as a first step towards them.

What I'm unsure about is the focus on type-based docs instead of value-based docs. As you very thoroughly lay out, there are almost no uniquely type-based places for docs to appear, with struct-like fields being the exception; most other places feel more like they would be used for value-docs.

Furthermore, the API for type-based docs feels a bit brittle as there is a possibility for name clashes or the structure of the type names needs to be separately encoded again.

What I would prefer is a value-based docs (later generalised to attributes) API that is more general and can be utilised to then build a type-based API on top of it (but probably not inside RON but another crate). In particular, I would suggest that stylistically, doc comments generally go on the line above the value and we add a special case for fields so that they are serialised as follows:

(
    /// my value comment
    a: 42,
)

What I think might work well is an API similar to serde_spanned (which I've wanted to support in RON for a while but haven't gotten to), that is:

a new separate serde_meta or serde_value_attrs crate (I don't think one exists already) that publicly exports a type pub struct Meta<'a, T> { meta: Cow<'a, [(Cow<'a, str>, Cow<'a, str>)]>, value: T } and hidden-exports some helper functions to identify when this type is being serialised and deserialised (taking serde_spanned for heavy inspiration)
add serialising support such that ("doc", "my-doc") attributes are serialised as /// my-doc (and error for any other attribute keys right now)
add deserialising support such that doc comments are parsed, usually ignored, but provided when a struct that matches Meta is deserialised
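A std-only sketch of the proposed wrapper might look as follows. The struct shape is copied from the list above; `render_attrs` and its error type are assumptions for illustration, and no such crate or helper is known to exist.

```rust
use std::borrow::Cow;

/// Shape taken from the proposal above; everything else here is illustrative.
pub struct Meta<'a, T> {
    pub meta: Cow<'a, [(Cow<'a, str>, Cow<'a, str>)]>,
    pub value: T,
}

/// Hypothetical helper: renders `("doc", text)` attributes as `///` lines
/// and rejects every other attribute key.
pub fn render_attrs<T>(wrapped: &Meta<'_, T>) -> Result<String, String> {
    let mut out = String::new();
    for (key, text) in wrapped.meta.iter() {
        if *key != "doc" {
            return Err(format!("unsupported attribute key: {key}"));
        }
        out.push_str("/// ");
        out.push_str(text);
        out.push('\n');
    }
    Ok(out)
}
```

A real integration would additionally need the hidden helpers mentioned above so that a serializer can recognise the wrapper, which this sketch leaves out.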

Types that would always like to serialise the same doc comments (type-based) could then update their serialisation code to always serialise the same comment. Your proposed API of taking an existing data structure and serialising it with separate type comments could be built by wrapping the serialiser and injecting metadata structs whenever a matching struct is serialised.

What are your thoughts?

voidentente · 2024-08-06T12:01:08Z

Well, I expressly wanted to keep the extent of this PR as scoped as possible. It's based entirely on top of the comment syntax, because I'm unsure how idiomatic deserializing metadata would be. If metadata can be deserialized, it isn't just for human readability anymore. At that point, the difference between metadata and normal data becomes muddy.

You bring up other attribute kinds, which would require a new syntax and thus support by the deserializer. I'm hesitant about this, because it would be a breaking change (unless some piggybacking happens, like //kind/ value or similar).

The primary motivation (at least for me here) is serializing documentation. There's no need to be capable of deserializing metadata in order to update it; the user should be able to dictate what metadata to put where regardless of the previous state of the document.

Since this PR should be entirely non-breaking, it'd be a good first step toward adding support. It wouldn't be forwards-compatible with an extended format, but that's fine if this part of the API is marked as unstable. I'm not knowledgeable enough about the serde ecosystem to transform this into a fully-featured, integrated, stable API from the get-go.

> What I'm unsure about is the focus on type-based docs instead of value-based docs. As you very thoroughly lay out, there are almost no uniquely type-based places for docs to appear, with struct-like fields being the exception, most other places feel more like they would be used for value-docs.
>
> Furthermore, the API for type-based docs feels a bit brittle as there is a possibility for name clashes or the structure of the type names needs to be separately encoded again.

Type position is usually a subset of field position, which is why I chose to neglect it. Serializing it in a way that differentiates it from field position metadata would hurt readability.

/// type position
Type (
    /// field position
    a: 
    /// type position
    0,
    /// field position
    opt: 
    /// type position
    Some(13),
    /// field position
    inner: 
    /// type position
    Other {
        /// field position
        value: 
        /// type position
        (),
    }
)

The collision problem should be entirely resolved by using a hierarchy. The serializer internally keeps track of which field is entered, allowing it to differentiate fields with the same name based on context.

The (field position) metadata for the above RON is currently expressed like this:

["a"]              = "field position"
["opt"]            = "field position"
["inner"]          = ..
["inner"]["value"] = ..

juntyr · 2024-08-07T05:23:45Z

> Well, I expressly wanted to keep the extent of this PR as scoped as possible. It's based entirely on top of the comment syntax, because I'm unsure of the idiomacy of deserialization of metadata. If metadata is capable of deserialization, it wouldn't be just for human readability. At that point, the difference between metadata and normal data becomes muddy.

I appreciate the minimal initial approach and the focus on doc comments as a first step! And it’s true that treating metadata as data in the serde model might not be suited for all use cases, where your proposed more ad-hoc API would work very well.

Your proposed API is thus growing on me and I think it could be supported long-term, even if its internal implementation might at some point switch to a proper attribute system.

> You bring up other attribute kinds, which would require a new syntax and thus support by the deserializer. I'm hesitant about this, because it would be a breaking change (unless some piggybacking happens, like //kind/ value or similar).

RON already has attributes, though they’re currently only used to enable features and are only allowed at the top of the document. This could simply be expanded, and just like in Rust, doc comments would be supported as /// and #[doc = ""].

juntyr · 2024-08-07T06:52:30Z

What this PR would still need is tests to ensure full coverage and to show how field docs can be added, e.g. to a struct nested in an enum inside a vec.

voidentente · 2024-08-07T13:08:26Z

> Ron already has attributes, though they’re currently only used to enable features and are only allowed at the top of the document.

Oh, I didn't know about that! I'll stick with the current plan now, but extending the deserializer later might be worth looking into.

I'll add some tests, let me know of any API nitpicks if you have them :)

voidentente · 2024-08-07T15:03:53Z

> how field docs can be added e.g. to a struct nested in an enum inside a vec

TL;DR: The hierarchy is not absolute, but relative to entered named fields. The test file asserts that addressing fields of a struct nested in an enum inside a vec is structurally equivalent to addressing a struct.
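The "relative to entered named fields" rule can be sketched with a toy value tree in which only named fields push a path segment, while sequences and enum variants are transparent. `Value` and `field_paths` are hypothetical and unrelated to the actual test file or to ron's internals.

```rust
/// Toy value model for illustration only.
#[allow(dead_code)]
enum Value {
    Int(i64),
    Seq(Vec<Value>),
    Variant(&'static str, Box<Value>),
    Struct(Vec<(&'static str, Value)>),
}

/// Collects the dotted path of every named field, in visiting order.
fn field_paths(value: &Value, current: &mut Vec<&'static str>, out: &mut Vec<String>) {
    match value {
        Value::Int(_) => {}
        // Sequences and variants do not contribute a path segment.
        Value::Seq(items) => {
            for item in items {
                field_paths(item, current, out);
            }
        }
        Value::Variant(_, inner) => field_paths(inner, current, out),
        Value::Struct(fields) => {
            for (name, inner) in fields {
                current.push(*name); // enter the named field
                out.push(current.join("."));
                field_paths(inner, current, out);
                current.pop(); // leave the named field
            }
        }
    }
}
```

Under this model a struct wrapped in a variant inside a sequence yields the same field paths as the bare struct, which matches the structural equivalence described above.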

juntyr · 2024-08-08T06:13:17Z

API-wise, I think the meta field on PrettyConfig should be of a new Meta type, which then exposes a fields() method, so that we could add support for different accessors later.

Could there be a test for a tuple (Person, Pet) and how the name clashes are handled there?

voidentente · 2024-09-21T01:54:20Z

@juntyr Any more thoughts on this?

src/ser/mod.rs

src/meta.rs

juntyr · 2024-09-22T19:32:09Z

Sorry for the delay in the review, things have been hard.

voidentente · 2024-09-24T11:58:25Z

The path traversal should now be constant time. I'm not super happy with the public API, but it's prone to change anyway.

src/ser/mod.rs

voidentente · 2024-09-24T12:07:33Z

I'll also throw in that Fields currently uses SipHash, which might be overkill. We could get a minor performance increase with AHash.
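For reference, swapping the hasher of a std `HashMap` only requires a different `BuildHasher`. The sketch below uses a hand-written FNV-1a hasher purely as a std-only stand-in for a faster non-SipHash hasher; it is not AHash, and `Fnv1a` and `FastMap` are hypothetical names that are not part of the PR.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// FNV-1a (64-bit), used here only as a stand-in for a faster hasher.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for byte in bytes {
            self.0 ^= u64::from(*byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap that uses the hasher above instead of the default SipHash.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;
```

Unlike SipHash, simple hashers such as this one offer no protection against hash-flooding inputs, which is the trade-off behind any such swap.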

src/meta.rs

src/ser/mod.rs

juntyr · 2024-09-24T15:18:09Z

Thank you @voidentente for your continuous work on this PR - just a few more nits and I'm happy to land this

voidentente · 2024-09-24T15:49:13Z

I moved the meta module into the ser module to make it more obvious that it only performs serialization, and renamed it and the test file to path_meta. PrettyConfig now contains it directly, which should be more intuitive.

juntyr

I just have two minor nits, but everything else looks great :)

src/ser/mod.rs

juntyr

LGTM

CHANGELOG.md

src/ser/mod.rs

juntyr · 2024-09-25T09:57:09Z

Thank you @voidentente for your work on this feature!

Add provisional metadata ser in named field pos

5c88c47

voidentente mentioned this pull request Aug 5, 2024

Support (basic) attributes #435

Closed

voidentente added 3 commits August 6, 2024 01:32

Use hierarchy to fix collision; fix multiline indent bug

5ed4636

Add CHANGELOG.md entry

4edf360

Touch up on API, add some docs, fix inline call to indent

1664ad1

voidentente marked this pull request as ready for review August 6, 2024 01:43

voidentente added 3 commits August 6, 2024 14:21

Change let .. else to iterator

709ff5e

Add build_fields as ergonomic shortcut

191dda6

Remove expect, use collect, add must_use

e586fee

voidentente added 2 commits August 7, 2024 16:51

Add test file with named field struct hierarchy test

5cc4785

Touch up on API, add doctests

84a4c39

voidentente added 3 commits August 7, 2024 17:12

Fix doctests

6191087

Add meta with newline to test file

6420d2d

Apply clippy lints

54858f0

voidentente and others added 2 commits August 8, 2024 15:04

Remove is_some_and, wrap meta, add tuple to test

1bf4d06

Merge branch 'master' into meta

db027f3

juntyr requested changes Sep 22, 2024

src/ser/mod.rs (outdated)

src/meta.rs (outdated)

src/meta.rs (outdated)

Apply change request

dd036ed

voidentente commented Sep 24, 2024

src/ser/mod.rs (outdated)

juntyr requested changes Sep 24, 2024

src/meta.rs (outdated)

src/ser/mod.rs (outdated)

src/ser/mod.rs (outdated)

Refactor

4e448e9

Cleanup

f664e78

juntyr approved these changes Sep 25, 2024

src/ser/mod.rs

src/ser/mod.rs (outdated)

Make indent take mutable reference, make method a wrapper

988fdb4

juntyr approved these changes Sep 25, 2024

CHANGELOG.md (outdated)

src/ser/mod.rs (outdated)

juntyr added 2 commits September 25, 2024 12:44

Update CHANGELOG.md

7b909d0

Update src/ser/mod.rs

cad42d1

juntyr merged commit ea6b406 into ron-rs:master Sep 25, 2024
9 checks passed

voidentente deleted the meta branch September 25, 2024 11:11
