Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$id updates #1537

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

$id updates #1537

wants to merge 13 commits into from

Conversation

gregsdennis
Copy link
Member

What kind of change does this PR introduce?

Cleans up some language around $id.

Issue & Discussion References

Summary

Just some clean-up and clarification that we've previously identified.

Schema document roots SHOULD have an $id (maybe relax this? ...) - from #1444 (comment)

I decided to leave this in for now. We can discuss and change separately if needed.

Does this PR introduce a breaking change?

No

@gregsdennis gregsdennis requested a review from a team October 1, 2024 00:33
Copy link
Member

@jviotti jviotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Most of the changes are simple one-character edits, and the rest read very well to me

Due to the potential break in functionality described above, the behavior for
using JSON Pointer fragments that point to or cross a resource boundary is
undefined. Schema authors SHOULD NOT rely on such IRIs, as using them may
reduce interoperability.
Copy link
Contributor

@notEthan notEthan Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stating here that this behavior is undefined seems to me to contradict the above "an embedded schema resource and its subschemas can be identified by JSON Pointer fragments relative to either its own canonical IRI, or relative to any containing resource's IRI".

This also leaves [^8] with no reference to it.

And, minor, but two spaces after a period isn't consistent with the rest of the document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also leaves [^8] with no reference to it.

I'll add that back in.

Copy link
Member Author

@gregsdennis gregsdennis Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was discussed that we need to make pointers that cross boundaries an undefined behavior rather than explicitly allowing implementations to support it. Of course, implementations can still support it if they want, but the spec should push schema authors a bit more forcefully toward using the proper $id instead.

I think the critical point here is "an embedded schema resource and its subschemas". It's arguable that an embedded schema resource isn't strictly a subschema since it can be refactored out without any change in behavior. If we need to clarify that, I'm happy to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I get making this undefined is the main point of this PR (I edited my comment a bit to clarify that, had it slightly wrong initially ... I'm not sure this should be undefined but if that is the way it is to be, okay).

To me "an embedded schema resource ... can be identified by JSON Pointer fragments ... relative to any containing resource's IRI" reads as defined behavior and not consistent with making it undefined some paragraphs later, and needs some change. I see your point on ambiguity whether "subschema" includes embedded schemas but I don't see language present that actually calls embedded schemas subschemas.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read that paragraph as informative, not prescriptive. It's setting up the reasoning behind the requirement.

Embedded schemas can be referenced this way, but because reasons it's a bad idea to actually do it, so we're gonna say probably don't. We're also letting implementations decide whether they want to support it, no pressure either way.

jsonschema-core.md Outdated Show resolved Hide resolved
@gregsdennis gregsdennis added this to the stable-release milestone Oct 2, 2024
Copy link
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling these issues!

Comment on lines 914 to 917
If present, the value for this keyword MUST be a string, and MUST represent a
valid [IRI-reference](#rfc3987). This IRI-reference SHOULD be normalized, and
MUST resolve to an [absolute-IRI](#rfc3987) (without a fragment).
valid [IRI reference](#rfc3987). This IRI reference SHOULD be normalized per RFC
3987, section 5.3, and MUST resolve to an [absolute IRI](#rfc3987) (without a
fragment).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the normalization requirement is a requirement on the schema author and should be avoided. I think this should be a requirement on an implementation's internal schema registry. The implementation should accept an IRI that isn't normalized, but SHOULD normalize as needed to retrieve schemas.

So, I like the improvement to the wording, but I think this whole requirement needs to go elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a think about this. I don't disagree, but I'll need to figure out where to put such a requirement. This section does warrant a note to authors that implementations will be performing this normalization; probably a link to wherever this ends up.

Copy link
Member Author

@gregsdennis gregsdennis Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving it in place and changing to

If present, the value for this keyword MUST be a string, and MUST represent a valid IRI reference. When processing this IRI reference, implementations SHOULD normalize it per RFC 3987, section 5.3, so that it resolves to an absolute IRI (without a fragment).

seems to work. It puts the requirement on the implementation, but it also lets the user (if they're even reading this) know what they can expect from the implementation.

I do fear, though, that this gives license to users to put IRIs with fragments in $id knowing that the implementation will simply ignore it. (It's almost like the output needs the ability to report a warning as well as an error. "Hey, uh, there's a fragment on that IRI, guy. You sure you want to do that? Probably don't.") I think we do have or planned to have some language that disallowed fragments in $id, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think leaving it in place works because $id isn't the only way a schema might be assigned an IRI. For example, implementations often also allow users to assign IRIs to schemas when they're registered. The normalization requirement should also apply to those user provided IRIs and therefore needs to be somewhere where it doesn't only apply to $id.

One place I think it could work is in "Base IRI, Anchors, and Dereferencing" above. Here's a suggestion.

To differentiate between schemas in a vast ecosystem, schemas are identified by absolute IRIs (without a fragment).

Schemas can embed references to other schemas by specifying their IRI. When implementations dereference these references, they SHOULD use the normalization procedures defined in RFC 3987, section 5.3 when comparing URIs.

Several keywords can accept an IRI reference. The schema's identifier serves as the base IRI for resolving relative references.

Then I'd remove everything about normalization and fragments from this paragraph. If we want, we could include an informal warning that fragments are ignored because schemas are identified by absolute IRIs, but it shouldn't technically be necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need information on valid values for $id, though. I feel that's what at least the first sentence does. I'll play with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I was only suggesting removing the part about normalization and fragments. The part about it being an IRI reference needs to stay.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the normalization up to the previous section as suggested. I had to play with the wording a bit.

jsonschema-validation.md Outdated Show resolved Hide resolved
jsonschema-validation.md Outdated Show resolved Hide resolved
jsonschema-core.md Show resolved Hide resolved
jsonschema-core.md Outdated Show resolved Hide resolved
jsonschema-core.md Outdated Show resolved Hide resolved
@gregsdennis
Copy link
Member Author

@notEthan @jdesrosiers I've almost completely reworked the $id section. There were some redundancies in it that didn't sit right with me.

I moved the syntactic definition of the keyword up to the top sentence next to its purpose, and I reworked how it talks about defining a resource, needing to be resolved against the current base IRI, and then providing a base IRI for child resources.

This relationship is necessarily recursive (the $id value needs a base IRI but then also serves as a base IRI), so it's quite difficult to state simply.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants