Reference tooling discussion and requirements #485
-
Discussion around U1 - AsyncAPI + JSON Schema >= draft 2019-09 with bundling. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U2 - AsyncAPI + JSON Schema >= draft 2019-09 with bundling and relative reference. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U3 - AsyncAPI + JSON Schema < draft 2019-09 with bundling. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U4 - AsyncAPI + JSON Schema >= draft 2019-09 with bundling and internal definitions. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U5 - AsyncAPI + JSON Schema contradicting formats. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U6 - AsyncAPI using OpenAPI Schema documents which expect a base URI. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U7 - OpenAPI + JSON Schema contradicting base URIs. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U8 - AsyncAPI + non-JSON references. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U9 - AsyncAPI + JSON Schema contradicting formats for remote references. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
Discussion around U10 - AsyncAPI + JSON Schema double contradicting formats for remote references. Let's try to have all the discussions related to this use case below here to keep the discussion specific to one case at a time. Otherwise, we might get lost very quickly!
-
@jonaslagoni I have literally been spending this morning surveying various non-JSON Schema JSON Reference implementations and compiling use cases, so this is great and timely for me! To add a wrinkle: JSON Schema will support direct use of IRIs in the next release (instead of requiring them to be converted to URIs by punycoding, etc.), and I expect IRI usage will increase over time. Of course, that does not require anyone to use IRIs that are not valid URIs without conversion, but it's worth keeping in mind.
-
Note also that putting JSON Schema annotations adjacent to
-
Oh, and if you weren't aware, the forthcoming YAML media type RFC directly addresses JSON compatibility and using JSON Pointer fragments with YAML. |
-
Can you elaborate a little on use cases for this tool and what useful information you expect it to provide, or functions you expect it to perform? One of the things to remember is that JSON Schema does not use JSON Reference. It was dropped in draft-05 because it's incompatible with the behavior of URI References, as described in RFC 3986, and this is a problem inherent to the idea of a JSON Reference standard. It might seem like you could write a tool that will transparently dereference $ref references, but URI References are just too powerful to be able to reason about without a parser specific to the media type/format in question. In all drafts of JSON Schema with $ref, you cannot generally replace the reference with the referenced document, because doing so might change the base URI that a URI Reference in the document is relying on. At the very least you have to add or change the

If you describe the specific contexts where this library is being deployed and what answers it's providing for you, I can suggest some better alternatives.
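To make the base URI problem concrete, here is a minimal TypeScript sketch with hypothetical example.com URIs: the same relative reference resolves to different targets depending on which document it is treated as part of, so naively inlining a referenced document can silently change what its inner references point to. It relies on the standard URL class, whose resolution agrees with RFC 3986 for simple http(s) URIs like these (the two can differ in edge cases).

```typescript
// Hypothetical URIs, for illustration only.
const referringDocUri = "https://example.com/asyncapi/asyncapi.json";
const referencedDocUri = "https://example.com/schemas/user/UserSignedUp.json";

// A relative reference that appears *inside* the referenced document.
const innerRef = "./DisplayNames.json";

// Resolve a URI reference against a base URI.
function resolve(ref: string, base: string): string {
  return new URL(ref, base).href;
}

// Resolved against the document it actually lives in:
const inPlace = resolve(innerRef, referencedDocUri);
// Resolved after naively inlining it into the referring document:
const afterInlining = resolve(innerRef, referringDocUri);

console.log(inPlace); // https://example.com/schemas/user/DisplayNames.json
console.log(afterInlining); // https://example.com/asyncapi/DisplayNames.json
```

The two results differ only because the base changed, which is why inlined content needs its original base URI preserved in some way.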
-
I feel like referencing use cases break down into three broad categories, and I'm curious if others agree. All of these are defined across a set of documents. Documents that support

"Same behavior" in this context means that whatever the underlying document set is supposed to do has the same outcome. For JSON Schemas, this means the same validation outcome, and the same annotations (except that the evaluation path can vary a bit as it shows where

Document transformations
Take a document set and produce another document set with the same behavior. This has several variations:

There are probably some others I'm forgetting, but those are the main ones.

In-memory data structures
This involves using lazy proxy objects of some sort so that you can traverse the overall structure in memory as if it were a single big (possibly infinite/cyclic) resource. There are two variations: making the reference totally invisible (JSON Reference complete object replacement model) or making the reference the proxy so you would still see a

Caching and serving reference targets
"Serving" here might be as simple as "there's a JS object/Python dict/Perl hash/whatever mapping URIs to parsed resources". This is even more useful with
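As a rough illustration of the simplest "caching and serving" variant, here is a TypeScript sketch of an object mapping absolute URIs to parsed resources, with JSON Pointer (RFC 6901) fragments resolved inside the cached resource. The class and method names are hypothetical, and only JSON Pointer fragments are handled; plain-name fragments such as anchors would need separate treatment.

```typescript
// Plain JSON value type.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Remove the fragment so "doc#/a" and "doc" share one cache entry.
function stripFragment(uri: string): string {
  const url = new URL(uri);
  url.hash = "";
  return url.href;
}

// Hypothetical cache mapping absolute URIs (no fragment) to parsed resources.
class ResourceCache {
  private resources = new Map<string, Json>();

  add(uri: string, resource: Json): void {
    this.resources.set(stripFragment(uri), resource);
  }

  // Resolve "uri#/json/pointer"; undefined if the resource or path is missing.
  get(uri: string): Json | undefined {
    let node = this.resources.get(stripFragment(uri));
    const fragment = decodeURIComponent(new URL(uri).hash.slice(1));
    if (node === undefined || fragment === "") return node;
    if (!fragment.startsWith("/")) return undefined; // only JSON Pointer fragments handled
    for (const raw of fragment.slice(1).split("/")) {
      const token = raw.replace(/~1/g, "/").replace(/~0/g, "~"); // RFC 6901 unescaping
      if (Array.isArray(node)) {
        node = /^(0|[1-9][0-9]*)$/.test(token) ? node[Number(token)] : undefined;
      } else if (
        node !== null &&
        typeof node === "object" &&
        Object.prototype.hasOwnProperty.call(node, token)
      ) {
        node = node[token];
      } else {
        return undefined;
      }
      if (node === undefined) return undefined;
    }
    return node;
  }
}

// Usage with a hypothetical document URI.
const cache = new ResourceCache();
cache.add("https://example.com/asyncapi.json", {
  components: { schemas: { UserSignedUp: { type: "object" } } },
});
const hit = cache.get("https://example.com/asyncapi.json#/components/schemas/UserSignedUp");
const miss = cache.get("https://example.com/asyncapi.json#/components/messages/Missing");
console.log(JSON.stringify(hit)); // {"type":"object"}
console.log(miss); // undefined
```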
-
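Here is another question regarding tooling, interoperability, and base URIs. Since references can be relative, a tool fully supporting RFC 3986 must determine the correct base URI, and RFC 3986 (section 5.1) defines a four-step precedence ladder for this:
1. a base URI embedded in the document's content;
2. the base URI of the encapsulating entity;
3. the URI used to retrieve the document;
4. a default base URI defined by the application.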
And of course, you might have a reason to supply a non-RFC 3986-compliant base URI, e.g. because you would normally use the retrieval URI, but in your development environment technically the retrieval URI is

Or maybe you're not even faking part of the RFC 3986 source ladder, but you want to supply a totally erroneous base URI to test some sort of error handling. So RFC 3986 normatively governs the determination of the "correct" base URI, regardless of what is or isn't written in some other specification. But as noted above, tooling requirements can include the need to go outside of that process.
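A rough TypeScript sketch of how a tool might expose RFC 3986's base URI precedence (section 5.1) together with an explicit, caller-supplied override for the non-compliant cases just described. All field and function names are hypothetical, and resolving a relative embedded base against the outer base is an assumption modeled on how JSON Schema treats a relative $id.

```typescript
// Sources a tool may know about; all field names are hypothetical.
interface BaseUriSources {
  override?: string; // explicit caller choice, may deliberately deviate from RFC 3986
  embeddedInContent?: string; // RFC 3986 5.1.1, e.g. a $id in the document
  encapsulatingEntity?: string; // RFC 3986 5.1.2, base of the encapsulating entity
  retrievalUri?: string; // RFC 3986 5.1.3, where the document was fetched from
  applicationDefault: string; // RFC 3986 5.1.4, application-defined default
}

function determineBaseUri(s: BaseUriSources): string {
  // An explicit override short-circuits the ladder (the "outside RFC 3986" case).
  if (s.override !== undefined) return s.override;
  // Otherwise take the highest-precedence outer source that is present...
  const outer = s.encapsulatingEntity ?? s.retrievalUri ?? s.applicationDefault;
  // ...and resolve an embedded base against it, in case the embedded one is relative.
  return s.embeddedInContent !== undefined ? new URL(s.embeddedInContent, outer).href : outer;
}

const retrieval = "https://a.example/x/doc.json";
const fromRetrieval = determineBaseUri({ retrievalUri: retrieval, applicationDefault: "file:///tmp/" });
const fromEmbedded = determineBaseUri({
  embeddedInContent: "schemas/",
  retrievalUri: retrieval,
  applicationDefault: "file:///tmp/",
});
const fromOverride = determineBaseUri({
  override: "http://localhost:8080/",
  retrievalUri: retrieval,
  applicationDefault: "file:///tmp/",
});
console.log(fromRetrieval); // https://a.example/x/doc.json
console.log(fromEmbedded); // https://a.example/x/schemas/
console.log(fromOverride); // http://localhost:8080/
```

Keeping the override as a separate, clearly named input is one way to let tooling go outside the RFC 3986 process without pretending it is part of it.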
So I have a two-part question about how a spec should handle this in terms of normative requirements:
Question 1: what normative requirements, regardless of level, should exist?
Question 2: what level of requirement should it be (assuming you chose a normative requirement)?
I have had some surprisingly intense debates on this topic, so I thought I'd see what expectations folks who are focused on interoperable tooling have. In my view, RFC 3986's normative requirements for base URIs alone do not impose clear or consistent direct requirements on tooling, and my perusal of
-
OK, I have finally submitted a PR for a standalone JSON Referencing and Identification proposal at the JSON Schema Org's new referencing repository. We are hosting this discussion but encouraging submissions and participation from other projects. There will be at least one other proposal from another JSON Schema team member (and @jonaslagoni, if you want to submit the one from asyncapi/spec#825, you are most welcome to do so! Also paging @whitlockjc). The JRI proposal should address the concerns here, although some things are not entirely resolved. It also tries to bridge the various usages and use cases, so it's more complex in an effort to accommodate a variety of uses. Some of the ideas (like media type parameters for indicating

Please feel free to open issues in that repo along with continuing discussions here.
-
In AsyncAPI we have issues with references: basic use cases work, but everything gets fussy as soon as you dive into more advanced cases. This makes building tools around them very tricky, if not almost impossible.
We have been unable to find a library that can fully handle the use cases and requirements highlighted here, so that we can integrate it with our JS parser (currently we use json-schema-ref-parser).
This proposal revolves around how we can build a core building block (across specifications) that makes it easy to work with documents containing references, where each reference follows a different standard. This spans (but is not limited to) the AsyncAPI, JSON Schema, and OpenAPI standards. It will be one of the very core building blocks for implementing parsers and other processing tools.
This discussion goes hand in hand with asyncapi/spec#825 for AsyncAPI 3.0, and it should be used to further progress the discussion points and requirements for the specification changes.
This is a first effort to map out the requirements for a library. That is not to say that we have to build it ourselves; if we can leverage already existing tools, that would probably be preferred! However, if that is the case, they MUST be willing to adapt to these requirements. Whether to do A or B is NOT the focal point of this discussion, as it is ONLY about the requirements and use cases of said tool.
Requirements
These are the requirements that I expect are needed for any reference tool:
The API of the library also has different requirements, which depend on the context in which it is used. This is what I have been able to gather:
Reference specifics
I want to give a quick overview of the different reference behaviors we see across standards (including the ones I know best: AsyncAPI, OpenAPI, and JSON Schema). If you find anything that is wrong or that would make sense to redefine, feel free to point it out 👌
AsyncAPI 2.x
Resource: https://www.asyncapi.com/docs/reference/specification/v2.4.0#referenceObject
URI resolution: It is not possible to set the base URI.
Resource resolution: Replace the entire containing object with the resolved resource. References are made through the Reference Object, which in turn relies on JSON Reference.
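A minimal TypeScript sketch of the "replace the entire containing object" model, assuming the JSON Reference rule that members other than $ref in a reference object are ignored. The resolver here is a hypothetical lookup table rather than real URI resolution, and the sketch has no cycle detection.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical resolver: maps a $ref string to its target value.
type Resolver = (ref: string) => Json;

function isReference(value: Json): value is { $ref: string } {
  return (
    value !== null &&
    typeof value === "object" &&
    !Array.isArray(value) &&
    typeof value["$ref"] === "string"
  );
}

// Replace every reference object wholesale by its (recursively dereferenced) target.
// Sibling members next to $ref are dropped. No cycle detection.
function dereference(value: Json, resolve: Resolver): Json {
  if (isReference(value)) return dereference(resolve(value.$ref), resolve);
  if (Array.isArray(value)) return value.map((item) => dereference(item, resolve));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) out[key] = dereference(child, resolve);
    return out;
  }
  return value;
}

const doc: Json = {
  payload: { $ref: "#/components/schemas/UserSignedUp", description: "ignored sibling" },
};
const targets: Record<string, Json> = { "#/components/schemas/UserSignedUp": { type: "object" } };
const result = dereference(doc, (ref) => targets[ref] ?? null);
console.log(JSON.stringify(result)); // {"payload":{"type":"object"}}
```

Note how the description sibling disappears; this is exactly the behavior that differs from JSON Schema 2019-09 and later, where $ref sits alongside other keywords.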
AsyncAPI 3.x
Referencing JSON (Avro, JSON Schema) and non-JSON data (GraphQL, Protobuf, XSD) is still TBD; the discussion is happening in https://github.com/asyncapi/spec/pull/825
OpenAPI 3.0
Resource: https://spec.openapis.org/oas/v3.0.3
URI resolution: Uses the Server Object for the base URI of relative references.
Resource resolution: Replace the entire containing object with the resolved resource. Uses JSON Reference.
OpenAPI 3.1
Resource: https://spec.openapis.org/oas/v3.1.0
URI resolution: There are multiple resolution rules, depending on where the reference is located. Relative references within Reference Objects, Path Item Object $ref fields, Link Object operationRef fields, and Example Object externalValue fields are resolved using the referring document as the Base URI. A special case is resolving relative references to fully qualified URLs in URL-valued properties, such as external documentation, license, contact, and OAuth flow URLs; these are resolved against the URLs defined in the Server Object.
Relative references in Schema Objects, including any that appear as $id values, use the nearest parent $id as a Base URI, as described by JSON Schema Specification Draft 2020-12.
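The location-dependent rules for OpenAPI 3.1 can be sketched as a small lookup. This is an illustration only: the location labels and function name are hypothetical, the nearest $id is assumed to be already absolute, and the url case assumes the OpenAPI 3.1 rule that URL-valued fields resolve against the Server Object URL.

```typescript
// Hypothetical labels for where a relative reference appears in an OpenAPI 3.1 document.
type RefLocation =
  | { kind: "reference" } // Reference Object, Path Item $ref, Link operationRef, Example externalValue
  | { kind: "schema"; nearestId?: string } // inside a Schema Object; nearestId already absolute
  | { kind: "url" }; // externalDocs, license, contact, OAuth flow URLs

function baseUriFor(loc: RefLocation, referringDocUri: string, serverUrl: string): string {
  if (loc.kind === "reference") return referringDocUri; // the referring document is the base
  if (loc.kind === "schema") return loc.nearestId ?? referringDocUri; // nearest parent $id wins
  return serverUrl; // loc.kind === "url": assumed Server Object URL as base
}

const docUri = "https://example.com/api/openapi.yaml";
const server = "https://api.example.com/v1/";
const a = new URL("./UserSignedUp.yaml", baseUriFor({ kind: "reference" }, docUri, server)).href;
const b = new URL(
  "./User.json",
  baseUriFor({ kind: "schema", nearestId: "https://my-schema-registry.org/UserSignedUp" }, docUri, server),
).href;
const c = new URL("terms", baseUriFor({ kind: "url" }, docUri, server)).href;
console.log(a); // https://example.com/api/UserSignedUp.yaml
console.log(b); // https://my-schema-registry.org/User.json
console.log(c); // https://api.example.com/v1/terms
```

In this sketch the server URL never influences references inside Schema Objects, which is one reading of the rules above.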
Resource resolution: Custom behavior (but somewhat similar to JSON Reference); I cannot find anything specific about the resource resolution.
Must adhere to bundling (a feature in draft 2020-12): Whenever references in OpenAPI 3.1 documents need to be resolved, tools need to understand the bundling behavior, loading all schemas in /components/schemas before trying to resolve the resource.
JSON Schema draft 6 + 7
Resource draft 6: https://datatracker.ietf.org/doc/html/draft-wright-json-schema-01
Resource draft 7: https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-01
URI resolution: Uses the nearest $id as the base URI for relative references.
Resource resolution: Replace the entire containing JSON object with the resolved resource.
Bundling: Even though bundling was not well defined until draft 2019-09, it should be the default behavior for handling schemas < draft 2019-09. It should, however, be possible to disable this behavior.
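The nearest-$id rule and the proposed "bundling on by default, but possible to disable" behavior could be sketched together as follows. This is a simplified illustration, not a compliant implementation: the function names and the fetchRemote hook are hypothetical, the walk naively descends into every object value rather than only schema keywords, and fragments inside the target resource are ignored. The registry URI is the one used in the use cases.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Walk a document, resolving each $id against the nearest enclosing base URI,
// and index every embedded schema resource by its absolute URI (no fragment).
function indexEmbeddedResources(node: Json, baseUri: string, index = new Map<string, Json>()): Map<string, Json> {
  if (Array.isArray(node)) {
    for (const item of node) indexEmbeddedResources(item, baseUri, index);
  } else if (node !== null && typeof node === "object") {
    let base = baseUri;
    const id = node["$id"];
    // In drafts 6/7 a fragment-only $id ("#foo") is an anchor, not a new resource.
    if (typeof id === "string" && !id.startsWith("#")) {
      const url = new URL(id, baseUri);
      url.hash = "";
      base = url.href;
      index.set(base, node);
    }
    for (const value of Object.values(node)) indexEmbeddedResources(value, base, index);
  }
  return index;
}

// Look up an absolute reference among bundled resources first (unless bundling
// is disabled), then fall back to a hypothetical remote-fetch hook.
function resolveRef(
  absoluteRef: string,
  index: Map<string, Json>,
  fetchRemote: (uri: string) => Json,
  options: { bundling: boolean } = { bundling: true },
): Json {
  const url = new URL(absoluteRef);
  url.hash = "";
  const local = options.bundling ? index.get(url.href) : undefined;
  return local !== undefined ? local : fetchRemote(url.href);
}

const asyncapi: Json = {
  components: {
    schemas: { UserSignedUp: { $id: "https://my-schema-registry.org/UserSignedUp", type: "object" } },
  },
};
const index = indexEmbeddedResources(asyncapi, "file:///project/asyncapi.json");
const fetchRemote = (uri: string): Json => {
  throw new Error(`would fetch ${uri} over the network`);
};

// Bundling on (the proposed default): resolved locally, no fetch.
const bundled = resolveRef("https://my-schema-registry.org/UserSignedUp", index, fetchRemote);
console.log(JSON.stringify(bundled)); // {"$id":"https://my-schema-registry.org/UserSignedUp","type":"object"}
```

Passing `{ bundling: false }` skips the local index and goes straight to fetchRemote, which here throws to make the network access visible.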
JSON Schema draft 2019-09
Resource: https://json-schema.org/specification-links.html#draft-2019-09-formerly-known-as-draft-8
URI resolution: Uses the nearest $id as the base URI for relative references. Uses JSON Pointer for URI fragments.
Resource resolution: Replace the reference value with the resolved Schema object. Adapts to bundling.
Bundling: Implementations must look up $ref in $defs before trying to access the resource externally.
JSON Schema draft 2020-12
Resource: https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-00
URI resolution: Uses the nearest $id as the base URI for relative references.
Resource resolution: Replace the string reference with the resolved Schema object. Must adhere to bundling.
Bundling: Implementations must look up $ref in $defs before trying to access the resource externally.
Use-cases
Here is a list of use cases I have been able to gather that either have undefined/unknown behavior (unknown meaning the standard does not define the expected behavior) or known behavior that we might want to change (or not).
If any of these use cases seem wrong based on your experience, please point out where in the standards the behavior is explained so we can keep it as organized as possible.
Known behavior cases
These are all the cases where I expect that we can conclude the expected behavior based on the specifications.
U3 - AsyncAPI + JSON Schema < draft 2019-09 with bundling
Discussion can be found here: #485 (comment)
Original question and answer: As far as I understand, it was not until draft 2019-09 that JSON Schema enabled compound schemas (i.e. bundled schemas, where any schema within #/components/schemas is automatically loaded by implementations, so it does not matter that no remote resource exists at https://my-schema-registry.org/UserSignedUp, as it will be matched locally with #/components/schemas/UserSignedUp because of the $id). In other words, this is not supported with draft-7. Expected behavior (original answer): failure to locate https://my-schema-registry.org/UserSignedUp.
Expected behavior: Bundling should be the default, but it can be turned off through options, for JSON Schema schema formats < draft 2019-09. Because bundling was not well defined until draft 2019-09, it is hard to put your foot down and say this MUST be the expected behavior.
U4 - AsyncAPI + JSON Schema >= draft 2019-09 with bundling and internal definitions
Discussions found here: #485 (comment)
If bundling is allowed, what is the correct behavior when schemas inline definitions within AsyncAPI? Notice how the properties in #/$defs/UserSignedUp have type number instead of string. Would the message payload validate against {displayName: 0, email: 0} or {displayName: "myUsername", email: "[email protected]"}?
Expected behavior: By default, duplicate URIs should raise an error, as defined in newer JSON Schema versions.
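A minimal TypeScript sketch of the "duplicate URIs raise an error" expectation: resolve every $id against its nearest enclosing base URI and fail on the second occurrence of the same absolute URI, reporting both JSON Pointer locations. The function name is hypothetical, the walk is naive (it visits every object value), and the document layout is a hypothetical approximation of U4, with the same $id under #/components/schemas and #/$defs.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Escape a key for use as a JSON Pointer token (RFC 6901).
const escapeToken = (key: string): string => key.replace(/~/g, "~0").replace(/\//g, "~1");

// Throw if two schema resources resolve to the same absolute URI.
// `seen` maps each absolute URI to the JSON Pointer where it was first defined.
function assertUniqueIds(node: Json, baseUri: string, seen = new Map<string, string>(), pointer = "#"): void {
  if (Array.isArray(node)) {
    node.forEach((item, i) => assertUniqueIds(item, baseUri, seen, `${pointer}/${i}`));
    return;
  }
  if (node === null || typeof node !== "object") return;
  let base = baseUri;
  const id = node["$id"];
  if (typeof id === "string" && !id.startsWith("#")) {
    const url = new URL(id, baseUri);
    url.hash = "";
    base = url.href;
    const first = seen.get(base);
    if (first !== undefined) {
      throw new Error(`Duplicate schema URI ${base} at ${pointer} (first defined at ${first})`);
    }
    seen.set(base, pointer);
  }
  for (const [key, value] of Object.entries(node)) {
    assertUniqueIds(value, base, seen, `${pointer}/${escapeToken(key)}`);
  }
}

const sharedId = "https://my-schema-registry.org/UserSignedUp";
const doc: Json = {
  components: {
    schemas: {
      UserSignedUp: { $id: sharedId, type: "object", properties: { displayName: { type: "string" } } },
    },
  },
  $defs: {
    UserSignedUp: { $id: sharedId, type: "object", properties: { displayName: { type: "number" } } },
  },
};

let message = "";
try {
  assertUniqueIds(doc, "file:///project/asyncapi.json");
} catch (err) {
  message = (err as Error).message;
}
console.log(message);
```

Failing loudly avoids having to decide which of the two conflicting definitions a validator should pick.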
U5 - AsyncAPI + JSON Schema contradicting formats
Discussions found here: https://github.com//discussions/485#discussioncomment-3788937
If $schema must be adhered to, how can tooling interpret the following AsyncAPI document, where the formats contradict each other? Would the payload schema be interpreted as a draft-7 or a draft 2019-09 schema? Which has the highest precedence? Who defines that this is the desired behavior?
Expected behavior: Because $schema allows you to declare the meta-schema for the embedded resource, $schema "overrides" whatever is defined within schemaFormat.
U7 - OpenAPI + JSON Schema contradicting base URIs
Discussion found here: https://github.com//discussions/485#discussioncomment-3788946
I am not 100% sure this is even a contradiction: is the server URL used for relative references in URIs? Here I am assuming it is, because it creates this weird behavior. I think one could argue two things here:
1. The base URI is determined by the closest applicable keyword, i.e. once we reach the JSON Schema document, its references are resolved against the $id.
2. The OpenAPI base URI triumphs over the nested base URIs.
I honestly think that the desired behavior is always 1. However, from reading the specs, I cannot wholeheartedly say that is the expected behavior.
Expected behavior: The server URL has no effect as a base URI for the Schema Object, as that definition has no bearing on that part.
./UserSignedUp.yaml
./DisplayNames.yaml
Unknown behavior cases
These are all the cases where I simply cannot seem to find the information about what is supported and expected behavior when reading the standards from a tooling perspective.
U1 - AsyncAPI + JSON Schema >= draft 2019-09 with bundling
Discussion found here: https://github.com//discussions/485#discussioncomment-3788915
Is bundling allowed within JSON Schema >= draft 2019-09 in AsyncAPI? Should it be enabled within AsyncAPI, so that the reference https://my-schema-registry.org/UserSignedUp is loaded from #/components/schemas/UserSignedUp before trying to fetch it remotely?
U2 - AsyncAPI + JSON Schema >= draft 2019-09 with bundling and relative reference
Discussion found here: https://github.com//discussions/485#discussioncomment-3788920
If U1 is allowed, then this would subsequently also be allowed, because the base URI is fairly straightforward to resolve. With >= draft 2019-09, bundling is enabled within AsyncAPI, so tooling should load the schemas without reaching out externally for https://my-schema-registry.org/User, and should resolve the relative reference ./User against the base URI of https://my-schema-registry.org/UserSignedUp (https://my-schema-registry.org) to https://my-schema-registry.org/User.
U6 - AsyncAPI using OpenAPI Schema documents which expect a base URI
Discussion found here: https://github.com//discussions/485#discussioncomment-3788941
Since AsyncAPI does not have a way to specify base URIs for resolving relative references, how do you reuse your schema documents across specs?
asyncapi.yaml
openapi.yaml
./UserSignedUp.yaml
./DisplayNames.yaml
I know I might be pushing the limits of what a user might do, because if they had just set an $id for the UserSignedUp schema, there would not have been any issues.
Maybe tooling always needs to provide an option for specifying the base URI independently of the containing specification, so that when used in AsyncAPI you still have the option of setting the base URI to https://my-schema-registry.org.
U8 - AsyncAPI + non-JSON references
Discussion found here: https://github.com//discussions/485#discussioncomment-3788951
How should non-JSON references be interpreted on the tooling side? Is that up to the individual implementer? If so, how can you make sure that reference fragments are upheld as expected? For example, what if the implementer converted something to an array, where the user and the fragment defined it as an object? Or chose an entirely different interpretation?
In this example, I assume I can access the message through #UserSignedUp; however, what if the internal JSON representation is the following:
Then I would have to access it as #/messages/0 (just an example; I don't like this conversion, but it shows that it might be interpreted differently because we are not following a "standard").
U9 - AsyncAPI + JSON Schema contradicting formats for remote references
Discussion found here: https://github.com//discussions/485#discussioncomment-3788954
For remote references, if it is the media type that determines how a resource is processed, what happens when the media type and the AsyncAPI document contradict each other?
The media type received from https://schema_registry.com/UserSignedUp is application/schema+json;version=draft-07, contradicting what was defined within the AsyncAPI document. Which one should tooling use?
U10 - AsyncAPI + JSON Schema double contradicting formats for remote references
Discussion found here: https://github.com//discussions/485#discussioncomment-3788956
This extends U9 with a schema that defines its own $schema.
The media type received from https://schema_registry.com/UserSignedUp is application/schema+json;version=draft-07, and the $schema defined within the schema itself is https://json-schema.org/draft/2020-12/schema. This doubly contradicts both what the AsyncAPI document defined and what the remote server returned.
One could say this is just badly managed schemas, but how can tooling and users know what to do in this situation?
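Until the precedence question is settled, one option is for tooling to make its policy explicit and report disagreements instead of silently picking one source. The sketch below shows one possible policy, not a settled rule: $schema first (matching the U5 expectation), then the remote media type, then schemaFormat. It assumes a hypothetical earlier step has normalized each hint to a common dialect label such as "draft-07" or "2020-12", since the three sources use different notations; the type and function names, and the schemaFormat value in the usage, are hypothetical.

```typescript
type DialectSource = "$schema" | "mediaType" | "schemaFormat";

// Dialect hints, assumed already normalized to labels like "draft-07" or "2020-12".
interface DialectHints {
  $schema?: string; // from the schema resource itself
  mediaType?: string; // from the remote server's Content-Type
  schemaFormat?: string; // from the AsyncAPI document
}

interface DialectDecision {
  dialect?: string;
  source?: DialectSource;
  conflicts: DialectSource[]; // lower-precedence sources that disagreed
}

// One possible policy; the order is a parameter because it is not settled.
function decideDialect(
  hints: DialectHints,
  order: DialectSource[] = ["$schema", "mediaType", "schemaFormat"],
): DialectDecision {
  const present = order.filter((s) => hints[s] !== undefined);
  const [source, ...rest] = present;
  if (source === undefined) return { conflicts: [] };
  const dialect = hints[source];
  return { dialect, source, conflicts: rest.filter((s) => hints[s] !== dialect) };
}

// A U10-like situation (the schemaFormat value is hypothetical).
const decision = decideDialect({ $schema: "2020-12", mediaType: "draft-07", schemaFormat: "2019-09" });
console.log(decision.dialect, decision.source, decision.conflicts);
```

Returning the conflicts lets a parser surface a warning to the user, which addresses the "how can users know" part of the question even before the precedence is standardized.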
Relevant resources
These are some of the relevant resources that are somewhat related to this.
$ref/$id json-schema-org/json-schema-spec#724
$id conform to RFC 3986 suggestion for base URI elements json-schema-org/json-schema-spec#729