(Inspired by #792, but as a discussion rather than an issue, since I don't think it should become a documentation proposal until there's initial agreement that this is a good path to take.)
Defining comprehensive data schemas is difficult (especially if they can reference each other), so using JSON Schema to validate TOML documents seems like a more pragmatic path forward than attempting to build a separate TOML-specific schema validation ecosystem.
A version of this idea is already implemented in taplo, which uses `#:schema ./foo-schema.json` comments to reference JSON schema documents: https://taplo.tamasfe.dev/configuration/directives.html#the-schema-directive

(While the Python standard library's `tomllib` module doesn't provide access to TOML comments, the feature is available by iterating over the `body` attribute of a `tomlkit.TOMLDocument` instance, allowing scanning for schema references using the same format as taplo.)
Given a JSON schema reference, validating a TOML document at runtime is fairly straightforward: load the data from the TOML file, load the schema into your preferred JSON schema validation library, and then check that the data matches the schema.
What's missing is a clear explanation of how the different pieces of a TOML document map to different concepts in JSON schema, since the two specifications sometimes use different terminology for the same things, and there are some features of TOML that need to be skipped if you want the data read from the document to validate as JSON at all (let alone against a specific schema).
The TOML mapping for the basic JSON Schema types is straightforward (TOML type -> JSON type):

- string -> string
- integer -> integer
- float -> number
- boolean -> boolean
- array -> array
- table (including the document itself) -> object
- omitting optional keys from a document or table -> either null or omission from the schema's required properties (depending on the default value used when a key is missing)
All of the regular JSON schema features for these types can be applied to TOML documents, remembering that they apply to the parsed values, not the exact text as written into the TOML file (so things like the string quoting format or whether a table is inline or not don't matter).
Notable caveats and limitations for the basic types:
- TOML's direct conversion of floating point NaN and infinity to the corresponding floating point values in the host language runtime will not validate against a standard `number` JSON schema field. To pass schema validation, `nan` and `inf` (and their positive and negative variants) need to be encoded as dual type `["number", "string"]` fields rather than using the native float representations of the special values.
- there's no TOML representation that allows arrays to contain `null` values. The closest TOML has to a representation of `null` is "omit that key", which only applies to tables and the top-level keys of a document.
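One way to apply that dual-type encoding for the special float values might look like this (the `encode_float` helper and the field name are illustrative; the schema uses JSON Schema's `number` type, since JSON Schema has no `float` type):

```python
import math
import jsonschema  # third-party JSON Schema validator

# A dual-type field accepts both ordinary floats and string-encoded specials
schema = {
    "type": "object",
    "properties": {"x": {"type": ["number", "string"]}},
}

def encode_float(value: float):
    """Replace non-finite floats with string markers before validation."""
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "inf" if value > 0 else "-inf"
    return value

for raw in (1.5, math.nan, math.inf, -math.inf):
    jsonschema.validate({"x": encode_float(raw)}, schema)
print("all values validated")
```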
The final case to consider is how dates, times, and their optional timezone offsets should be matched to the JSON schema RFC 3339 guidelines in https://json-schema.org/draft/2020-12/json-schema-validation#name-defined-formats

This last part isn't actually a TOML question: it's a question of how the structured date/time objects emitted by a compliant TOML parser are serialised to strings before being passed to the chosen JSON Schema validator (passing the structured date/time objects directly will always fail, since they're not a valid JSON type).
For Python, for example, making `jsonschema` happy with serialised `datetime` values requires ensuring that they're converted to strings which comply with RFC 3339 as JSON Schema specifies (the ISO 8601 based `isoformat()` methods are sufficient for this, since they include the separators that RFC 3339 requires).
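A minimal sketch of that serialisation step, again assuming the third-party `jsonschema` package (note that `jsonschema` only enforces `format` keywords when a `FormatChecker` is supplied, and the `date-time` check additionally needs an optional checker dependency installed; without one, the format check is silently skipped):

```python
import datetime
import jsonschema
from jsonschema import FormatChecker

schema = {"type": "string", "format": "date-time"}

# A timezone-aware datetime, as a TOML parser might produce
# for an offset date-time value
ts = datetime.datetime(2024, 1, 2, 3, 4, 5, tzinfo=datetime.timezone.utc)

# isoformat() includes the "T" and offset separators RFC 3339 requires
serialised = ts.isoformat()  # "2024-01-02T03:04:05+00:00"

jsonschema.validate(serialised, schema, format_checker=FormatChecker())
```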