(Inspired by #792, but as a discussion rather than an issue, since I don't think it should become a documentation proposal until there's initial agreement that this is a good path to take.)
Defining comprehensive data schemas is difficult (especially if they can reference each other), so using JSON Schema to validate TOML documents seems like a more pragmatic path forward than attempting to build a separate TOML-specific schema validation ecosystem.
A version of this idea is already implemented in taplo, which uses `#:schema ./foo-schema.json` comments to reference JSON schema documents: https://taplo.tamasfe.dev/configuration/directives.html#the-schema-directive

(While the Python standard library's `tomllib` module doesn't provide access to TOML comments, the feature is available by iterating over the `body` attribute of a `tomlkit.TOMLDocument` instance, allowing scanning for schema references using the same format as taplo.)
Given a JSON schema reference, validating a TOML document at runtime is fairly straightforward: load the data from the TOML file, load the schema into your preferred JSON schema validation library, and then check that the data matches the schema.
What's missing is a clear explanation of how the different pieces of a TOML document map to different concepts in JSON schema, since the two specifications sometimes use different terminology for the same things, and there are some features of TOML that need to be skipped if you want the data read from the document to validate as JSON at all (let alone against a specific schema).
The TOML mapping for the basic JSON Schema types is straightforward (TOML type -> JSON type):

- string -> string
- integer -> integer
- float -> number
- boolean -> boolean
- array -> array
- table (including the document itself) -> object
- omitting optional keys from a document or table -> either null or omission from the schema's required properties (depending on the default value used when a key is missing)
All of the regular JSON schema features for these types can be applied to TOML documents, remembering that they apply to the parsed values, not the exact text as written into the TOML file (so things like the string quoting format or whether a table is inline or not don't matter).
Notable caveats and limitations for the basic types:
- TOML's direct conversion of floating point NaN and infinity to the corresponding floating point values in the host language runtime will not validate against a standard `number` JSON schema field. To pass schema validation, `nan` and `inf` (and their positive and negative variants) need to be encoded as dual type `["number", "string"]` fields rather than using the native float representations of the special values.
- there's no TOML representation that allows arrays to contain `null` values. The closest TOML has to a representation of `null` is "omit that key", which only applies to tables and the top-level keys of a document.
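One way to apply that dual-type encoding for the special float values might look like this (the `encode_float` helper and the field name are illustrative; the schema uses JSON Schema's `number` type, since JSON Schema has no `float` type):

```python
import math
import jsonschema  # third-party JSON Schema validator

# A dual-type field accepts both ordinary floats and string-encoded specials
schema = {
    "type": "object",
    "properties": {"x": {"type": ["number", "string"]}},
}

def encode_float(value: float):
    """Replace non-finite floats with string markers before validation."""
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "inf" if value > 0 else "-inf"
    return value

for raw in (1.5, math.nan, math.inf, -math.inf):
    jsonschema.validate({"x": encode_float(raw)}, schema)
print("all values validated")
```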
The final case to consider is how dates, times, and their optional timezone offsets should be matched to the JSON schema RFC 3339 guidelines in https://json-schema.org/draft/2020-12/json-schema-validation#name-defined-formats

This last part isn't actually a TOML question: it's a question of how the structured date/time objects emitted by a compliant TOML parser are serialised to strings before being passed to the chosen JSON Schema validator (passing the structured date/time objects directly will always fail, since they're not a valid JSON type).
For Python, for example, making `jsonschema` happy with serialised `datetime` values requires ensuring that they're converted to strings which comply with RFC 3339 as JSON Schema specifies (the ISO 8601 based `isoformat()` methods are sufficient for this, since they include the separators that RFC 3339 requires).
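A minimal sketch of that serialisation step, again assuming the third-party `jsonschema` package (note that `jsonschema` only enforces `format` keywords when a `FormatChecker` is supplied, and the `date-time` check additionally needs an optional checker dependency installed; without one, the format check is silently skipped):

```python
import datetime
import jsonschema
from jsonschema import FormatChecker

schema = {"type": "string", "format": "date-time"}

# A timezone-aware datetime, as a TOML parser might produce
# for an offset date-time value
ts = datetime.datetime(2024, 1, 2, 3, 4, 5, tzinfo=datetime.timezone.utc)

# isoformat() includes the "T" and offset separators RFC 3339 requires
serialised = ts.isoformat()  # "2024-01-02T03:04:05+00:00"

jsonschema.validate(serialised, schema, format_checker=FormatChecker())
```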