Make bundle JSON schema modular with $defs #1700

Merged Sep 10, 2024 (62 commits)

@shreyas-goenka shreyas-goenka commented Aug 20, 2024

Changes

This PR makes sweeping changes to the way we generate and test the bundle JSON schema. The main benefits are:

  1. More modular JSON schema. Every definition in the schema is now one level deep and points to other definitions via $ref instead of inlining the entire schema for a field. This unblocks PyDABs from taking a dependency on the JSON schema.

  2. Generate the JSON schema during CLI code generation and stream the pre-generated result directly, instead of computing it at runtime whenever a user calls databricks bundle schema. This means we no longer need to embed a partial OpenAPI spec in the CLI. Down the line, we can add a Schema() method to every struct in the Databricks Go SDK and remove the dependency on the OpenAPI spec altogether. That will become more important once we decouple Go SDK structs and methods from the underlying APIs.

  3. Add enum values for Go SDK fields in the JSON schema. This gives better autocompletion and validation for these fields. As a follow-up, we can add enum values for non-Go SDK enums as well (an internal ticket was created to track this).

  4. Use "packageName.structName" as a key to read JSON schemas from the OpenAPI spec for Go SDK structs. Before, we used an unrolled representation of the JSON schema (stored in bundle_descriptions.json), which was complex to parse and include in the final JSON schema output. This also means that loading values from the OpenAPI spec for the target schema works automatically and no longer needs custom code.

  5. Support recursive types (e.g. for_each_task). Now that we use $ref everywhere, this is trivial to support.

  6. Fix validation of complex variables. The schema generated before this PR treated configurations that use complex variables as invalid. Adding more custom rules will also be easier in the future because of the single-level nature of the JSON schema.
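
To make the one-level-deep layout from points 1, 4 and 5 concrete, here is a minimal sketch in Go. It is not the code from this PR: the `Schema`, `Job` and `Task` types and the `refFor` and `Generate` helpers are hypothetical, and the `packageName.structName` key is approximated with `reflect.Type.String()`. The idea it illustrates is that every struct is registered once under `$defs` and referenced by `$ref`, so a recursive field such as `for_each_task` terminates naturally.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// Schema is a deliberately tiny JSON schema node (hypothetical, not the CLI's type).
type Schema struct {
	Type       string             `json:"type,omitempty"`
	Ref        string             `json:"$ref,omitempty"`
	Properties map[string]*Schema `json:"properties,omitempty"`
	Items      *Schema            `json:"items,omitempty"`
	Defs       map[string]*Schema `json:"$defs,omitempty"`
}

// Task stands in for a recursive type such as for_each_task.
type Task struct {
	Name        string `json:"name"`
	ForEachTask *Task  `json:"for_each_task"`
}

// Job stands in for a top-level resource that contains tasks.
type Job struct {
	Tasks []Task `json:"tasks"`
}

// refFor returns a shallow schema for t. Primitives are inlined; structs are
// registered in defs once and returned as a $ref to that definition.
func refFor(t reflect.Type, defs map[string]*Schema) *Schema {
	for t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	switch t.Kind() {
	case reflect.String:
		return &Schema{Type: "string"}
	case reflect.Bool:
		return &Schema{Type: "boolean"}
	case reflect.Int, reflect.Int64:
		return &Schema{Type: "integer"}
	case reflect.Slice:
		return &Schema{Type: "array", Items: refFor(t.Elem(), defs)}
	case reflect.Struct:
		key := t.String() // assumption: stands in for "packageName.structName"
		if _, seen := defs[key]; !seen {
			def := &Schema{Type: "object", Properties: map[string]*Schema{}}
			defs[key] = def // register before recursing so cycles terminate
			for i := 0; i < t.NumField(); i++ {
				f := t.Field(i)
				name := strings.Split(f.Tag.Get("json"), ",")[0]
				if name == "" {
					name = f.Name
				}
				def.Properties[name] = refFor(f.Type, defs)
			}
		}
		return &Schema{Ref: "#/$defs/" + key}
	}
	return &Schema{} // unsupported kinds: empty schema, which accepts any value
}

// Generate builds a root schema that references v's definition and carries all $defs.
func Generate(v any) *Schema {
	defs := map[string]*Schema{}
	root := refFor(reflect.TypeOf(v), defs)
	root.Defs = defs
	return root
}

func main() {
	out, err := json.MarshalIndent(Generate(Job{}), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Registering a definition before walking its fields is the design choice that makes recursion safe: when the walker reaches `for_each_task`, the `Task` definition already exists, so it emits a `$ref` instead of descending again.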

Since this is a complete change of approach in how we generate the JSON schema, there are a few (very minor) regressions worth calling out.

  1. We'll lose a few custom descriptions for non-Go SDK structs that were part of bundle_descriptions.json. Support for those can be added in the future as a follow-up.
  2. Since the final JSON schema is now a static artefact, we will find out later than before when the JSON schema integration tests are failing. That's okay, since we have a lot of coverage via the existing unit tests.

Tests

Unit tests. End-to-end tests are being added in a separate PR: #1726

The previous unit tests were all deleted because they were bloated. The new unit tests are intended to provide (almost) equivalent coverage.

@pietern pietern left a comment


Few remaining comments.


@pietern pietern left a comment


LGTM

@andrewnester will also take a look

@shreyas-goenka shreyas-goenka added this pull request to the merge queue Sep 10, 2024
Merged via the queue into main with commit 28b39cd Sep 10, 2024
5 checks passed
@shreyas-goenka shreyas-goenka deleted the improve/json-schema branch September 10, 2024 14:02
andrewnester added a commit that referenced this pull request Sep 18, 2024
CLI:
 * Added listing cluster filtering for cluster lookups ([#1754](#1754)).

Bundles:
 * Expand library globs relative to the sync root ([#1756](#1756)).
 * Fixed generated YAML missing 'default' for empty values ([#1765](#1765)).
 * Use periodic triggers in all templates ([#1739](#1739)).
 * Use the friendly name of service principals when shortening their name ([#1770](#1770)).
 * Fixed detecting full syntax variable override which includes type field ([#1775](#1775)).

Internal:
 * Pass copy of `dyn.Path` to callback function ([#1747](#1747)).
 * Make bundle JSON schema modular with `$defs` ([#1700](#1700)).
 * Alias variables block in the `Target` struct ([#1748](#1748)).
 * Add end to end integration tests for bundle JSON schema ([#1726](#1726)).
 * Fix artifact upload integration tests ([#1767](#1767)).

API Changes:
 * Added `databricks quality-monitors regenerate-dashboard` command.

OpenAPI commit d05898328669a3f8ab0c2ecee37db2673d3ea3f7 (2024-09-04)
Dependency updates:
 * Bump golang.org/x/term from 0.23.0 to 0.24.0 ([#1757](#1757)).
 * Bump golang.org/x/oauth2 from 0.22.0 to 0.23.0 ([#1761](#1761)).
 * Bump golang.org/x/text from 0.17.0 to 0.18.0 ([#1759](#1759)).
 * Bump github.com/databricks/databricks-sdk-go from 0.45.0 to 0.46.0 ([#1760](#1760)).
github-merge-queue bot pushed a commit that referenced this pull request Sep 18, 2024