Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PythonMutator: support omitempty in PyDABs #1513

Merged
merged 12 commits into from
Jul 3, 2024

Conversation

kanterov
Copy link
Contributor

@kanterov kanterov commented Jun 20, 2024

Changes

PyDABs output can omit empty sequences/mappings because we don't track them as optional. There is no semantic difference between empty and missing, which makes omitting correct. CLI detects that we falsely modify input resources by deleting all empty collections.

To handle that, we extend dyn.Override to allow visitors to ignore certain deletes. If we see that an empty sequence or mapping is deleted, we revert such delete.

Tests

Unit tests

@kanterov kanterov force-pushed the python-mutator-ommitempty branch 4 times, most recently from a8c5352 to b695d5d Compare June 24, 2024 13:15
@codecov-commenter
Copy link

codecov-commenter commented Jun 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.96%. Comparing base (e22dd8a) to head (b1d8b59).
Report is 161 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1513      +/-   ##
==========================================
+ Coverage   52.25%   53.96%   +1.70%     
==========================================
  Files         317      353      +36     
  Lines       18004    20544    +2540     
==========================================
+ Hits         9408    11086    +1678     
- Misses       7903     8653     +750     
- Partials      693      805     +112     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kanterov kanterov force-pushed the python-mutator-ommitempty branch from b695d5d to fb6c851 Compare June 24, 2024 13:19
@kanterov kanterov force-pushed the python-mutator-ommitempty branch from fb6c851 to b1d8b59 Compare June 24, 2024 13:22
@kanterov kanterov requested a review from pietern June 24, 2024 13:23
@@ -311,6 +319,21 @@ func createInitOverrideVisitor(ctx context.Context) merge.OverrideVisitor {
}
}

func isOmitemptyDelete(left dyn.Value) bool {
// PyDABs output can omit empty sequences/mappings, because we don't track them as optional,
// there is no semantic difference between empty and missing, so we keep them as they were before.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly with this PR you are trying to fix the case where a value in the original bundle config is an empty (but not nil) sequence or mapping, but pydabs skips emitting the key (with an empty value) in its serialized output. This causes an error (unexpected change at %q (delete)) which you want to avoid.

Is my understanding of the problem you are trying to solve correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. We don't distinguish between empty and unset lists, so when the list is empty, we omit it in the output.

@kanterov kanterov requested a review from shreyas-goenka July 1, 2024 06:20
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM, but can you elaborate on why we don't treat these as different? If the JSON representation is mapped to a Python data class, I imagine there in fact being a difference between a None field and an empty list.

libs/dyn/merge/override.go Outdated Show resolved Hide resolved
@kanterov
Copy link
Contributor Author

kanterov commented Jul 1, 2024

@pietern having Optional[list[T]] was not helpful because any Python code (including user-specified) had to handle the possibility of missing values while there is no semantic difference in it. It also made many type signatures unnecessarily complex. So we only left it in "create" method signatures to avoid problems with mutating arguments.

In cases where it matters, we can still use Optional[list[T]] in data classes, and it will keep missing values as-is.

We can fix it in a different way by tracking which lists were not modified or ever specified explicitly.

@kanterov kanterov requested a review from pietern July 1, 2024 11:43
@@ -311,6 +319,21 @@ func createInitOverrideVisitor(ctx context.Context) merge.OverrideVisitor {
}
}

func isOmitemptyDelete(left dyn.Value) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omitempty is specific to the JSON serializer. Here we check if the incoming value is an empty collection.

Maybe isEmptyCollection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentional because we want to reserve this method to specific behavior with omit empty and add a comment explaining that. For instance, we intentionally don't handle objects where all fields are empty.

libs/dyn/merge/override.go Outdated Show resolved Hide resolved
// the final value of the node. 'VisitDelete' should return dyn.NilValue to
// do the delete, or can return 'left' to undo it.
//
// TODO 'VisitDelete' and 'VisitInsert' should support dyn.NilValue as well
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the semantics be?

I think you mean VisitUpdate here, not VisitDelete.


if err != nil {
return dyn.NewMapping(), err
}

// if 'delete' was undone, add it back
if deleteOut != dyn.NilValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check the Kind(). A nil value may have a location attached to it so direct equality can fail.


if err != nil {
return nil, err
}

// if 'delete' was undone, add it back
if out != dyn.NilValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@kanterov
Copy link
Contributor Author

kanterov commented Jul 2, 2024

@pietern, thanks for the suggestions about using errors. It's much cleaner now, and we don't need to document that insert/update doesn't support this behavior because the error is dedicated to the "delete" method.

@kanterov kanterov requested a review from pietern July 2, 2024 11:28
libs/dyn/merge/override.go Outdated Show resolved Hide resolved
if left.Kind() == dyn.KindNil {
return true
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A switch/case would be cleaner here, IMO.

kanterov and others added 2 commits July 3, 2024 09:11
@pietern pietern enabled auto-merge July 3, 2024 07:17
@pietern pietern added this pull request to the merge queue Jul 3, 2024
Merged via the queue into databricks:main with commit b9e3c98 Jul 3, 2024
5 checks passed
pietern added a commit that referenced this pull request Jul 3, 2024
Bundles:

As of this release you can interact with bundles when running the CLI on DBR (e.g. via the Web Terminal).

 * Fix non-default project names not working in dbt-sql template ([#1500](#1500)).
 * Improve `bundle validate` output ([#1532](#1532)).
 * Fixed resolving variable references inside slice variable ([#1550](#1550)).
 * Fixed bundle not loading when empty variable is defined ([#1552](#1552)).
 * Use `vfs.Path` for filesystem interaction ([#1554](#1554)).
 * Replace `vfs.Path` with extension-aware filer when running on DBR ([#1556](#1556)).

Internal:
 * merge.Override: Fix handling of dyn.NilValue ([#1530](#1530)).
 * Compare `.Kind()` instead of direct equality checks on a `dyn.Value` ([#1520](#1520)).
 * PythonMutator: register product in user agent extra ([#1533](#1533)).
 * Ignore `dyn.NilValue` when traversing value from `dyn.Map` ([#1547](#1547)).
 * Add extra tests for the sync block ([#1548](#1548)).
 * PythonMutator: add diagnostics ([#1531](#1531)).
 * PythonMutator: support omitempty in PyDABs ([#1513](#1513)).
 * PythonMutator: allow insert 'resources' and 'resources.jobs' ([#1555](#1555)).
@pietern pietern mentioned this pull request Jul 3, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 3, 2024
Bundles:

As of this release you can interact with bundles when running the CLI on
DBR (e.g. via the Web Terminal).

* Fix non-default project names not working in dbt-sql template
([#1500](#1500)).
* Improve `bundle validate` output
([#1532](#1532)).
* Fixed resolving variable references inside slice variable
([#1550](#1550)).
* Fixed bundle not loading when empty variable is defined
([#1552](#1552)).
* Use `vfs.Path` for filesystem interaction
([#1554](#1554)).
* Replace `vfs.Path` with extension-aware filer when running on DBR
([#1556](#1556)).

Internal:
* merge.Override: Fix handling of dyn.NilValue
([#1530](#1530)).
* Compare `.Kind()` instead of direct equality checks on a `dyn.Value`
([#1520](#1520)).
* PythonMutator: register product in user agent extra
([#1533](#1533)).
* Ignore `dyn.NilValue` when traversing value from `dyn.Map`
([#1547](#1547)).
* Add extra tests for the sync block
([#1548](#1548)).
* PythonMutator: add diagnostics
([#1531](#1531)).
* PythonMutator: support omitempty in PyDABs
([#1513](#1513)).
* PythonMutator: allow insert 'resources' and 'resources.jobs'
([#1555](#1555)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants