Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PythonMutator: add diagnostics #1531

Merged
merged 7 commits into from
Jul 2, 2024

Conversation

kanterov
Copy link
Contributor

Changes

Allow PyDABs to report dyn.Diagnostics by writing to diagnostics.json supplied as an argument, similar to input.json and output.json

Such errors are not yet properly printed in databricks bundle validate, which will be fixed in a follow-up PR.

Tests

Unit tests

@kanterov kanterov force-pushed the python-mutator-diagnostics branch from fa8d143 to ca3bdfd Compare June 26, 2024 13:50
@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 68.46154% with 41 lines in your changes missing coverage. Please review.

Project coverage is 54.01%. Comparing base (e22dd8a) to head (ca3bdfd).
Report is 168 commits behind head on main.

Files Patch % Lines
bundle/config/mutator/python/python_diagnostics.go 71.23% 15 Missing and 6 partials ⚠️
bundle/config/mutator/python/python_mutator.go 64.91% 14 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1531      +/-   ##
==========================================
+ Coverage   52.25%   54.01%   +1.76%     
==========================================
  Files         317      354      +37     
  Lines       18004    20764    +2760     
==========================================
+ Hits         9408    11216    +1808     
- Misses       7903     8730     +827     
- Partials      693      818     +125     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kanterov kanterov requested a review from pietern June 26, 2024 13:53
@kanterov kanterov requested a review from pietern June 27, 2024 10:59
@kanterov
Copy link
Contributor Author

@pietern I've addressed your comments. Please take a look again.

bundle/config/mutator/python/python_mutator.go Outdated Show resolved Hide resolved
"github.com/databricks/cli/libs/dyn"
)

type pythonDiagnostic struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we serialize the diagnostics on the Python side in JSON instead of using the string representation for path and location?

That way we'll not need all the deserialization logic here to convert strings for location and path back to a diag.Dianostic. We could directly use json.Unmarshal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @pietern for thoughts on the serialzation format for diagnostics. Separately customers have asked for machine readable warnings, and the format we decide on here should be consistent with that.

I'd recommend the full JSON body because it's the simplest to maintain as well as is easy to parse for consumers of the JSON representation.

Copy link
Contributor Author

@kanterov kanterov Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did research on how other tools report warnings:

  • go vet
  • eslint
  • pylint
  • terraform validate

They use JSON for source locations.

All k8s tools report the path as a string. I will change it to do the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that this format is purely internal, and we wouldn't show this to users, occasionally, we might need to read it for debugging.

If we ever want CLI to report warnings in a structured format, it can use any format not necessarily the same as here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 -- I wouldn't want to couple them as they might need to evolve independently.

@@ -116,7 +120,11 @@ func (m *pythonMutator) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagno
return merge.Override(leftRoot, rightRoot, visitor)
})

return diag.FromErr(err)
if len(mutateDiagnostics) != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead return a combined list of diagnostics here? Otherwise we could possibly ignore errors in the bundle.Mutate function if pyDABs returned a warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't merge errors because they can create duplicates. I will refactor them to make it easier to follow.

@@ -152,42 +161,71 @@ func (m *pythonMutator) runPythonMutator(ctx context.Context, cacheDir string, r
inputPath,
"--output",
outputPath,
"--diagnostics",
diagnosticsPath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a plan to keep the pyDABs version consistent with the CLI version? We don't need one immediately it'll likely be important down the line given there's an implicit dependency on the types defined in the CLI and pyDABs (diag.Diagnostics in this case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be consistent because changes need to be backward and forward-compatible. If we make a breaking change, we raise the minimum required version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the minimum required version stored/communicated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't. For now, we can add a minimal required version of CLI as the configuration in the template, and document it.

We can also check in VerifyCliVersion, but right now, it's executed before we run the Python mutator if we want to append a different minimum required version there. Another difficulty is that this configuration parameter might already be used.

Another way is to pass the current CLI version into Python code and check it there. We can make it a part of the effective bundle configuration.


// if diagnostics file exists, it gives the most descriptive errors
// if there is any error, we treat it as fatal error, and stop processing
if pythonDiagnostics.HasError() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend rolling up pythonDiagnosticsErr and processErr into the diagnostics that we return. We should protect against ignoring the errors if for some reason pythonDiagnostics do exist but they don't capture the actual error cause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pythonDiagnosticsErr should always be nil if pythonDiagnostics has an error

The idea behind pythonDiagnostics is that we should never show a "process exit" error if it contains an explanation. This is how PyDABs explain what went wrong. There are 3 errors possible, and you can see the code below as the way for us to choose the most informative error that can explain what went wrong, in the order:

  • PyDABs reported an error that we loaded/parsed succesfuly
  • Process exited with non-zero code
  • We failed to load/parse the diagnostics file, but the process exited successfully

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling flow is not super straightforward. Could you include a few debug log lines to capture error conditions even if we don't act on them? When this goes wrong I'd like to be able to analyse why/how/etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I added logging just after each error is created

"github.com/databricks/cli/libs/dyn"
)

type pythonDiagnostic struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 -- I wouldn't want to couple them as they might need to evolve independently.

@@ -152,42 +161,71 @@ func (m *pythonMutator) runPythonMutator(ctx context.Context, cacheDir string, r
inputPath,
"--output",
outputPath,
"--diagnostics",
diagnosticsPath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the minimum required version stored/communicated?


// if diagnostics file exists, it gives the most descriptive errors
// if there is any error, we treat it as fatal error, and stop processing
if pythonDiagnostics.HasError() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling flow is not super straightforward. Could you include a few debug log lines to capture error conditions even if we don't act on them? When this goes wrong I'd like to be able to analyse why/how/etc.

//
// It contains a list of warnings and errors that we should print to users.
func loadDiagnosticsFile(path string) (diag.Diagnostics, error) {
file, err := os.Open(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed to exist, even if there are no diagnostics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it will be less error-prone like that, so it doesn't happen that integration suddenly breaks and stops to print warnings and errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please capture this intent in the comment. The next person looking at this code could add a not-exists check not being aware of this intent.


assert.NoError(t, diag.Error())
assert.NoError(t, diagnostics.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convention: diags

@kanterov
Copy link
Contributor Author

kanterov commented Jul 1, 2024

@shreyas-goenka @pietern addressed your comments, please take a look.

The remaining question is the CLI version check. I don't find it pressing if we put effort into templates and documentation. I think in the long run, we should serialize the current CLI version as a part of the JSON we pass. What do you think?

Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor last comment

//
// It contains a list of warnings and errors that we should print to users.
func loadDiagnosticsFile(path string) (diag.Diagnostics, error) {
file, err := os.Open(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please capture this intent in the comment. The next person looking at this code could add a not-exists check not being aware of this intent.

@kanterov
Copy link
Contributor Author

kanterov commented Jul 2, 2024

@pietern done, I've added an elaborate comment and test. Can you please add it to the merge queue?

@pietern @shreyas-goenka thanks for taking the look

@pietern pietern added this pull request to the merge queue Jul 2, 2024
Merged via the queue into databricks:main with commit 5a0a6d7 Jul 2, 2024
5 checks passed
pietern added a commit that referenced this pull request Jul 3, 2024
Bundles:

As of this release you can interact with bundles when running the CLI on DBR (e.g. via the Web Terminal).

 * Fix non-default project names not working in dbt-sql template ([#1500](#1500)).
 * Improve `bundle validate` output ([#1532](#1532)).
 * Fixed resolving variable references inside slice variable ([#1550](#1550)).
 * Fixed bundle not loading when empty variable is defined ([#1552](#1552)).
 * Use `vfs.Path` for filesystem interaction ([#1554](#1554)).
 * Replace `vfs.Path` with extension-aware filer when running on DBR ([#1556](#1556)).

Internal:
 * merge.Override: Fix handling of dyn.NilValue ([#1530](#1530)).
 * Compare `.Kind()` instead of direct equality checks on a `dyn.Value` ([#1520](#1520)).
 * PythonMutator: register product in user agent extra ([#1533](#1533)).
 * Ignore `dyn.NilValue` when traversing value from `dyn.Map` ([#1547](#1547)).
 * Add extra tests for the sync block ([#1548](#1548)).
 * PythonMutator: add diagnostics ([#1531](#1531)).
 * PythonMutator: support omitempty in PyDABs ([#1513](#1513)).
 * PythonMutator: allow insert 'resources' and 'resources.jobs' ([#1555](#1555)).
@pietern pietern mentioned this pull request Jul 3, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 3, 2024
Bundles:

As of this release you can interact with bundles when running the CLI on
DBR (e.g. via the Web Terminal).

* Fix non-default project names not working in dbt-sql template
([#1500](#1500)).
* Improve `bundle validate` output
([#1532](#1532)).
* Fixed resolving variable references inside slice variable
([#1550](#1550)).
* Fixed bundle not loading when empty variable is defined
([#1552](#1552)).
* Use `vfs.Path` for filesystem interaction
([#1554](#1554)).
* Replace `vfs.Path` with extension-aware filer when running on DBR
([#1556](#1556)).

Internal:
* merge.Override: Fix handling of dyn.NilValue
([#1530](#1530)).
* Compare `.Kind()` instead of direct equality checks on a `dyn.Value`
([#1520](#1520)).
* PythonMutator: register product in user agent extra
([#1533](#1533)).
* Ignore `dyn.NilValue` when traversing value from `dyn.Map`
([#1547](#1547)).
* Add extra tests for the sync block
([#1548](#1548)).
* PythonMutator: add diagnostics
([#1531](#1531)).
* PythonMutator: support omitempty in PyDABs
([#1513](#1513)).
* PythonMutator: allow insert 'resources' and 'resources.jobs'
([#1555](#1555)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants