-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PythonMutator: add diagnostics #1531
Conversation
fa8d143
to
ca3bdfd
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1531 +/- ##
==========================================
+ Coverage 52.25% 54.01% +1.76%
==========================================
Files 317 354 +37
Lines 18004 20764 +2760
==========================================
+ Hits 9408 11216 +1808
- Misses 7903 8730 +827
- Partials 693 818 +125 ☔ View full report in Codecov by Sentry. |
@pietern I've addressed your comments. Please take a look again. |
"github.com/databricks/cli/libs/dyn" | ||
) | ||
|
||
type pythonDiagnostic struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we serialize the diagnostics on the Python side in JSON instead of using the string representation for path
and location
?
That way we'll not need all the deserialization logic here to convert strings for location and path back to a diag.Dianostic
. We could directly use json.Unmarshal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @pietern for thoughts on the serialzation format for diagnostics. Separately customers have asked for machine readable warnings, and the format we decide on here should be consistent with that.
I'd recommend the full JSON body because it's the simplest to maintain as well as is easy to parse for consumers of the JSON representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did research on how other tools report warnings:
- go vet
- eslint
- pylint
- terraform validate
They use JSON for source locations.
All k8s tools report the path as a string. I will change it to do the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that this format is purely internal, and we wouldn't show this to users, occasionally, we might need to read it for debugging.
If we ever want CLI to report warnings in a structured format, it can use any format not necessarily the same as here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 -- I wouldn't want to couple them as they might need to evolve independently.
@@ -116,7 +120,11 @@ func (m *pythonMutator) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagno | |||
return merge.Override(leftRoot, rightRoot, visitor) | |||
}) | |||
|
|||
return diag.FromErr(err) | |||
if len(mutateDiagnostics) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead return a combined list of diagnostics here? Otherwise we could possibly ignore errors in the bundle.Mutate
function if pyDABs returned a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't merge errors because they can create duplicates. I will refactor them to make it easier to follow.
@@ -152,42 +161,71 @@ func (m *pythonMutator) runPythonMutator(ctx context.Context, cacheDir string, r | |||
inputPath, | |||
"--output", | |||
outputPath, | |||
"--diagnostics", | |||
diagnosticsPath, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a plan to keep the pyDABs version consistent with the CLI version? We don't need one immediately it'll likely be important down the line given there's an implicit dependency on the types defined in the CLI and pyDABs (diag.Diagnostics
in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't have to be consistent because changes need to be backward and forward-compatible. If we make a breaking change, we raise the minimum required version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the minimum required version stored/communicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't. For now, we can add a minimal required version of CLI as the configuration in the template, and document it.
We can also check in VerifyCliVersion, but right now, it's executed before we run the Python mutator if we want to append a different minimum required version there. Another difficulty is that this configuration parameter might already be used.
Another way is to pass the current CLI version into Python code and check it there. We can make it a part of the effective bundle configuration.
|
||
// if diagnostics file exists, it gives the most descriptive errors | ||
// if there is any error, we treat it as fatal error, and stop processing | ||
if pythonDiagnostics.HasError() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend rolling up pythonDiagnosticsErr
and processErr
into the diagnostics that we return. We should protect against ignoring the errors if for some reason pythonDiagnostics
do exist but they don't capture the actual error cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pythonDiagnosticsErr
should always be nil if pythonDiagnostics
has an error
The idea behind pythonDiagnostics
is that we should never show a "process exit" error if it contains an explanation. This is how PyDABs explain what went wrong. There are 3 errors possible, and you can see the code below as the way for us to choose the most informative error that can explain what went wrong, in the order:
- PyDABs reported an error that we loaded/parsed succesfuly
- Process exited with non-zero code
- We failed to load/parse the diagnostics file, but the process exited successfully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling flow is not super straightforward. Could you include a few debug log lines to capture error conditions even if we don't act on them? When this goes wrong I'd like to be able to analyse why/how/etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I added logging just after each error is created
"github.com/databricks/cli/libs/dyn" | ||
) | ||
|
||
type pythonDiagnostic struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 -- I wouldn't want to couple them as they might need to evolve independently.
@@ -152,42 +161,71 @@ func (m *pythonMutator) runPythonMutator(ctx context.Context, cacheDir string, r | |||
inputPath, | |||
"--output", | |||
outputPath, | |||
"--diagnostics", | |||
diagnosticsPath, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the minimum required version stored/communicated?
|
||
// if diagnostics file exists, it gives the most descriptive errors | ||
// if there is any error, we treat it as fatal error, and stop processing | ||
if pythonDiagnostics.HasError() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling flow is not super straightforward. Could you include a few debug log lines to capture error conditions even if we don't act on them? When this goes wrong I'd like to be able to analyse why/how/etc.
// | ||
// It contains a list of warnings and errors that we should print to users. | ||
func loadDiagnosticsFile(path string) (diag.Diagnostics, error) { | ||
file, err := os.Open(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it guaranteed to exist, even if there are no diagnostics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it will be less error-prone like that, so it doesn't happen that integration suddenly breaks and stops to print warnings and errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please capture this intent in the comment. The next person looking at this code could add a not-exists check not being aware of this intent.
|
||
assert.NoError(t, diag.Error()) | ||
assert.NoError(t, diagnostics.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convention: diags
@shreyas-goenka @pietern addressed your comments, please take a look. The remaining question is the CLI version check. I don't find it pressing if we put effort into templates and documentation. I think in the long run, we should serialize the current CLI version as a part of the JSON we pass. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor last comment
// | ||
// It contains a list of warnings and errors that we should print to users. | ||
func loadDiagnosticsFile(path string) (diag.Diagnostics, error) { | ||
file, err := os.Open(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please capture this intent in the comment. The next person looking at this code could add a not-exists check not being aware of this intent.
@pietern done, I've added an elaborate comment and test. Can you please add it to the merge queue? @pietern @shreyas-goenka thanks for taking the look |
Bundles: As of this release you can interact with bundles when running the CLI on DBR (e.g. via the Web Terminal). * Fix non-default project names not working in dbt-sql template ([#1500](#1500)). * Improve `bundle validate` output ([#1532](#1532)). * Fixed resolving variable references inside slice variable ([#1550](#1550)). * Fixed bundle not loading when empty variable is defined ([#1552](#1552)). * Use `vfs.Path` for filesystem interaction ([#1554](#1554)). * Replace `vfs.Path` with extension-aware filer when running on DBR ([#1556](#1556)). Internal: * merge.Override: Fix handling of dyn.NilValue ([#1530](#1530)). * Compare `.Kind()` instead of direct equality checks on a `dyn.Value` ([#1520](#1520)). * PythonMutator: register product in user agent extra ([#1533](#1533)). * Ignore `dyn.NilValue` when traversing value from `dyn.Map` ([#1547](#1547)). * Add extra tests for the sync block ([#1548](#1548)). * PythonMutator: add diagnostics ([#1531](#1531)). * PythonMutator: support omitempty in PyDABs ([#1513](#1513)). * PythonMutator: allow insert 'resources' and 'resources.jobs' ([#1555](#1555)).
Bundles: As of this release you can interact with bundles when running the CLI on DBR (e.g. via the Web Terminal). * Fix non-default project names not working in dbt-sql template ([#1500](#1500)). * Improve `bundle validate` output ([#1532](#1532)). * Fixed resolving variable references inside slice variable ([#1550](#1550)). * Fixed bundle not loading when empty variable is defined ([#1552](#1552)). * Use `vfs.Path` for filesystem interaction ([#1554](#1554)). * Replace `vfs.Path` with extension-aware filer when running on DBR ([#1556](#1556)). Internal: * merge.Override: Fix handling of dyn.NilValue ([#1530](#1530)). * Compare `.Kind()` instead of direct equality checks on a `dyn.Value` ([#1520](#1520)). * PythonMutator: register product in user agent extra ([#1533](#1533)). * Ignore `dyn.NilValue` when traversing value from `dyn.Map` ([#1547](#1547)). * Add extra tests for the sync block ([#1548](#1548)). * PythonMutator: add diagnostics ([#1531](#1531)). * PythonMutator: support omitempty in PyDABs ([#1513](#1513)). * PythonMutator: allow insert 'resources' and 'resources.jobs' ([#1555](#1555)).
Changes
Allow PyDABs to report
dyn.Diagnostics
by writing todiagnostics.json
supplied as an argument, similar toinput.json
andoutput.json
Such errors are not yet properly printed in
databricks bundle validate
, which will be fixed in a follow-up PR.Tests
Unit tests