[JOBS-11439] Add taskValues support in remoteDbUtils #406

Merged
4 commits merged into databricks:main on Nov 1, 2023

Conversation

vsamoilov (Contributor)

Changes

Support dbutils.jobs.taskValues methods in RemoteDbUtils and add an additional checkTaskValue method to allow local testing.
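
For illustration, this is the kind of call the change unblocks from a local environment (a minimal sketch; only the taskValues calls come from this PR, the WorkspaceClient wiring is the SDK's usual entry point):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# With this change, the taskValues methods are available on the remote
# dbutils as well, so this no longer fails outside of a cluster.
w.dbutils.jobs.taskValues.set('my_key', 'my_value')
```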

Tests

Added three test cases in test_dbutils.py. Integration tests are not needed since the changes only affect remote mode.

  • make test run locally
  • make fmt applied
  • relevant integration tests applied

@vsamoilov vsamoilov requested a review from mgyucht October 13, 2023 13:31

@mgyucht mgyucht left a comment


I'm having trouble really understanding this change. Seems very simple, but maybe I'm confused about the API we are trying to support. Is it not possible to make this work for testing without exposing a new method? I would have thought that you could set a task value and then get it; I think you can do that in DBR but you can't outside, but we can't mock that behavior? Do people ever specify different values for default and debugValue?

Comment on lines 185 to 189
def checkTaskValue(self, key: str, value: any) -> bool:
    """
    Check that the task value with given key is equal to the given value
    """
    return self.values.get(key) == value

We should make sure _TaskValuesUtil implements the interface defined at https://github.com/databricks/databricks-sdk-py/blob/main/databricks/sdk/runtime/dbutils_stub.py#L197. This only defines get and set. Otherwise, people may use this method and be surprised that it doesn't exist in DBR.
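
For reference, the TaskValues surface in that stub is approximately the following (paraphrased from memory; the linked file is authoritative):

```python
# Approximate shape of dbutils_stub.py's TaskValues interface — only
# get and set are defined, which is why checkTaskValue would diverge.
class TaskValues:
    def get(self, taskKey: str, key: str, default: any = None, debugValue: any = None):
        """Returns the latest task value that belongs to the current job run."""
        ...

    def set(self, key: str, value: any) -> None:
        """Sets a task value on the current task run."""
        ...
```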

@vsamoilov vsamoilov (Contributor Author) Oct 23, 2023

I added a documentation update about this specifically in this PR: https://github.com/databricks/docs/pull/13668

"""
Returns the latest task value that belongs to the current job run
"""
return debugValue

Wow, this API is a bit confusing. What is the difference between default and debugValue?

@vsamoilov vsamoilov (Contributor Author) Oct 23, 2023

https://docs.databricks.com/en/dev-tools/databricks-utils.html#get-command-dbutilsjobstaskvaluesget — default is returned when we are inside a job run and the task value can't be found, while debugValue is what is always returned outside of a job run: for example, so you can run a notebook outside of a job and check that it works without errors.
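
To make the two parameters concrete, a hedged example based on that doc (the task and key names are invented):

```python
# Inside a job run: look up the value that task 'extract' set for
# 'row_count'; fall back to `default` if that task never set it.
# Outside a job run: the lookup cannot happen, so `debugValue` (42)
# is returned unconditionally.
count = dbutils.jobs.taskValues.get(taskKey='extract', key='row_count',
                                    default=0, debugValue=42)
```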

@vsamoilov (Contributor Author)

> I'm having trouble really understanding this change. Seems very simple, but maybe I'm confused about the API we are trying to support. Is it not possible to make this work for testing without exposing a new method? I would have thought that you could set a task value and then get it; I think you can do that in DBR but you can't outside, but we can't mock that behavior? Do people ever specify different values for default and debugValue?

I was implementing the change according to this doc: https://docs.google.com/document/d/1yrXXA27GJOB968apgnxnMsWfj0SJJn2cpArE4zzIORU/edit#heading=h.a2at0pdkk01r. As I understood it, the idea for the additional method is to check whether set(key, value) was called with the specified parameters, because the get interface is for the case when we are inside a job run (it requires a task key at least). cc @gaborratky-db @mgyucht

@vsamoilov vsamoilov requested a review from mgyucht October 23, 2023 13:52

gaborratky-db commented Oct 23, 2023

> I'm having trouble really understanding this change. Seems very simple, but maybe I'm confused about the API we are trying to support. Is it not possible to make this work for testing without exposing a new method? I would have thought that you could set a task value and then get it; I think you can do that in DBR but you can't outside, but we can't mock that behavior? Do people ever specify different values for default and debugValue?

Hey @mgyucht, welcome to the world of task values 😄 In DBR, when a notebook is run outside the context of a job run, set is a no-op and get returns the value of debugValue.

The reason we can't simply get what we set is because get has a well-defined behavior when run outside the context of a job run: return debugValue or throw an error. get also requires an explicit taskKey whereas set is implicit.
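
For concreteness, a minimal sketch of the remote-mode behavior described here (the class and field names follow the diff hunks in this review; the exact error type outside a job run is an assumption):

```python
class _TaskValuesUtil:
    def __init__(self):
        self.values = {}

    def set(self, key: str, value: any) -> None:
        # In remote mode this only records the value locally so unit
        # tests can inspect it (in DBR outside a job run, set is a no-op).
        self.values[key] = value

    def get(self, taskKey: str, key: str, default: any = None, debugValue: any = None):
        # Outside a job run there is no run context to query, so the
        # well-defined behavior is: return debugValue, or fail if the
        # caller didn't provide one (error type assumed).
        if debugValue is None:
            raise TypeError('Must pass debugValue when calling get() outside of a job run')
        return debugValue
```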

I originally suggested that we implement an assert(key, value) method on taskValues that does an assert internally, so that it's clear it is only meant for testing as part of unit tests. If you think the documentation and the extra, SDK-only method would cause more confusion than they're worth, then let's simply get rid of it.

@gaborratky-db gaborratky-db left a comment


Here are my suggested changes based on my original thinking. If assert is confusing/not useful, then let's remove it.

databricks/sdk/dbutils.py (outdated review thread, resolved)
Comment on lines 185 to 189
def checkTaskValue(self, key: str, value: any) -> bool:
    """
    Check that the task value with given key is equal to the given value
    """
    return self.values.get(key) == value


Suggested change
def checkTaskValue(self, key: str, value: any) -> bool:
    """
    Check that the task value with given key is equal to the given value
    """
    return self.values.get(key) == value
def assert(self, key: str, value: any) -> None:
    """
    Assert that the task value with the given key was set to the given value. This method is only available when run outside of a job run (i.e., locally or as part of a unit test).
    """
    assert self.values.get(key) == value

vsamoilov (Contributor Author)

I can't call it assert, since it's a reserved keyword in Python; does assertTaskValue work?


Let's discuss with @mgyucht first if we're even going to keep it around or not.

def test_jobs_task_values_set(dbutils):
    dbutils.jobs.taskValues.set('key', 'value')

    assert dbutils.jobs.taskValues.checkTaskValue('key', 'value')


Suggested change
    assert dbutils.jobs.taskValues.checkTaskValue('key', 'value')
    dbutils.jobs.taskValues.assert('key', 'value')



def test_jobs_task_values_check_task_value(dbutils):
    assert (dbutils.jobs.taskValues.checkTaskValue('key', 'value') == False)


Suggested change
    assert (dbutils.jobs.taskValues.checkTaskValue('key', 'value') == False)
    try:
        dbutils.jobs.taskValues.assert('key', 'value')
        # do something idiomatic here that makes the test fail
    except AssertionError:
        # Swallow the error, as it's what we expected to happen
        pass


mgyucht commented Oct 25, 2023

Thanks for clarifying, that makes a lot of sense. My main concern is really that there is a divergence in the API between DBUtils in the SDK and DBUtils in DBR. Where possible, I think preventing this divergence ensures that customers don't run into AttributeErrors at runtime in DBR. I do understand why you need to add an additional method now because of the specific semantics of getting task values outside of a job run. If we add this method here, I suggest adding a corresponding method in DBR as well. Do you think that's feasible?

@vsamoilov (Contributor Author)

> Thanks for clarifying, that makes a lot of sense. My main concern is really that there is a divergence in the API between DBUtils in the SDK and DBUtils in DBR. Where possible, I think preventing this divergence ensures that customers don't run into AttributeErrors at runtime in DBR. I do understand why you need to add an additional method now because of the specific semantics of getting task values outside of a job run. If we add this method here, I suggest adding a corresponding method in DBR as well. Do you think that's feasible?

I think it is not very feasible, because if we want a similar implementation, we would need to check all job tasks for a taskValue with the specified key. Also, as I understand it, the use case during local development is writing unit tests that run the task function and then check that the correct value was set for the specified key; I don't know of such use cases in DBR. I think if we document that it is only for unit-test purposes during local development and implement it via an assert call, that should be enough to prevent its use in DBR. What do you think? cc @mgyucht @gaborratky-db

@gaborratky-db

> Thanks for clarifying, that makes a lot of sense. My main concern is really that there is a divergence in the API between DBUtils in the SDK and DBUtils in DBR. Where possible, I think preventing this divergence ensures that customers don't run into AttributeErrors at runtime in DBR. I do understand why you need to add an additional method now because of the specific semantics of getting task values outside of a job run. If we add this method here, I suggest adding a corresponding method in DBR as well. Do you think that's feasible?

> I think it is not very feasible, because if we want a similar implementation, we would need to check all job tasks for a taskValue with the specified key. Also, as I understand it, the use case during local development is writing unit tests that run the task function and then check that the correct value was set for the specified key; I don't know of such use cases in DBR. I think if we document that it is only for unit-test purposes during local development and implement it via an assert call, that should be enough to prevent its use in DBR. What do you think? cc @mgyucht @gaborratky-db

This isn't necessarily true because we know in DBR which taskKey the dbutils.jobs.taskValues call runs for and we can call dbutils.jobs.taskValues.get("<my own task key>", ... ) to retrieve the task value.

If we want to keep parity between DBR and the SDK implementation, one option is to extend the semantics of dbutils.jobs.taskValues.get and make it possible to query the task's own task values when called as part of a job run. For example:

dbutils.jobs.taskValues.get(None, "myKey") would query whether something was set with dbutils.jobs.taskValues.set("myKey", "myValue"). I don't like using None as a special taskKey value here but I don't have a better idea off the top of my head.
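
A sketch of what that extended lookup could look like (purely illustrative; None-as-own-task-key is the proposal above, not shipped behavior):

```python
def get(self, taskKey, key, default=None, debugValue=None):
    # Proposed extension: taskKey=None means "this task's own values",
    # which would let a caller read back what set() stored.
    if taskKey is None:
        return self.values.get(key, default)
    ...  # existing cross-task lookup (or debugValue outside a job run)
```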

@gaborratky-db

I quickly looked at the implementation and we don't actually know the current taskKey, only the current run id, which we can't use to query the values.

I suggest that we simply remove assertTaskValue from the SDK version, keep parity, ship the fix, and come back to this problem when we have additional capacity to improve dbutils.jobs.taskValues.
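
With the extra method removed, a local unit test can still exercise the surviving get/set surface; a minimal sketch (the dbutils fixture and names are assumed):

```python
def test_task_values_parity(dbutils):
    # set() is safe to call in remote mode; in DBR outside a job run it is a no-op.
    dbutils.jobs.taskValues.set('key', 'value')

    # Outside a job run, get() always returns debugValue, so that is
    # what a local test can assert against.
    assert dbutils.jobs.taskValues.get(taskKey='some_task', key='key',
                                       default=None, debugValue='debug') == 'debug'
```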


codecov-commenter commented Oct 27, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files | Coverage Δ |
| --- | --- |
| databricks/sdk/dbutils.py | 79.41% <100.00%> (+1.06%) ⬆️ |


@gaborratky-db gaborratky-db left a comment


Two requests, LGTM otherwise. I will defer the final review to @mgyucht to see if he has any further questions.

databricks/sdk/dbutils.py (outdated review thread, resolved)
tests/test_dbutils.py (review thread resolved)
@gaborratky-db

@mgyucht we addressed the changes and the implementation is simple and straightforward: the code will run in "remote" mode (aka locally) and will be more portable as a result. This should be enough for parity, PTAL.


@mgyucht mgyucht left a comment


I think this is good for now. If you are interested in updating the in-DBR dbutils.jobs.task_values to have other helpers in the future, we can make the corresponding changes here too.

@mgyucht mgyucht enabled auto-merge November 1, 2023 08:09
@mgyucht mgyucht disabled auto-merge November 1, 2023 08:09
@mgyucht mgyucht added this pull request to the merge queue Nov 1, 2023
Merged via the queue into databricks:main with commit 9955c08 Nov 1, 2023
7 checks passed
mgyucht added a commit that referenced this pull request Nov 14, 2023
* Introduce more specific exceptions, like `NotFound`, `AlreadyExists`, `BadRequest`, `PermissionDenied`, `InternalError`, and others ([#376](#376)). This makes it easier to handle errors thrown by the Databricks API. Instead of catching `DatabricksError` and checking the error_code field, you can catch one of these subtypes of `DatabricksError`, which is more ergonomic and removes the need to rethrow exceptions that you don't want to catch. For example:
```python
try:
  return (self._ws
    .permissions
    .get(object_type, object_id))
except DatabricksError as e:
  if e.error_code in [
    "RESOURCE_DOES_NOT_EXIST",
    "RESOURCE_NOT_FOUND",
    "PERMISSION_DENIED",
    "FEATURE_DISABLED",
    "BAD_REQUEST"]:
    logger.warning(...)
    return None
  raise RetryableError(...) from e
```
can be replaced with
```python
try:
  return (self._ws
    .permissions
    .get(object_type, object_id))
except (PermissionDenied, FeatureDisabled):
  logger.warning(...)
  return None
except NotFound:
  raise RetryableError(...)
```
* Paginate all SCIM list requests in the SDK ([#440](#440)). This change ensures that SCIM list() APIs use a default limit of 100 resources, leveraging SCIM's offset + limit pagination to batch requests to the Databricks API (see the sketch after this list).
* Added taskValues support in remoteDbUtils ([#406](#406)).
* Added more detailed error message on default credentials not found error ([#419](#419)).
* Request management token via Azure CLI only for Service Principals and not human users ([#408](#408)).
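
As a hedged illustration of the SCIM pagination item above (group listing chosen arbitrarily):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# SCIM list() calls now page transparently: the SDK fetches batches of
# 100 (the new default limit) and advances the offset until exhausted.
for group in w.groups.list():
    print(group.display_name)
```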

API Changes:

 * Fixed `create()` method for [w.functions](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/functions.html) workspace-level service and corresponding `databricks.sdk.service.catalog.CreateFunction` and `databricks.sdk.service.catalog.FunctionInfo` dataclasses.
 * Changed `create()` method for [w.metastores](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/metastores.html) workspace-level service with new required argument order.
 * Changed `storage_root` field for `databricks.sdk.service.catalog.CreateMetastore` to be optional.
 * Added `skip_validation` field for `databricks.sdk.service.catalog.UpdateExternalLocation`.
 * Added `libraries` field for `databricks.sdk.service.compute.CreatePolicy`, `databricks.sdk.service.compute.EditPolicy` and `databricks.sdk.service.compute.Policy`.
 * Added `init_scripts` field for `databricks.sdk.service.compute.EventDetails`.
 * Added `file` field for `databricks.sdk.service.compute.InitScriptInfo`.
 * Added `zone_id` field for `databricks.sdk.service.compute.InstancePoolGcpAttributes`.
 * Added several dataclasses related to init scripts.
 * Added `databricks.sdk.service.compute.LocalFileInfo` dataclass.
 * Replaced `ui_state` field with `edit_mode` for `databricks.sdk.service.jobs.CreateJob` and `databricks.sdk.service.jobs.JobSettings`.
 * Replaced `databricks.sdk.service.jobs.CreateJobUiState` dataclass with `databricks.sdk.service.jobs.CreateJobEditMode`.
 * Added `include_resolved_values` field for `databricks.sdk.service.jobs.GetRunRequest`.
 * Replaced `databricks.sdk.service.jobs.JobSettingsUiState` dataclass with `databricks.sdk.service.jobs.JobSettingsEditMode`.
 * Removed [a.o_auth_enrollment](https://databricks-sdk-py.readthedocs.io/en/latest/account/o_auth_enrollment.html) account-level service. This was only used to aid in OAuth enablement during the public preview of OAuth. OAuth is now enabled for all AWS E2 accounts, so usage of this API is no longer needed.
 * Added `network_connectivity_config_id` field for `databricks.sdk.service.provisioning.UpdateWorkspaceRequest`.
 * Added [a.network_connectivity](https://databricks-sdk-py.readthedocs.io/en/latest/account/network_connectivity.html) account-level service.
 * Added `string_shared_as` field for `databricks.sdk.service.sharing.SharedDataObject`.

Internal changes:

* Added regression question to issue template ([#414](#414)).
* Made test_auth no longer fail if you have a default profile setup ([#426](#426)).

OpenAPI SHA: d136ad0541f036372601bad9a4382db06c3c912d, Date: 2023-11-14
@mgyucht mgyucht mentioned this pull request Nov 14, 2023
github-merge-queue bot pushed a commit that referenced this pull request Nov 14, 2023 (same release notes as the commit above).