
[Feature] Files API client: recover on download failures (#844) #845


ksafonov-db
Contributor

What changes are proposed in this pull request?

  1. Extends the Files API client to resume downloads after failures. The new implementation tracks the current offset in the input stream and, when an error occurs, issues a new download request starting from that offset.
  2. The new code path is enabled by the `DATABRICKS_ENABLE_EXPERIMENTAL_FILES_API_CLIENT` config parameter.
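The resume-on-failure idea in item 1 can be sketched as follows. This is a minimal, hypothetical illustration, not the SDK's actual implementation. `ResumingReader`, the `open_at(offset)` callback, and the `FlakyStream` test double are invented names. The sketch assumes that the server honors a request for the remaining bytes starting at a given offset (for example, via an HTTP `Range: bytes=<offset>-` header), that every `OSError` is transient, and that each `read` call should retry at most `max_retries` times.

```python
import contextlib
from typing import BinaryIO, Callable, List


class ResumingReader:
    """Hypothetical sketch: a download stream that resumes from the last good offset.

    `open_at(offset)` is an assumed callable that issues a fresh download request
    starting at `offset` and returns a binary stream positioned at that byte.
    """

    def __init__(self, open_at: Callable[[int], BinaryIO], max_retries: int = 3):
        self._open_at = open_at
        self._max_retries = max_retries
        self._offset = 0  # number of bytes already handed to the caller
        self._stream = open_at(0)

    def read(self, size: int = 64 * 1024) -> bytes:
        """Return up to `size` bytes; b"" signals end of file."""
        failures = 0
        while True:
            try:
                chunk = self._stream.read(size)
            except OSError:  # assumption: treat I/O errors as transient
                failures += 1
                if failures > self._max_retries:
                    raise
                with contextlib.suppress(OSError):
                    self._stream.close()
                # Re-request only the bytes the caller has not received yet.
                self._stream = self._open_at(self._offset)
                continue
            self._offset += len(chunk)
            return chunk

    def close(self) -> None:
        self._stream.close()


class FlakyStream:
    """Test double: serves data[start:] but fails after `fail_after` bytes."""

    def __init__(self, data: bytes, start: int, fail_after: int):
        self._data = data
        self._pos = start
        self._budget = fail_after

    def read(self, size: int) -> bytes:
        if self._budget <= 0 and self._pos < len(self._data):
            raise ConnectionResetError("simulated dropped connection")
        end = min(len(self._data), self._pos + size, self._pos + self._budget)
        chunk = self._data[self._pos:end]
        self._pos = end
        self._budget -= len(chunk)
        return chunk

    def close(self) -> None:
        pass


# Usage: every simulated connection drops after 100 bytes.
data = bytes(range(256)) * 4  # 1024 bytes of sample payload
opened_at: List[int] = []  # offsets at which a new request was issued


def open_at(offset: int) -> FlakyStream:
    opened_at.append(offset)
    return FlakyStream(data, offset, fail_after=100)


reader = ResumingReader(open_at)
parts = []
while True:
    chunk = reader.read(64)
    if not chunk:
        break
    parts.append(chunk)
reader.close()
out = b"".join(parts)
```

Because the offset advances only after a chunk has been returned to the caller, any bytes lost in a failed read are simply requested again, so the reassembled output has no gaps or duplicates.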

How is this tested?

Added unit tests for the new code path:
% python3 -m pytest tests/test_files.py

@renaudhartert-db renaudhartert-db changed the title Files API client: recover on download failures (#844) [Feature] Files API client: recover on download failures (#844) Jan 2, 2025
@renaudhartert-db renaudhartert-db self-requested a review January 2, 2025 12:58
@eng-dev-ecosystem-bot
Collaborator

Test Details: go/deco-tests/12582804714

Contributor

@renaudhartert-db renaudhartert-db left a comment

LGTM

@ksafonov-db ksafonov-db force-pushed the files-api-recover-on-download-failures branch from 644ec16 to 0ec5120 Compare January 8, 2025 13:31
@ksafonov-db ksafonov-db force-pushed the files-api-recover-on-download-failures branch from 0ec5120 to 7b8116f Compare January 8, 2025 14:07
@ksafonov-db ksafonov-db force-pushed the files-api-recover-on-download-failures branch from 7b8116f to 66c8ca6 Compare January 8, 2025 14:23
@ksafonov-db ksafonov-db force-pushed the files-api-recover-on-download-failures branch from 66c8ca6 to c7e8885 Compare January 8, 2025 14:24
@ksafonov-db ksafonov-db force-pushed the files-api-recover-on-download-failures branch from c7e8885 to 66c8ca6 Compare January 8, 2025 14:26

github-actions bot commented Jan 8, 2025

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 845
  • Commit SHA: 19d2f78da80c15baac4209c459f1bf4d78362dde

Checks will be approved automatically on success.

@renaudhartert-db renaudhartert-db added this pull request to the merge queue Jan 8, 2025
Merged via the queue into databricks:main with commit d907c0c Jan 8, 2025
19 checks passed
renaudhartert-db added a commit that referenced this pull request Jan 20, 2025
### New Features and Improvements

 * Add `serving.http_request` to call external functions. ([#857](#857)).
 * Files API client: recover on download failures ([#844](#844)) ([#845](#845)).

### Bug Fixes

 * Properly pass query parameters in apps and oauth2 ([#862](#862)).

### Internal Changes

 * Add unit tests for external-browser authentication ([#863](#863)).
 * Decouple oauth2 and serving ([#855](#855)).
 * Migrate workflows that need write access to use hosted runners ([#850](#850)).
 * Stop testing Python 3.7 on Ubuntu ([#858](#858)).

### API Changes:

 * Added [w.access_control](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/access_control.html) workspace-level service.
 * Added `http_request()` method for [w.serving_endpoints](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving_endpoints.html) workspace-level service.
 * Added `no_compute` field for `databricks.sdk.service.apps.CreateAppRequest`.
 * Added `has_more` field for `databricks.sdk.service.jobs.BaseJob`.
 * Added `has_more` field for `databricks.sdk.service.jobs.BaseRun`.
 * Added `page_token` field for `databricks.sdk.service.jobs.GetJobRequest`.
 * Added `has_more` and `next_page_token` fields for `databricks.sdk.service.jobs.Job`.
 * Added `has_more` field for `databricks.sdk.service.jobs.Run`.
 * Added `clean_rooms_notebook_output` field for `databricks.sdk.service.jobs.RunOutput`.
 * Added `scopes` field for `databricks.sdk.service.oauth2.UpdateCustomAppIntegration`.
 * Added `run_as` field for `databricks.sdk.service.pipelines.CreatePipeline`.
 * Added `run_as` field for `databricks.sdk.service.pipelines.EditPipeline`.
 * Added `authorization_details` and `endpoint_url` fields for `databricks.sdk.service.serving.DataPlaneInfo`.
 * Added `contents` field for `databricks.sdk.service.serving.GetOpenApiResponse`.
 * Added `activated`, `activation_url`, `authentication_type`, `cloud`, `comment`, `created_at`, `created_by`, `data_recipient_global_metastore_id`, `ip_access_list`, `metastore_id`, `name`, `owner`, `properties_kvpairs`, `region`, `sharing_code`, `tokens`, `updated_at` and `updated_by` fields for `databricks.sdk.service.sharing.RecipientInfo`.
 * Added `expiration_time` field for `databricks.sdk.service.sharing.RecipientInfo`.
 * Changed `update()` method for [a.account_federation_policy](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_federation_policy.html) account-level service with new required argument order.
 * Changed `update()` method for [a.service_principal_federation_policy](https://databricks-sdk-py.readthedocs.io/en/latest/account/service_principal_federation_policy.html) account-level service with new required argument order.
 * Changed `update()` method for [w.recipients](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/recipients.html) workspace-level service to return `databricks.sdk.service.sharing.RecipientInfo` dataclass.
 * Changed `update()` method for [w.recipients](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/recipients.html) workspace-level service return type to become non-empty.
 * Changed `update()` method for [w.recipients](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/recipients.html) workspace-level service to type `update()` method for [w.recipients](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/recipients.html) workspace-level service.
 * Changed `get_open_api()` method for [w.serving_endpoints](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving_endpoints.html) workspace-level service return type to become non-empty.
 * Changed `patch()` method for [w.serving_endpoints](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving_endpoints.html) workspace-level service to type `patch()` method for [w.serving_endpoints](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving_endpoints.html) workspace-level service.
 * Changed `patch()` method for [w.serving_endpoints](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving_endpoints.html) workspace-level service to return `databricks.sdk.service.serving.EndpointTags` dataclass.
 * Changed `databricks.sdk.service.serving.EndpointTagList` dataclass to.
 * Changed `collaborator_alias` field for `databricks.sdk.service.cleanrooms.CleanRoomCollaborator` to be required.
 * Changed `update_mask` field for `databricks.sdk.service.oauth2.UpdateAccountFederationPolicyRequest` to no longer be required.
 * Changed `update_mask` field for `databricks.sdk.service.oauth2.UpdateServicePrincipalFederationPolicyRequest` to no longer be required.
 * Changed `days_of_week` field for `databricks.sdk.service.pipelines.RestartWindow` to type `databricks.sdk.service.pipelines.DayOfWeekList` dataclass.
 * Changed `behavior` field for `databricks.sdk.service.serving.AiGatewayGuardrailPiiBehavior` to no longer be required.
 * Changed `project_id` and `region` fields for `databricks.sdk.service.serving.GoogleCloudVertexAiConfig` to be required.
 * Changed `workload_type` field for `databricks.sdk.service.serving.ServedEntityInput` to type `databricks.sdk.service.serving.ServingModelWorkloadType` dataclass.
 * Changed `workload_type` field for `databricks.sdk.service.serving.ServedEntityOutput` to type `databricks.sdk.service.serving.ServingModelWorkloadType` dataclass.
 * Changed `workload_type` field for `databricks.sdk.service.serving.ServedModelOutput` to type `databricks.sdk.service.serving.ServingModelWorkloadType` dataclass.

OpenAPI SHA: 58905570a9928fc9ed31fba14a2edaf9a7c55b08, Date: 2025-01-20
github-merge-queue bot pushed a commit that referenced this pull request Jan 20, 2025
---------

Signed-off-by: Renaud Hartert <[email protected]>