Add support for non-Python ipynb notebooks to DABs #1827

shreyas-goenka · 2024-10-11T14:40:35Z

Changes

Background

The workspace import APIs recently added support for importing Jupyter notebooks written in R, Scala, or SQL, i.e., non-Python notebooks. This now works for the /import-file API, which we leverage in the CLI.

Note: We do not need any changes in databricks sync. It works out of the box because the state mapping of local names to remote names that we store is scoped only to the notebook extension (i.e., .ipynb in this case) and is agnostic of the notebook's language.

Problem this PR addresses

The extension-aware filer previously did not work for these notebooks because it checked that a .ipynb notebook is written in Python. This PR relaxes that constraint and adds integration tests for both the normal workspace filer and the extension-aware filer writing and reading non-Python .ipynb notebooks.

This implies that after this PR, DABs in the workspace / CLI from DBR will work for non-Python notebooks as well. DABs deployments of non-Python notebooks from local machines already work since the platform-side changes to the API landed; this PR just adds integration tests for that functionality.

Note: All platform-side changes we needed for the import API have already been rolled out to production.
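The relaxed constraint described in this section can be illustrated with a minimal sketch. It is not the code in libs/notebook/ext.go: the names `detectLanguage` and `supportedLanguages` are hypothetical, and the sketch only assumes that the notebook language is read from the standard `metadata.language_info.name` field of the ipynb JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// notebookHeader captures only the notebook-level language metadata.
type notebookHeader struct {
	Metadata struct {
		LanguageInfo struct {
			Name string `json:"name"`
		} `json:"language_info"`
	} `json:"metadata"`
}

// supportedLanguages is a hypothetical allow-list for this sketch.
// Before the change, the equivalent check would have accepted only "python".
var supportedLanguages = map[string]bool{
	"python": true,
	"r":      true,
	"scala":  true,
	"sql":    true,
}

// detectLanguage returns the lower-cased notebook language, or an error
// if the document is not valid JSON or the language is not in the allow-list.
func detectLanguage(raw []byte) (string, error) {
	var nb notebookHeader
	if err := json.Unmarshal(raw, &nb); err != nil {
		return "", fmt.Errorf("not a valid ipynb document: %w", err)
	}
	lang := strings.ToLower(nb.Metadata.LanguageInfo.Name)
	if !supportedLanguages[lang] {
		return "", fmt.Errorf("unsupported notebook language %q", lang)
	}
	return lang, nil
}

func main() {
	raw := []byte(`{"cells": [], "metadata": {"language_info": {"name": "R"}}, "nbformat": 4, "nbformat_minor": 5}`)
	lang, err := detectLanguage(raw)
	fmt.Println(lang, err) // prints "r <nil>"
}
```

Lower-casing the name is a choice made for this sketch so that kernels reporting "R" and "r" are treated alike.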

Before

DABs deployments from local machines worked fine for non-Python notebooks, but DABs deployments from DBR did not.

After

DABs deployments from both local machines and DBR work fine.

Testing

I created the .ipynb notebook fixtures used in the integration tests directly from the VSCode UI. This ensures high fidelity with how users will create their non-Python notebooks locally. For Python notebooks this is supported out of the box by VSCode, but for R and Scala notebooks it required installing the Jupyter kernels for R and Scala on my local machine and using them from VSCode.

For SQL, I ended up directly modifying the language_info field in the Jupyter metadata to create the test fixture.
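The SQL fixture was edited by hand, but the same change can be expressed programmatically. The following is only an illustrative sketch: `setNotebookLanguage` is a hypothetical helper, and it assumes nothing beyond the standard `metadata.language_info.name` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setNotebookLanguage rewrites metadata.language_info.name in an ipynb
// document and keeps every other field. The result is compact JSON with
// object keys in sorted order, so the byte layout differs from the input.
func setNotebookLanguage(raw []byte, lang string) ([]byte, error) {
	var nb map[string]any
	if err := json.Unmarshal(raw, &nb); err != nil {
		return nil, err
	}
	if nb == nil {
		nb = map[string]any{}
	}
	// Create the nested objects if the notebook does not have them yet.
	meta, ok := nb["metadata"].(map[string]any)
	if !ok {
		meta = map[string]any{}
		nb["metadata"] = meta
	}
	info, ok := meta["language_info"].(map[string]any)
	if !ok {
		info = map[string]any{}
		meta["language_info"] = info
	}
	info["name"] = lang
	return json.Marshal(nb)
}

func main() {
	raw := []byte(`{"cells": [], "metadata": {"language_info": {"name": "python"}}, "nbformat": 4}`)
	out, err := setNotebookLanguage(raw, "sql")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```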

Discussion: Issues with configuring language at the cell level

The language metadata for a Jupyter notebook is standardized at the notebook level (in the language_info field). Unfortunately, it's not standardized at the cell level. Thus, for example, if a user changes the language for their cell in VSCode (which is supported by the standard Jupyter VSCode integration), it'll cause a runtime error when the user actually attempts to run the notebook. This is because the cell-level metadata is encoded in a format specific to VSCode:

cells: []{
    "vscode": {
        "languageId": "sql"
    }
}

Supporting cell-level languages is thus out of scope for this PR and can be revisited with the workspace files team if there's strong customer interest.
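Although cell-level languages are out of scope, notebooks that would hit this runtime error can be spotted ahead of time. The sketch below is a hypothetical helper, not part of this PR. It assumes that VSCode stores the `vscode.languageId` object shown above under each cell's `metadata` key, and that the notebook-level language lives in `metadata.language_info.name`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// notebook models only the fields this sketch needs.
type notebook struct {
	Metadata struct {
		LanguageInfo struct {
			Name string `json:"name"`
		} `json:"language_info"`
	} `json:"metadata"`
	Cells []struct {
		// Assumption: the VSCode-specific block sits in the cell metadata.
		Metadata struct {
			VSCode struct {
				LanguageID string `json:"languageId"`
			} `json:"vscode"`
		} `json:"metadata"`
	} `json:"cells"`
}

// cellLanguageOverrides returns the indexes of cells whose VSCode
// languageId is set and differs from the notebook-level language.
func cellLanguageOverrides(raw []byte) ([]int, error) {
	var nb notebook
	if err := json.Unmarshal(raw, &nb); err != nil {
		return nil, err
	}
	var overrides []int
	for i, cell := range nb.Cells {
		id := cell.Metadata.VSCode.LanguageID
		if id != "" && !strings.EqualFold(id, nb.Metadata.LanguageInfo.Name) {
			overrides = append(overrides, i)
		}
	}
	return overrides, nil
}

func main() {
	raw := []byte(`{
	 "metadata": {"language_info": {"name": "python"}},
	 "cells": [
	  {"cell_type": "code", "metadata": {}, "source": ["print(1)"]},
	  {"cell_type": "code", "metadata": {"vscode": {"languageId": "sql"}}, "source": ["SELECT 1"]}
	 ]
	}`)
	overrides, err := cellLanguageOverrides(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(overrides) // prints "[1]"
}
```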

shreyas-goenka · 2024-10-21T20:23:45Z

triggered nightlies on this PR...

shreyas-goenka · 2024-10-22T08:31:05Z

nightlies are green

pietern

Thanks!

Please update the PR description to say this only affects the read path of the extension-aware filer, i.e. when the CLI runs on Databricks compute and reads from WSFS.

The comments about the cell-aware language settings don't apply to the PR and are more general commentary. You can keep it but delineate it with a header or so to indicate it's an appendix/commentary and doesn't apply to the change itself.

libs/notebook/ext.go

internal/filer_test.go

pietern · 2024-11-01T10:51:25Z

Comments on the description (nuances):

> The workspace import APIs recently added support for importing notebooks written in R, Scala, or SQL, which are non-Python notebooks. This now work for the /import-file API which we leverage in the CLI.

The newly added support was for ipynb notebooks specifically.

Notebooks in R, Scala, or SQL were always importable if they were encoded in the source format.

libs/notebook/ext.go

libs/filer/workspace_files_extensions_client.go

pietern

Thanks, two minor things.

libs/notebook/ext.go

internal/filer_test.go

pietern

Integration tests are red.

github-actions · 2024-11-13T14:49:42Z

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

PR number: 1827
Commit SHA: be53fa5ff48adf94996bc5333ca2605e9d45760c

Checks will be approved automatically on success.

eng-dev-ecosystem-bot · 2024-11-13T14:50:12Z

Test Details: go/deco-tests/11819726261

Bundles:
* Do not execute build on bundle destroy ([#1882](#1882)).
* Add support for non-Python ipynb notebooks to DABs ([#1827](#1827)).

API Changes:
* Added `databricks credentials` command group.
* Changed `databricks genie execute-message-query` command to type `databricks genie execute-message-query` command.
* Changed `databricks lakeview create` command with new required argument order.
* Added `databricks aibi-dashboard-embedding-access-policy` command group.
* Added `databricks aibi-dashboard-embedding-approved-domains` command group.
* Removed `databricks clean-rooms` command group.

OpenAPI commit d25296d2f4aa7bd6195c816fdf82e0f960f775da (2024-11-07)

Dependency updates:
* Upgrade TF provider to 1.58.0 ([#1900](#1900)).
* Bump golang.org/x/sync from 0.8.0 to 0.9.0 ([#1892](#1892)).
* Bump golang.org/x/text from 0.19.0 to 0.20.0 ([#1893](#1893)).
* Bump golang.org/x/mod from 0.21.0 to 0.22.0 ([#1895](#1895)).
* Bump golang.org/x/oauth2 from 0.23.0 to 0.24.0 ([#1894](#1894)).
* Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 ([#1878](#1878)).

Bundles:
* Do not execute build on bundle destroy ([#1882](#1882)).
* Add support for non-Python ipynb notebooks to DABs ([#1827](#1827)).

API Changes:
* Added `databricks credentials` command group.
* Changed `databricks lakeview create` command with new required argument order.

OpenAPI commit d25296d2f4aa7bd6195c816fdf82e0f960f775da (2024-11-07)

Dependency updates:
* Upgrade TF provider to 1.58.0 ([#1900](#1900)).
* Bump golang.org/x/sync from 0.8.0 to 0.9.0 ([#1892](#1892)).
* Bump golang.org/x/text from 0.19.0 to 0.20.0 ([#1893](#1893)).
* Bump golang.org/x/mod from 0.21.0 to 0.22.0 ([#1895](#1895)).
* Bump golang.org/x/oauth2 from 0.23.0 to 0.24.0 ([#1894](#1894)).
* Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 ([#1878](#1878)).

[WIP] Add support for non python ipynb notebooks

f6232f5

shreyas-goenka added the do-not-merge label Oct 11, 2024

shreyas-goenka added 5 commits October 21, 2024 20:10

add cases for r notebooks

2eb6ea5

add tests for scala notebooks

cd8cc2c

add test for sql as well as enum for extensions

20a30f7

cleanup todos

2c7f41a

cleanup todos

4bce4f1

shreyas-goenka changed the title ~~[WIP] Add support for non python ipynb notebooks~~ Add support for non-python ipynb notebooks to DABs Oct 21, 2024

shreyas-goenka added 4 commits October 21, 2024 21:53

better name for test case

6ad24b5

improve tests

8683122

Merge remote-tracking branch 'origin' into support-non-python-ipynb

df7d5ca

-

dda138f

shreyas-goenka removed the do-not-merge label Oct 21, 2024

shreyas-goenka marked this pull request as ready for review October 21, 2024 20:17

shreyas-goenka requested review from pietern and andrewnester October 21, 2024 20:17

pietern reviewed Oct 31, 2024

libs/notebook/ext.go (outdated)

internal/filer_test.go (outdated)

internal/filer_test.go

internal/filer_test.go (outdated)

shreyas-goenka added 2 commits October 31, 2024 11:21

Merge remote-tracking branch 'origin' into support-non-python-ipynb

865c613

remove extension type

7d5eca5

shreyas-goenka temporarily deployed to test-trigger-is October 31, 2024 10:26 — with GitHub Actions Inactive

-

9602785

shreyas-goenka temporarily deployed to test-trigger-is October 31, 2024 10:46 — with GitHub Actions Inactive

combine tests with and wo override flag

606c0c5

shreyas-goenka temporarily deployed to test-trigger-is October 31, 2024 12:57 — with GitHub Actions Inactive

remove second py notebook

faca8cc

shreyas-goenka temporarily deployed to test-trigger-is October 31, 2024 13:01 — with GitHub Actions Inactive

shreyas-goenka requested a review from pietern October 31, 2024 13:19

pietern reviewed Nov 1, 2024

libs/notebook/ext.go (outdated)

libs/filer/workspace_files_extensions_client.go (outdated)

-

a7e5210

shreyas-goenka temporarily deployed to test-trigger-is November 1, 2024 11:16 — with GitHub Actions Inactive

pietern changed the title ~~Add support for non-python ipynb notebooks to DABs~~ Add support for non-Python ipynb notebooks to DABs Nov 8, 2024

Address comments

00eec5a

shreyas-goenka temporarily deployed to test-trigger-is November 12, 2024 17:32 — with GitHub Actions Inactive

shreyas-goenka temporarily deployed to test-trigger-is November 12, 2024 17:33 — with GitHub Actions Inactive

shreyas-goenka requested a review from pietern November 12, 2024 17:33

pietern approved these changes Nov 12, 2024

libs/notebook/ext.go

internal/filer_test.go (outdated)

pietern requested changes Nov 12, 2024

pietern mentioned this pull request Nov 12, 2024

Fix workspace extensions filer accidentally reading notebooks #1891

Merged

address comments and fix integration tests

be53fa5

shreyas-goenka temporarily deployed to test-trigger-is November 13, 2024 14:49 — with GitHub Actions Inactive

pietern approved these changes Nov 13, 2024

shreyas-goenka added this pull request to the merge queue Nov 13, 2024

Merged via the queue into main with commit e1978fa Nov 13, 2024
10 checks passed

shreyas-goenka deleted the support-non-python-ipynb branch November 13, 2024 21:46

andrewnester mentioned this pull request Nov 14, 2024

[Release] Release v0.234.0 #1902

Merged

shreyas-goenka commented Oct 11, 2024 (edited by pietern)
