Add support for non-Python ipynb notebooks to DABs #1827
triggered nightlies on this PR...
nightlies are green
Thanks!
Please update the PR description to say this only affects the read path of the extension-aware filer, i.e. when the CLI runs on Databricks compute and reads from WSFS.
The comments about the cell-aware language settings don't apply to the PR and are more general commentary. You can keep it but delineate it with a header or so to indicate it's an appendix/commentary and doesn't apply to the change itself.
Comments on the description (nuances):
The newly added support was for ipynb notebooks specifically. Notebooks in R, Scala, or SQL were always importable if they were encoded in the source format.
Thanks, two minor things.
Integration tests are red.
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success.
Test Details: go/deco-tests/11819726261
Bundles:
 * Do not execute build on bundle destroy ([#1882](#1882)).
 * Add support for non-Python ipynb notebooks to DABs ([#1827](#1827)).
API Changes:
 * Added `databricks credentials` command group.
 * Changed `databricks genie execute-message-query` command to type `databricks genie execute-message-query` command.
 * Changed `databricks lakeview create` command with new required argument order.
 * Added `databricks aibi-dashboard-embedding-access-policy` command group.
 * Added `databricks aibi-dashboard-embedding-approved-domains` command group.
 * Removed `databricks clean-rooms` command group.
OpenAPI commit d25296d2f4aa7bd6195c816fdf82e0f960f775da (2024-11-07)
Dependency updates:
 * Upgrade TF provider to 1.58.0 ([#1900](#1900)).
 * Bump golang.org/x/sync from 0.8.0 to 0.9.0 ([#1892](#1892)).
 * Bump golang.org/x/text from 0.19.0 to 0.20.0 ([#1893](#1893)).
 * Bump golang.org/x/mod from 0.21.0 to 0.22.0 ([#1895](#1895)).
 * Bump golang.org/x/oauth2 from 0.23.0 to 0.24.0 ([#1894](#1894)).
 * Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 ([#1878](#1878)).
Bundles:
 * Do not execute build on bundle destroy ([#1882](#1882)).
 * Add support for non-Python ipynb notebooks to DABs ([#1827](#1827)).
API Changes:
 * Added `databricks credentials` command group.
 * Changed `databricks lakeview create` command with new required argument order.
OpenAPI commit d25296d2f4aa7bd6195c816fdf82e0f960f775da (2024-11-07)
Dependency updates:
 * Upgrade TF provider to 1.58.0 ([#1900](#1900)).
 * Bump golang.org/x/sync from 0.8.0 to 0.9.0 ([#1892](#1892)).
 * Bump golang.org/x/text from 0.19.0 to 0.20.0 ([#1893](#1893)).
 * Bump golang.org/x/mod from 0.21.0 to 0.22.0 ([#1895](#1895)).
 * Bump golang.org/x/oauth2 from 0.23.0 to 0.24.0 ([#1894](#1894)).
 * Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 ([#1878](#1878)).
Changes
Background
The workspace import APIs recently added support for importing Jupyter notebooks written in R, Scala, or SQL, that is, non-Python notebooks. This now works for the `/import-file` API, which we leverage in the CLI.
Note: We do not need any changes in `databricks sync`. It works out of the box because any state mapping of local names to remote names that we store is scoped only to the notebook extension (i.e., `.ipynb` in this case) and is agnostic of the notebook's specific language.
Problem this PR addresses
The extension-aware filer previously did not work for non-Python notebooks because it checked that a `.ipynb` notebook is written in Python. This PR relaxes that constraint and adds integration tests for both the normal workspace filer and the extension-aware filer writing and reading non-Python `.ipynb` notebooks.
This implies that after this PR, DABs in the workspace / CLI from DBR will work for non-Python notebooks as well. Deploying non-Python notebooks with DABs from local machines already works since the platform-side changes to the API landed; this PR just adds integration tests for that functionality.
Note: Any platform side changes we needed for the import API have already been rolled out to production.
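The relaxed constraint described above can be illustrated with a minimal sketch. This is not the CLI's Go implementation: the helper names (`notebook_language`, `accepted_before`, `accepted_after`) and the `SUPPORTED_LANGUAGES` set are hypothetical, and the sketch assumes the language is read from the notebook-level Jupyter metadata, whereas the real filer may rely on workspace object metadata instead.

```python
import json
from typing import Optional

# Assumption: the languages named in this PR description.
SUPPORTED_LANGUAGES = {"python", "r", "scala", "sql"}


def notebook_language(ipynb_text: str) -> Optional[str]:
    """Return the lower-cased notebook-level language, or None if absent."""
    metadata = json.loads(ipynb_text).get("metadata", {})
    name = metadata.get("language_info", {}).get("name")
    if name is None:
        # Fall back to the kernelspec, which some tools fill in instead.
        name = metadata.get("kernelspec", {}).get("language")
    return name.lower() if isinstance(name, str) else None


def accepted_before(ipynb_text: str) -> bool:
    """Hypothetical old check: only Python .ipynb notebooks pass."""
    return notebook_language(ipynb_text) == "python"


def accepted_after(ipynb_text: str) -> bool:
    """Hypothetical relaxed check: any supported language passes."""
    return notebook_language(ipynb_text) in SUPPORTED_LANGUAGES


# Example inputs: a notebook saved with an R kernel and one saved with a Python kernel.
r_notebook = json.dumps(
    {"cells": [], "metadata": {"language_info": {"name": "R"}}, "nbformat": 4, "nbformat_minor": 4}
)
py_notebook = json.dumps(
    {"cells": [], "metadata": {"kernelspec": {"language": "python"}}, "nbformat": 4, "nbformat_minor": 4}
)
```

Under these assumptions, the R notebook is rejected by the old check and accepted by the relaxed one, while the Python notebook passes both.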
Before
DABs deploys from local machines would work fine for non-Python notebooks, but DABs deployments from DBR would not.
After
DABs deploys both from local machines and DBR will work fine.
Testing
I created the `.ipynb` notebook fixtures used in the integration tests directly from the VSCode UI. This ensures high fidelity with how users will create their non-Python notebooks locally. For Python notebooks this is supported out of the box by VSCode, but R and Scala notebooks required installing the Jupyter kernels for R and Scala on my local machine and using them from VSCode.
For SQL, I ended up directly modifying the `language_info` field in the Jupyter metadata to create the test fixture.
Discussion: Issues with configuring language at the cell level
The language metadata for a Jupyter notebook is standardized at the notebook level (in the `language_info` field). Unfortunately, it's not standardized at the cell level. Thus, for example, if a user changes the language for a cell in VSCode (which is supported by the standard Jupyter VSCode integration), it'll cause a runtime error when the user actually attempts to run the notebook. This is because the cell-level metadata is encoded in a format specific to VSCode.
Supporting cell-level languages is thus out of scope for this PR and can be revisited along with the workspace files team if there's strong customer interest.
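To make the difference between notebook-level and cell-level language metadata concrete, here is a rough sketch. It builds a minimal SQL notebook by setting `language_info` (similar in spirit to the hand-edited SQL fixture, though not the actual fixture) and flags cells that carry a cell-level override. The helper names are hypothetical, and the `metadata.vscode.languageId` key is an assumption about how VSCode encodes the override; it is not part of the standardized notebook format.

```python
import json


def make_notebook(language, cells):
    """Build a minimal notebook dict with the language set at the notebook level."""
    return {
        "cells": cells,
        "metadata": {"language_info": {"name": language}},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def code_cell(source, vscode_language=None):
    """Build a code cell; optionally attach a VSCode-style language override."""
    cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }
    if vscode_language is not None:
        # Assumption: VSCode-specific encoding of a per-cell language.
        cell["metadata"]["vscode"] = {"languageId": vscode_language}
    return cell


def cells_with_language_override(notebook):
    """Return (index, language) for cells whose override differs from the notebook language."""
    default = notebook["metadata"]["language_info"]["name"].lower()
    overrides = []
    for index, cell in enumerate(notebook["cells"]):
        language = cell.get("metadata", {}).get("vscode", {}).get("languageId")
        if language is not None and language.lower() != default:
            overrides.append((index, language))
    return overrides


sql_notebook = make_notebook(
    "sql",
    [code_cell("SELECT 1"), code_cell("print(1)", vscode_language="python")],
)
serialized = json.dumps(sql_notebook, indent=1)  # what would be written to a .ipynb file
```

A check like `cells_with_language_override` could be used to warn about notebooks that are likely to fail at run time, but that is only one possible approach and not something this PR implements.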