Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for non-Python ipynb notebooks to DABs #1827

Merged
merged 18 commits into from
Nov 13, 2024

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented Oct 11, 2024

Changes

Background

The workspace import APIs recently added support for importing Jupyter notebooks written in R, Scala, or SQL, that is non-Python notebooks. This now works for the /import-file API which we leverage in the CLI.

Note: We do not need any changes in databricks sync. It works out of the box because any state mapping of local names to remote names that we store is only scoped to the notebook extension (i.e., .ipynb in this case) and is agnostic of the notebook's specific language.

Problem this PR addresses

The extension-aware filer previously did not function because it checks that a .ipynb notebook is written in Python. This PR relaxes that constraint and adds integration tests for both the normal workspace filer and extensions aware filer writing and reading non-Python .ipynb notebooks.

This implies that after this PR DABs in the workspace / CLI from DBR will work for non-Python notebooks as well. non-Python notebooks for DABs deployment from local machines already works after the platform side changes to the API landed, this PR just adds integration tests for that bit of functionality.

Note: Any platform side changes we needed for the import API have already been rolled out to production.

Before

DABs deploy would work fine for non-Python notebooks. But DABs deployments from DBR would not.

After

DABs deploys both from local machines and DBR will work fine.

Testing

For creating the .ipynb notebook fixtures used in the integration tests I created them directly from the VSCode UI. This ensures high fidelity with how users will create their non-Python notebooks locally. For Python notebooks this is supported out of the box by VSCode but for R and Scala notebooks this requires installing the Jupyter kernel for R and Scala on my local machine and using that from VSCode.

For SQL, I ended up directly modifying the language_info field in the Jupyter metadata to create the test fixture.

Discussion: Issues with configuring language at the cell level

The language metadata for a Jupyter notebook is standardized at the notebook level (in the language_info field). Unfortunately, it's not standardized at the cell level. Thus, for example, if a user changes the language for their cell in VSCode (which is supported by the standard Jupyter VSCode integration), it'll cause a runtime error when the user actually attempts to run the notebook. This is because the cell-level metadata is encoded in a format specific to VSCode:

cells: []{
    "vscode": {
     "languageId": "sql"
    }
}

Supporting cell level languages is thus out of scope for this PR and can be revisited along with the workspace files team if there's strong customer interest.

@shreyas-goenka shreyas-goenka changed the title [WIP] Add support for non python ipynb notebooks Add support for non-python ipynb notebooks to DABs Oct 21, 2024
@shreyas-goenka shreyas-goenka marked this pull request as ready for review October 21, 2024 20:17
@shreyas-goenka
Copy link
Contributor Author

triggered nightlies on this PR...

@shreyas-goenka
Copy link
Contributor Author

nightlies are green

Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Please update the PR description to say this only affects the read path of the extension-aware filer, i.e. when the CLI runs on Databricks compute and reads from WSFS.

The comments about the cell-aware language settings don't apply to the PR and are more general commentary. You can keep it but delineate it with a header or so to indicate it's an appendix/commentary and doesn't apply to the change itself.

libs/notebook/ext.go Outdated Show resolved Hide resolved
internal/filer_test.go Outdated Show resolved Hide resolved
internal/filer_test.go Show resolved Hide resolved
internal/filer_test.go Outdated Show resolved Hide resolved
@pietern
Copy link
Contributor

pietern commented Nov 1, 2024

Comments on the description (nuances):

The workspace import APIs recently added support for importing notebooks written in R, Scala, or SQL, which are non-Python notebooks. This now work for the /import-file API which we leverage in the CLI.

The newly added support was for ipynb notebooks specifically.

Notebooks in R, Scala, or SQL were always importable if they were encoded in the source format.

libs/notebook/ext.go Outdated Show resolved Hide resolved
libs/filer/workspace_files_extensions_client.go Outdated Show resolved Hide resolved
@pietern pietern changed the title Add support for non-python ipynb notebooks to DABs Add support for non-Python ipynb notebooks to DABs Nov 8, 2024
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, two minor things.

libs/notebook/ext.go Show resolved Hide resolved
internal/filer_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integration tests are red.

Copy link

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 1827
  • Commit SHA: be53fa5ff48adf94996bc5333ca2605e9d45760c

Checks will be approved automatically on success.

@eng-dev-ecosystem-bot
Copy link
Collaborator

Test Details: go/deco-tests/11819726261

@shreyas-goenka shreyas-goenka added this pull request to the merge queue Nov 13, 2024
Merged via the queue into main with commit e1978fa Nov 13, 2024
10 checks passed
@shreyas-goenka shreyas-goenka deleted the support-non-python-ipynb branch November 13, 2024 21:46
andrewnester added a commit that referenced this pull request Nov 14, 2024
Bundles:
 * Do not execute build on bundle destroy ([#1882](#1882)).
 * Add support for non-Python ipynb notebooks to DABs ([#1827](#1827)).

API Changes:
 * Added `databricks credentials` command group.
 * Changed `databricks genie execute-message-query` command to type `databricks genie execute-message-query` command.
 * Changed `databricks lakeview create` command with new required argument order.
 * Added `databricks aibi-dashboard-embedding-access-policy` command group.
 * Added `databricks aibi-dashboard-embedding-approved-domains` command group.
 * Removed `databricks clean-rooms` command group.

OpenAPI commit d25296d2f4aa7bd6195c816fdf82e0f960f775da (2024-11-07)
Dependency updates:
 * Upgrade TF provider to 1.58.0 ([#1900](#1900)).
 * Bump golang.org/x/sync from 0.8.0 to 0.9.0 ([#1892](#1892)).
 * Bump golang.org/x/text from 0.19.0 to 0.20.0 ([#1893](#1893)).
 * Bump golang.org/x/mod from 0.21.0 to 0.22.0 ([#1895](#1895)).
 * Bump golang.org/x/oauth2 from 0.23.0 to 0.24.0 ([#1894](#1894)).
 * Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0 ([#1878](#1878)).
github-merge-queue bot pushed a commit that referenced this pull request Nov 14, 2024
Bundles:
* Do not execute build on bundle destroy
([#1882](#1882)).
* Add support for non-Python ipynb notebooks to DABs
([#1827](#1827)).

API Changes:
 * Added `databricks credentials` command group.
* Changed `databricks lakeview create` command with new required
argument order.

OpenAPI commit d25296d2f4aa7bd6195c816fdf82e0f960f775da (2024-11-07)
Dependency updates:
* Upgrade TF provider to 1.58.0
([#1900](#1900)).
* Bump golang.org/x/sync from 0.8.0 to 0.9.0
([#1892](#1892)).
* Bump golang.org/x/text from 0.19.0 to 0.20.0
([#1893](#1893)).
* Bump golang.org/x/mod from 0.21.0 to 0.22.0
([#1895](#1895)).
* Bump golang.org/x/oauth2 from 0.23.0 to 0.24.0
([#1894](#1894)).
* Bump github.com/databricks/databricks-sdk-go from 0.49.0 to 0.51.0
([#1878](#1878)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants