Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "bundle init" when run from Databricks #1744

Closed
wants to merge 14 commits into from
6 changes: 2 additions & 4 deletions bundle/config/mutator/configure_wsfs.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,11 @@ import (

"github.com/databricks/cli/bundle"
"github.com/databricks/cli/libs/diag"
"github.com/databricks/cli/libs/env"
"github.com/databricks/cli/libs/filer"
"github.com/databricks/cli/libs/runtime"
"github.com/databricks/cli/libs/vfs"
)

const envDatabricksRuntimeVersion = "DATABRICKS_RUNTIME_VERSION"

type configureWSFS struct{}

func ConfigureWSFS() bundle.Mutator {
Expand All @@ -32,7 +30,7 @@ func (m *configureWSFS) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagno
}

// The executable must be running on DBR.
if _, ok := env.Lookup(ctx, envDatabricksRuntimeVersion); !ok {
if !runtime.RunsOnDatabricks(ctx) {
return nil
}

Expand Down
34 changes: 34 additions & 0 deletions libs/notebook/detect.go
Original file line number Diff line number Diff line change
Expand Up @@ -139,3 +139,37 @@ func Detect(name string) (notebook bool, language workspace.Language, err error)
b := filepath.Base(name)
return DetectWithFS(os.DirFS(d), b)
}

type inMemoryFile struct {
buffer bytes.Buffer
}

type inMemoryFS struct {
content []byte
}

func (f *inMemoryFile) Close() error {
return nil
}

func (f *inMemoryFile) Stat() (fs.FileInfo, error) {
return nil, nil
}

func (f *inMemoryFile) Read(b []byte) (n int, err error) {
return f.buffer.Read(b)
}

func (fs inMemoryFS) Open(name string) (fs.File, error) {
return &inMemoryFile{
buffer: *bytes.NewBuffer(fs.content),
}, nil
}

func DetectWithContent(name string, content []byte) (notebook bool, language workspace.Language, err error) {
fs := inMemoryFS{
content: content,
}

return DetectWithFS(fs, name)
}
19 changes: 19 additions & 0 deletions libs/notebook/detect_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -117,3 +117,22 @@ func TestDetectWithObjectInfo(t *testing.T) {
assert.True(t, nb)
assert.Equal(t, workspace.LanguagePython, lang)
}

func TestInMemoryFiles(t *testing.T) {
isNotebook, language, err := DetectWithContent("hello.py", []byte("# Databricks notebook source\n print('hello')"))
assert.True(t, isNotebook)
assert.Equal(t, workspace.LanguagePython, language)
require.NoError(t, err)

isNotebook, language, err = DetectWithContent("hello.py", []byte("print('hello')"))
assert.False(t, isNotebook)
assert.Equal(t, workspace.Language(""), language)
require.NoError(t, err)

fileContent, err := os.ReadFile("./testdata/py_ipynb.ipynb")
require.NoError(t, err)
isNotebook, language, err = DetectWithContent("py_ipynb.ipynb", fileContent)
assert.True(t, isNotebook)
assert.Equal(t, workspace.LanguagePython, language)
require.NoError(t, err)
}
14 changes: 14 additions & 0 deletions libs/runtime/detect.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
package runtime

import (
"context"

"github.com/databricks/cli/libs/env"
)

const envDatabricksRuntimeVersion = "DATABRICKS_RUNTIME_VERSION"

func RunsOnDatabricks(ctx context.Context) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for a common lib function. I'd expect this to also check for the existence of /Workspace though to avoid false positives. There may be all kinds of reasons why customers set the env var locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to test for /Workspace as it makes the code using it harder to test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, testing is a pain :( We can leave it out, but then we should at least emit a debug log message whenever there's a match. There will be false positives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing is not an issue as long as you don't inline these expectations.

You can store a bool on the context that stores whether or not we're running on DBR (see bundle/context.go for inspiration). In tests, you mark it as always true or always false depending on what you want to test. For the real CLI run, you run a routine that performs the actual detection and stores it in the context.

For the check itself, it can look at /proc/mounts for the workspace fuse mount, it can check for /.fuse-mounts, it can check for /databricks, and I'm sure there are a couple other stable ways to determine this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this approach in #1889.

Besides the environment variable, it also checks for the presence of a /databricks directory.

value, ok := env.Lookup(ctx, envDatabricksRuntimeVersion)
return value != "" && ok
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need the value to be non-empty, then you don't need to check if it exists (the OK).

You can use _ for unused variables, or use env.Get directly:

cli/libs/env/context.go

Lines 51 to 56 in a4ba0bb

// Get key from the context or the environment.
// Context has precedence.
func Get(ctx context.Context, key string) string {
v, _ := Lookup(ctx, key)
return v
}

}
fjakobs marked this conversation as resolved.
Show resolved Hide resolved
18 changes: 18 additions & 0 deletions libs/runtime/detect_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
package runtime

import (
"context"
"testing"

"github.com/stretchr/testify/assert"
)

func TestRunsOnDatabricks(t *testing.T) {
ctx := context.Background()

t.Setenv("DATABRICKS_RUNTIME_VERSION", "")
assert.False(t, RunsOnDatabricks(ctx))

t.Setenv("DATABRICKS_RUNTIME_VERSION", "14.3")
assert.True(t, RunsOnDatabricks(ctx))
}
53 changes: 48 additions & 5 deletions libs/template/file.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,19 @@ package template

import (
"context"
"encoding/base64"
"io"
"io/fs"
"os"
"path/filepath"
"strings"

"github.com/databricks/cli/cmd/root"
"github.com/databricks/cli/libs/filer"
"github.com/databricks/cli/libs/log"
"github.com/databricks/cli/libs/notebook"
"github.com/databricks/cli/libs/runtime"
"github.com/databricks/databricks-sdk-go/service/workspace"
)

// Interface representing a file to be materialized from a template into a project
Expand Down Expand Up @@ -68,16 +75,20 @@ func (f *copyFile) PersistToDisk() error {
return err
}
defer srcFile.Close()
dstFile, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, f.perm)

// we read the full file into memory because we need to inspect the content
// in order to determine if it is a notebook
// Once we stop using the workspace API, we can remove this and write in a streaming fashion
content, err := io.ReadAll(srcFile)
if err != nil {
return err
}
defer dstFile.Close()
_, err = io.Copy(dstFile, srcFile)
return err
return writeFile(f.ctx, path, content, f.perm)
}

type inMemoryFile struct {
pietern marked this conversation as resolved.
Show resolved Hide resolved
ctx context.Context

dstPath *destinationPath

content []byte
Expand All @@ -97,5 +108,37 @@ func (f *inMemoryFile) PersistToDisk() error {
if err != nil {
return err
}
return os.WriteFile(path, f.content, f.perm)

return writeFile(f.ctx, path, f.content, f.perm)
}

func shouldUseImportNotebook(ctx context.Context, path string, content []byte) bool {
if strings.HasPrefix(path, "/Workspace/") && runtime.RunsOnDatabricks(ctx) {
isNotebook, _, err := notebook.DetectWithContent(path, content)
if err != nil {
log.Debugf(ctx, "Error detecting notebook: %v", err)
}
return isNotebook && err == nil
}

return false
}

func writeFile(ctx context.Context, path string, content []byte, perm fs.FileMode) error {
if shouldUseImportNotebook(ctx, path, content) {
return importNotebook(ctx, path, content)
} else {
fjakobs marked this conversation as resolved.
Show resolved Hide resolved
return os.WriteFile(path, content, perm)
}
}

func importNotebook(ctx context.Context, path string, content []byte) error {
w := root.WorkspaceClient(ctx)

return w.Workspace.Import(ctx, workspace.Import{
Format: "AUTO",
Overwrite: false,
Path: path,
Content: base64.StdEncoding.EncodeToString(content),
})
}
40 changes: 40 additions & 0 deletions libs/template/file_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,11 @@ import (
"runtime"
"testing"

"github.com/databricks/databricks-sdk-go/experimental/mocks"
"github.com/databricks/databricks-sdk-go/service/workspace"
"github.com/stretchr/testify/mock"

"github.com/databricks/cli/cmd/root"
"github.com/databricks/cli/libs/filer"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
Expand All @@ -17,6 +22,7 @@ func testInMemoryFile(t *testing.T, perm fs.FileMode) {
tmpDir := t.TempDir()

f := &inMemoryFile{
ctx: context.Background(),
dstPath: &destinationPath{
root: tmpDir,
relPath: "a/b/c",
Expand Down Expand Up @@ -109,3 +115,37 @@ func TestTemplateCopyFilePersistToDiskForWindows(t *testing.T) {
// fs.FileMode values we can use for different operating systems.
testCopyFile(t, 0666)
}

func TestShouldUseImportNotebook(t *testing.T) {
ctx := context.Background()
data := []byte("# Databricks notebook source\n print('hello')")

assert.False(t, shouldUseImportNotebook(ctx, "./foo/bar", data))
assert.False(t, shouldUseImportNotebook(ctx, "./foo/bar.ipynb", data))
assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar", data))
assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar.ipynb", data))

t.Setenv("DATABRICKS_RUNTIME_VERSION", "14.3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use env.Set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference? I see t.Setenv used in tests. If I change it to env.Set then my tests break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With env.Set it adds the variable to the context and passes it along in the context.

If it broke then either 1) you weren't capturing the returned ctx, or 2) the function doesn't use env.Get to access the environment variables.

assert.False(t, shouldUseImportNotebook(ctx, "./foo/bar", data))
assert.False(t, shouldUseImportNotebook(ctx, "./foo/bar.ipynb", data))
assert.False(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar", data))
assert.True(t, shouldUseImportNotebook(ctx, "/Workspace/foo/bar.py", data))
}

func TestImportNotebook(t *testing.T) {
ctx := context.Background()

m := mocks.NewMockWorkspaceClient(t)
ctx = root.SetWorkspaceClient(ctx, m.WorkspaceClient)

workspaceApi := m.GetMockWorkspaceAPI()
workspaceApi.EXPECT().Import(mock.Anything, workspace.Import{
Content: "cXdlcnR5", // base64 of "qwerty"
Format: "AUTO",
Overwrite: false,
Path: "/Workspace/foo/bar.ipynb",
}).Return(nil)

err := importNotebook(ctx, "/Workspace/foo/bar.ipynb", []byte("qwerty"))
assert.NoError(t, err)
}
13 changes: 10 additions & 3 deletions libs/template/renderer.go
Original file line number Diff line number Diff line change
Expand Up @@ -153,12 +153,18 @@ func (r *renderer) computeFile(relPathTemplate string) (file, error) {
return nil, err
}

// we need the absolute path in case we need to write notebooks using the REST API
rootPath, err := filepath.Abs(r.instanceRoot)
if err != nil {
return nil, err
}

fjakobs marked this conversation as resolved.
Show resolved Hide resolved
// If file name does not specify the `.tmpl` extension, then it is copied
// over as is, without treating it as a template
if !strings.HasSuffix(relPathTemplate, templateExtension) {
return &copyFile{
dstPath: &destinationPath{
root: r.instanceRoot,
root: rootPath,
relPath: relPath,
},
perm: perm,
Expand Down Expand Up @@ -194,8 +200,9 @@ func (r *renderer) computeFile(relPathTemplate string) (file, error) {
}

return &inMemoryFile{
ctx: r.ctx,
dstPath: &destinationPath{
root: r.instanceRoot,
root: rootPath,
relPath: relPath,
},
perm: perm,
Expand Down Expand Up @@ -314,7 +321,7 @@ func (r *renderer) persistToDisk() error {
if err == nil {
return fmt.Errorf("failed to initialize template, one or more files already exist: %s", path)
}
if err != nil && !errors.Is(err, fs.ErrNotExist) {
if !errors.Is(err, fs.ErrNotExist) {
return fmt.Errorf("error while verifying file %s does not already exist: %w", path, err)
}
}
Expand Down
4 changes: 4 additions & 0 deletions libs/template/renderer_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -329,6 +329,7 @@ func TestRendererPersistToDisk(t *testing.T) {
skipPatterns: []string{"a/b/c", "mn*"},
files: []file{
&inMemoryFile{
ctx: ctx,
dstPath: &destinationPath{
root: tmpDir,
relPath: "a/b/c",
Expand All @@ -337,6 +338,7 @@ func TestRendererPersistToDisk(t *testing.T) {
content: nil,
},
&inMemoryFile{
ctx: ctx,
dstPath: &destinationPath{
root: tmpDir,
relPath: "mno",
Expand All @@ -345,6 +347,7 @@ func TestRendererPersistToDisk(t *testing.T) {
content: nil,
},
&inMemoryFile{
ctx: ctx,
dstPath: &destinationPath{
root: tmpDir,
relPath: "a/b/d",
Expand All @@ -353,6 +356,7 @@ func TestRendererPersistToDisk(t *testing.T) {
content: []byte("123"),
},
&inMemoryFile{
ctx: ctx,
dstPath: &destinationPath{
root: tmpDir,
relPath: "mmnn",
Expand Down
Loading