Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for creating all-purpose clusters #1698

Merged
merged 9 commits into from
Sep 23, 2024

Conversation

andrewnester
Copy link
Contributor

@andrewnester andrewnester commented Aug 20, 2024

Changes

Added support for creating all-purpose clusters

Example of configuration

bundle:
  name: clusters

resources:
  clusters:
    test_cluster:
      cluster_name: "Test Cluster"
      num_workers: 2
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 7
      spark_version: "13.3.x-scala2.12"
      spark_conf:
        "spark.executor.memory": "2g"

  jobs:
    test_job:
      name: "Test Job"
      tasks:
        - task_key: test_task
          existing_cluster_id: ${resources.clusters.test_cluster.id}
          notebook_task:
            notebook_path: "./src/test.py"

targets:
    development:
      mode: development
      compute_id: ${resources.clusters.test_cluster.id}

Tests

Added unit, config and E2E tests

Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a cluster started if it is deployed?

This is not strictly necessary for bundle deployments because it will be started by jobs that use it. If it is started, deploy may take a long time.

bundle/config/mutator/apply_presets.go Show resolved Hide resolved
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can folks use this with:

bundle:
  compute_id: ${resources.clusters.my_cluster.id}

Also, I'd love for this PR to also rename both the flag and the field to cluster_id.

@andrewnester
Copy link
Contributor Author

@pietern it is started by default so indeed deployment takes longer in this case

@andrewnester
Copy link
Contributor Author

How can folks use this with:
bundle:
compute_id: ${resources.clusters.my_cluster.id}

You can do this in target overrides but I'd better remove it from example because it is confusing, this is just to illustrate that reference string can be used for compute id

@pietern
Copy link
Contributor

pietern commented Aug 20, 2024

@andrewnester Is there a way we can avoid starting it (or waiting for it to start)?

@andrewnester
Copy link
Contributor Author

@pietern unfortunately we can't as TF provider and Go SDK explicitly wait for cluster to be running
https://github.com/databricks/terraform-provider-databricks/blob/main/clusters/resource_cluster.go#L413C2-L420

@andrewnester andrewnester requested a review from pietern August 20, 2024 14:12
@lennartkats-db
Copy link
Contributor

@andrewnester Waiting for it to start indeed seems pretty unfortunate, especially for development ... Does that happen every time, or just when it's first created?

@lennartkats-db
Copy link
Contributor

lennartkats-db commented Aug 26, 2024

You can do this in target overrides but I'd better remove it from example because it is confusing, this is just to illustrate that reference string can be used for compute id

But that would work, right? Setting bundle.cluster_id? Otherwise these all-purpose clusters are quite hard to use. You can't use a regular override to override a job cluster (new_cluster) with an all-purpose cluster (existing_cluster_id)

[edit]I opened a thread on this below[/edit]

@pietern
Copy link
Contributor

pietern commented Aug 28, 2024

Adding the label because we need the TF-side change to be released first.

Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Left a review on the TF PR, we can get this in when that is merged, the provider is released, and we have bumped the provider here.

bundle/config/mutator/apply_presets.go Outdated Show resolved Hide resolved
}
}
return v, dyn.ErrSkip
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, this can be a function under libs/dyn (no need to include here).

github-merge-queue bot pushed a commit to databricks/terraform-provider-databricks that referenced this pull request Sep 2, 2024
…t on cluster creation (#3953)

## Changes
Added no_wait option for clusters to skip waiting to start on cluster
creation

Fixes
#3837

Also required for databricks/cli#1698

## Tests
- [x] `make test` run locally
- [x] relevant change in `docs/` folder
- [x] covered with integration tests in `internal/acceptance`
- [x] relevant acceptance tests are passing
- [x] using Go SDK


Manually run the following configuration to confirm everything works

```
data "databricks_node_type" "smallest" {
  local_disk = true
}

data "databricks_spark_version" "latest_lts" {
  long_term_support = true
}

resource "databricks_cluster" "shared_autoscaling" {
  cluster_name            = "Andrew Nester TF Test"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20
  autoscale {
    min_workers = 1
    max_workers = 50
  }
  no_wait = true

  library {
    pypi {
        package = "fbprophet==0.6"
    }
  }
}
```

---------

Co-authored-by: Pieter Noordhuis <[email protected]>
Co-authored-by: Miles Yucht <[email protected]>
bundle/config/mutator/apply_presets.go Show resolved Hide resolved
@@ -39,22 +39,22 @@ func overrideJobCompute(j *resources.Job, compute string) {

func (m *overrideCompute) Apply(ctx context.Context, b *bundle.Bundle) diag.Diagnostics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting a thread on the above for tracking purposes

You can do this in target overrides but I'd better remove it from example because it is confusing, this is just to illustrate that reference string can be used for compute id

But that would work, right? Setting bundle.cluster_id? Otherwise these all-purpose clusters are quite hard to use. You can't use a regular override to override a job cluster (new_cluster) with an all-purpose cluster (existing_cluster_id)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's possible to specify compute_id or cluster_id in your target with variable reference to a new cluster. I've added an example

bundle/config/mutator/apply_presets.go Show resolved Hide resolved
@@ -32,6 +32,7 @@ func allResourceTypes(t *testing.T) []string {
// the dyn library gives us the correct list of all resources supported. Please
// also update this check when adding a new resource
require.Equal(t, []string{
"clusters",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work with run_as?

I can see how it works with shared-user clusters, but is (I think?) incompatible with single-user clusters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not work with run_as, this list is just list of available resources. Then it's added in allowList below as per comment in the section below

	// Action Item: If you are adding a new resource to DABs, please check in with
	// the relevant owning team whether the resource should be on the allow list or (implicitly) on
	// the deny list. Any resources that could have run_as semantics in the future
	// should be on the deny list.

bundle/config/root.go Show resolved Hide resolved
pietern added a commit to databricks/bundle-examples that referenced this pull request Sep 11, 2024
Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please TAL at the last comment and failing test (looks like no_wait property missing).

bundle/config/mutator/compute_id_compat.go Show resolved Hide resolved
@andrewnester andrewnester added this pull request to the merge queue Sep 23, 2024
Merged via the queue into main with commit 56ed9be Sep 23, 2024
5 checks passed
@andrewnester andrewnester deleted the feature/all-purpose-clusters branch September 23, 2024 10:50
andrewnester added a commit that referenced this pull request Oct 1, 2024
Bundles:
 * Added support for creating all-purpose clusters ([#1698](#1698)).
 * Reduce time until the prompt is shown for bundle run ([#1727](#1727)).
 * Use Unity Catalog for pipelines in the default-python template ([#1766](#1766)).
 * Add verbose flag to the "bundle deploy" command ([#1774](#1774)).
 * Fixed full variable override detection ([#1787](#1787)).
 * Add sub-extension to resource files in built-in templates ([#1777](#1777)).
 * Fix panic in `apply_presets.go` ([#1796](#1796)).

Internal:
 * Assert tokens are redacted in origin URL when username is not specified ([#1785](#1785)).
 * Refactor jobs path translation ([#1782](#1782)).
 * Add JobTaskClusterSpec validate mutator ([#1784](#1784)).
 * Pin Go toolchain to 1.22.7 ([#1790](#1790)).
 * Modify SetLocation test utility to take full locations as argument ([#1788](#1788)).
 * Simplified isFullVariableOverrideDef implementation ([#1791](#1791)).
 * Sort tasks by `task_key` before generating the Terraform configuration ([#1776](#1776)).
 * Trim trailing whitespace ([#1794](#1794)).
 * Move trampoline code into trampoline package ([#1793](#1793)).
 * Rename `RootPath` -> `BundleRootPath` ([#1792](#1792)).

API Changes:
 * Changed `databricks apps delete` command to return .
 * Changed `databricks apps deploy` command with new required argument order.
 * Changed `databricks apps start` command to return .
 * Changed `databricks apps stop` command to return .
 * Added `databricks temporary-table-credentials` command group.
 * Added `databricks serving-endpoints put-ai-gateway` command.
 * Added `databricks disable-legacy-access` command group.
 * Added `databricks account disable-legacy-features` command group.

OpenAPI commit 6f6b1371e640f2dfeba72d365ac566368656f6b6 (2024-09-19)
Dependency updates:
 * Upgrade to Go SDK 0.47.0 ([#1799](#1799)).
 * Upgrade to TF provider 1.52 ([#1781](#1781)).
 * Bump golang.org/x/mod from 0.20.0 to 0.21.0 ([#1758](#1758)).
 * Bump github.com/hashicorp/hc-install from 0.7.0 to 0.9.0 ([#1772](#1772)).
github-merge-queue bot pushed a commit that referenced this pull request Oct 1, 2024
Bundles:
* Added support for creating all-purpose clusters
([#1698](#1698)).
* Reduce time until the prompt is shown for bundle run
([#1727](#1727)).
* Use Unity Catalog for pipelines in the default-python template
([#1766](#1766)).
* Add verbose flag to the "bundle deploy" command
([#1774](#1774)).
* Fixed full variable override detection
([#1787](#1787)).
* Add sub-extension to resource files in built-in templates
([#1777](#1777)).
* Fix panic in `apply_presets.go`
([#1796](#1796)).

Internal:
* Assert tokens are redacted in origin URL when username is not
specified ([#1785](#1785)).
* Refactor jobs path translation
([#1782](#1782)).
* Add JobTaskClusterSpec validate mutator
([#1784](#1784)).
* Pin Go toolchain to 1.22.7
([#1790](#1790)).
* Modify SetLocation test utility to take full locations as argument
([#1788](#1788)).
* Simplified isFullVariableOverrideDef implementation
([#1791](#1791)).
* Sort tasks by `task_key` before generating the Terraform configuration
([#1776](#1776)).
* Trim trailing whitespace
([#1794](#1794)).
* Move trampoline code into trampoline package
([#1793](#1793)).
* Rename `RootPath` -> `BundleRootPath`
([#1792](#1792)).

API Changes:
 * Changed `databricks apps delete` command to return .
* Changed `databricks apps deploy` command with new required argument
order.
 * Changed `databricks apps start` command to return .
 * Changed `databricks apps stop` command to return .
 * Added `databricks temporary-table-credentials` command group.
 * Added `databricks serving-endpoints put-ai-gateway` command.
 * Added `databricks disable-legacy-access` command group.
 * Added `databricks account disable-legacy-features` command group.

OpenAPI commit 6f6b1371e640f2dfeba72d365ac566368656f6b6 (2024-09-19)
Dependency updates:
* Upgrade to Go SDK 0.47.0
([#1799](#1799)).
* Upgrade to TF provider 1.52
([#1781](#1781)).
* Bump golang.org/x/mod from 0.20.0 to 0.21.0
([#1758](#1758)).
* Bump github.com/hashicorp/hc-install from 0.7.0 to 0.9.0
([#1772](#1772)).
andrewnester added a commit to databricks/bundle-examples that referenced this pull request Oct 8, 2024
Note: this requires databricks/cli#1698.

---------

Co-authored-by: Andrew Nester <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants