Demonstrate how to use a development cluster #37

Merged: 4 commits, Oct 8, 2024
24 changes: 24 additions & 0 deletions knowledge_base/development_cluster/README.md
@@ -0,0 +1,24 @@
# Development cluster

This example demonstrates how to define and use a development (all-purpose) cluster in a Databricks Asset Bundle.

This bundle defines an `example_job` that runs on a job cluster when deployed to the `prod` target.

In development mode (the default `dev` target), the job is overridden to use a development cluster that is also
provisioned as part of the bundle deployment.
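To make the override concrete, here is a toy Python model of what the `dev` target's `cluster_id` setting does to `example_job`. It is an assumption about the effect, not the Databricks CLI's implementation: each task's `job_cluster_key` is dropped and `existing_cluster_id` is set instead. The function name `override_compute` and the cluster ID are hypothetical.

```python
import copy


# Toy model (an assumption, not the Databricks CLI's code) of the dev-target
# override: every task that would run on a job cluster is pointed at the
# all-purpose cluster instead.
def override_compute(job: dict, cluster_id: str) -> dict:
    """Return a copy of `job` whose tasks all run on `cluster_id`."""
    job = copy.deepcopy(job)  # leave the original definition untouched
    for task in job.get("tasks", []):
        # Drop per-task compute settings and reuse the existing cluster.
        task.pop("job_cluster_key", None)
        task.pop("new_cluster", None)
        task["existing_cluster_id"] = cluster_id
    return job


example_job = {
    "name": "Example job to demonstrate using an interactive cluster for development",
    "tasks": [
        {
            "task_key": "notebook",
            "job_cluster_key": "cluster",
            "notebook_task": {"notebook_path": "../src/hello.py"},
        }
    ],
    "job_clusters": [{"job_cluster_key": "cluster", "new_cluster": {"num_workers": 0}}],
}

# Hypothetical cluster ID standing in for ${resources.clusters.development_cluster.id}.
dev_job = override_compute(example_job, "0000-000000-hypothetical")
print(dev_job["tasks"][0])
```

The prod deployment corresponds to leaving the job untouched, so its task keeps `job_cluster_key: cluster`.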

For more information, please refer to the [documentation](https://docs.databricks.com/en/dev-tools/bundles/settings.html#clusters).

## Prerequisites

* Databricks CLI v0.229.0 or above

## Usage

Update the `host` field under `workspace` in `databricks.yml` to point to the Databricks workspace you want to deploy to.

Run `databricks bundle deploy` to deploy the job to the default `dev` target, which also provisions the `development_cluster` cluster.

Run `databricks bundle deploy -t prod` to deploy the job to the `prod` target, where it runs on a job cluster instead of the development cluster.

Run `databricks bundle run example_job` to run the job (add `-t prod` to run the job deployed to the `prod` target).
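If you script these steps, a minimal sketch could build the commands above as argument lists. `bundle_command` is a hypothetical helper; it only uses the `deploy` and `run` subcommands and the `-t` target flag shown in this README.

```python
import shlex
from typing import List, Optional


def bundle_command(action: str, target: Optional[str] = None, key: Optional[str] = None) -> List[str]:
    """Build argv for `databricks bundle deploy` or `databricks bundle run` (hypothetical helper)."""
    if action not in ("deploy", "run"):
        raise ValueError(f"unsupported action: {action}")
    argv = ["databricks", "bundle", action]
    if target is not None:
        argv += ["-t", target]  # omit to use the default target (dev)
    if action == "run":
        if key is None:
            raise ValueError("`run` needs a resource key, e.g. example_job")
        argv.append(key)
    return argv


for argv in (
    bundle_command("deploy"),
    bundle_command("deploy", target="prod"),
    bundle_command("run", key="example_job"),
):
    print(shlex.join(argv))
# → databricks bundle deploy
# → databricks bundle deploy -t prod
# → databricks bundle run example_job
```

Each list can be passed to `subprocess.run(argv, check=True)`; building a list rather than a shell string avoids quoting issues.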
41 changes: 41 additions & 0 deletions knowledge_base/development_cluster/databricks.yml
@@ -0,0 +1,41 @@
```yaml
bundle:
  name: development_cluster

include:
  - resources/*.yml

workspace:
  host: https://e2-dogfood.staging.cloud.databricks.com

targets:
  dev:
    mode: development
    default: true

    # By configuring this field for the "dev" target, all jobs in this bundle
    # are overridden to use the all-purpose cluster defined below.
    #
    # This can increase the speed of development when iterating on code and job definitions,
    # as you don't have to wait for job clusters to start for every job run.
    #
    # Note: make sure that the cluster configuration below matches the job cluster
    # definition that will be used when deploying the other targets.
    cluster_id: ${resources.clusters.development_cluster.id}

    resources:
      clusters:
        development_cluster:
          cluster_name: Development cluster
          spark_version: 15.4.x-scala2.12
          node_type_id: i3.xlarge
          num_workers: 0
          autotermination_minutes: 30
          spark_conf:
            "spark.databricks.cluster.profile": "singleNode"
            "spark.master": "local[*, 4]"
          custom_tags:
            "ResourceClass": "SingleNode"

  prod: {
    # No overrides
  }
```
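The note in `databricks.yml` asks you to keep the development cluster in sync with the job cluster in `resources/example_job.yml`. Here is a rough Python sketch of that check. It assumes the two definitions have been copied into dicts by hand (the standard library has no YAML parser) and that `cluster_name` and `autotermination_minutes` are the only fields allowed to differ. `compute_drift` and `ALL_PURPOSE_ONLY` are hypothetical names.

```python
# Cluster settings copied by hand from databricks.yml and resources/example_job.yml.
development_cluster = {
    "cluster_name": "Development cluster",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 0,
    "autotermination_minutes": 30,
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*, 4]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

job_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 0,
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*, 4]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

# Fields assumed to matter only for the all-purpose cluster, so they are ignored.
ALL_PURPOSE_ONLY = {"cluster_name", "autotermination_minutes"}


def compute_drift(dev: dict, job: dict) -> dict:
    """Return {field: (dev_value, job_value)} for every compared field that differs."""
    keys = (set(dev) | set(job)) - ALL_PURPOSE_ONLY
    return {k: (dev.get(k), job.get(k)) for k in sorted(keys) if dev.get(k) != job.get(k)}


drift = compute_drift(development_cluster, job_cluster)
print(drift or "cluster definitions match")
# → cluster definitions match
```

Nested values such as `spark_conf` are compared as whole dicts, so any changed Spark setting shows up as drift on that key.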
22 changes: 22 additions & 0 deletions knowledge_base/development_cluster/resources/example_job.yml
@@ -0,0 +1,22 @@
```yaml
resources:
  jobs:
    example_job:
      name: "Example job to demonstrate using an interactive cluster for development"

      tasks:
        - task_key: notebook
          job_cluster_key: cluster
          notebook_task:
            notebook_path: ../src/hello.py

      job_clusters:
        - job_cluster_key: cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 0
            spark_conf:
              "spark.databricks.cluster.profile": "singleNode"
              "spark.master": "local[*, 4]"
            custom_tags:
              "ResourceClass": "SingleNode"
```
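The `notebook_path` above is relative to `resources/`, the directory of the file that declares it, not to the bundle root. Here is a small sketch of that resolution using `posixpath`. `resolve_bundle_path` is a hypothetical helper and models only the path arithmetic, not how the CLI uploads the notebook.

```python
import posixpath


def resolve_bundle_path(config_file: str, relative_path: str) -> str:
    """Resolve a path from a bundle config file against that file's directory (hypothetical helper)."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(config_file), relative_path))


print(resolve_bundle_path("resources/example_job.yml", "../src/hello.py"))
# → src/hello.py
```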
3 changes: 3 additions & 0 deletions knowledge_base/development_cluster/src/hello.py
@@ -0,0 +1,3 @@
```python
# Databricks notebook source

print("Hello, World!")
```