description | coverY |
---|---|
A Kubernetes operator for Databricks | 0 |
A kube-rs operator that enables GitOps-style management of Databricks resources. It supports the following APIs:
API | CRD |
---|---|
Jobs 2.1 | DatabricksJob |
Git Credentials 2.0 | GitCredential |
Repos 2.0 | Repo |
Secrets 2.0 | DatabricksSecretScope, DatabricksSecret |
This project is experimental but headed towards stable. See the GitHub project board for the roadmap. Contributions and feedback are welcome!
Looking for a more in-depth example? Read the tutorial.
Add the Helm repository and install the chart:
helm repo add mach https://mach-kernel.github.io/databricks-kube-operator
helm install databricks-kube-operator mach/databricks-kube-operator
Create a config map in the same namespace as the operator. To override the config map name, pass --set configMapName=my-custom-name to helm install:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: databricks-kube-operator
data:
  api_secret_name: databricks-api-secret
EOF
Create a secret with your API URL and credentials:
cat <<EOF | kubectl apply -f -
apiVersion: v1
data:
  access_token: $(echo -n 'shhhh' | base64)
  databricks_url: $(echo -n 'https://my-tenant.cloud.databricks.com/api' | base64)
kind: Secret
metadata:
  name: databricks-api-secret
type: Opaque
EOF
See the examples directory for samples of Databricks CRDs. Resources that are created via Kubernetes are owned by the operator: your checked-in manifests are the source of truth.
apiVersion: com.dstancu.databricks/v1
kind: DatabricksJob
metadata:
  name: my-word-count
  namespace: default
spec:
  job:
    settings:
      email_notifications:
        no_alert_for_skipped_runs: false
      format: MULTI_TASK
      job_clusters:
      - job_cluster_key: word-count-cluster
        new_cluster:
          ...
      max_concurrent_runs: 1
      name: my-word-count
      git_source:
        git_branch: misc-and-docs
        git_provider: gitHub
        git_url: https://github.com/mach-kernel/databricks-kube-operator
      tasks:
      - email_notifications: {}
        job_cluster_key: word-count-cluster
        notebook_task:
          notebook_path: examples/job.py
          source: GIT
        task_key: my-word-count
        timeout_seconds: 0
      timeout_seconds: 0
Changes made by users in the Databricks webapp will be overwritten by the operator if drift is detected:
[2024-01-11T14:20:40Z INFO databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count drifted!
Diff (remote, kube):
json atoms at path ".settings.tasks[0].notebook_task.notebook_path" are not equal:
lhs:
"examples/job_oops_is_this_right.py"
rhs:
"examples/job.py"
[2024-01-11T14:20:40Z INFO databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count reconciling drift...
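The drift check in the log above can be pictured as a recursive comparison of two JSON-like documents that records the path of every mismatch. The following is a minimal, standard-library-only sketch of that idea and is not the operator's actual diffing code: the `Value` enum, `drift`, and `obj` are hypothetical names introduced purely for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A tiny JSON-like value (hypothetical stand-in for a real JSON type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    List(Vec<Value>),
    Map(BTreeMap<String, Value>),
}

/// Convenience constructor for a map value.
pub fn obj(pairs: Vec<(&str, Value)>) -> Value {
    Value::Map(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Returns the paths at which `remote` and `kube` differ.
pub fn drift(remote: &Value, kube: &Value) -> Vec<String> {
    let mut out = Vec::new();
    walk(remote, kube, String::new(), &mut out);
    out
}

fn walk(remote: &Value, kube: &Value, path: String, out: &mut Vec<String>) {
    match (remote, kube) {
        (Value::Map(r), Value::Map(k)) => {
            // Visit the union of keys so added and removed keys both count as drift.
            let keys: BTreeSet<&String> = r.keys().chain(k.keys()).collect();
            for key in keys {
                let child = format!("{path}.{key}");
                match (r.get(key), k.get(key)) {
                    (Some(a), Some(b)) => walk(a, b, child, out),
                    _ => out.push(child),
                }
            }
        }
        // Lists of equal length are compared element by element.
        (Value::List(r), Value::List(k)) if r.len() == k.len() => {
            for (i, (a, b)) in r.iter().zip(k).enumerate() {
                walk(a, b, format!("{path}[{i}]"), out);
            }
        }
        // Scalars, mismatched types, and lists of different length.
        (a, b) if a != b => out.push(path),
        _ => {}
    }
}
```

Under these assumptions, comparing a remote document whose `notebook_path` was edited in the webapp against the checked-in manifest yields a single path in the same style as the log line, and an operator would then push the Kubernetes version back to the remote.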
List jobs (only those the operator's access token is allowed to view):
$ kubectl get databricksjobs
NAME STATUS
contoso-ingest-qa RUNNING
contoso-ingest-staging INTERNAL_ERROR
contoso-stats-qa TERMINATED
contoso-stats-staging NO_RUNS
$ kubectl describe databricksjob contoso-ingest-qa
...
A job's status key surfaces API information about the latest run. The status is polled every 60s:
$ kubectl get databricksjob contoso-ingest-staging -ojson | jq .status
{
  "latest_run_state": {
    "life_cycle_state": "INTERNAL_ERROR",
    "result_state": "FAILED",
    "state_message": "Task contoso-ingest-staging failed. This caused all downstream tasks to get skipped.",
    "user_cancelled_or_timedout": false
  }
}
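One plausible reading of the two outputs above is that the STATUS column shows the `life_cycle_state` of the latest run, with `NO_RUNS` as the fallback when no run exists yet. The sketch below models that reading in plain Rust. It is an assumption made for illustration, not the operator's actual status type: `RunState`, `JobStatus`, and `status_column` are hypothetical names, and only the field names are taken from the JSON shown.

```rust
/// Mirrors the fields shown under `latest_run_state` in the JSON output.
#[derive(Debug, Clone, PartialEq)]
pub struct RunState {
    pub life_cycle_state: String,
    pub result_state: Option<String>,
    pub state_message: String,
    pub user_cancelled_or_timedout: bool,
}

/// Hypothetical status wrapper; `None` means no run has been observed yet.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct JobStatus {
    pub latest_run_state: Option<RunState>,
}

/// Picks the string to display in a STATUS column (assumed behaviour).
pub fn status_column(status: &JobStatus) -> &str {
    match &status.latest_run_state {
        Some(run) => run.life_cycle_state.as_str(),
        None => "NO_RUNS",
    }
}
```

Modelling the latest run as an `Option` keeps the "job exists but has never run" case explicit instead of relying on an empty string.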
Begin by creating the configmap as per the Helm instructions.
Generate and install the CRDs by running the crd_gen bin target:
cargo run --bin crd_gen | kubectl apply -f -
The quickest way to test the operator is with a working minikube cluster:
minikube start
minikube tunnel &
export RUST_LOG=databricks_kube
cargo run
[2022-11-02T18:56:25Z INFO databricks_kube] boot! (build: df7e26b-modified)
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for CRD: databricksjobs.com.dstancu.databricks
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for CRD: gitcredentials.com.dstancu.databricks
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for settings in config map: databricks-kube-operator
[2022-11-02T18:56:25Z INFO databricks_kube::context] Found config map
[2022-11-02T18:56:25Z INFO databricks_kube::traits::synced_api_resource] Looking for uningested GitCredential(s)
[2022-11-02T18:56:25Z INFO databricks_kube::traits::synced_api_resource] Looking for uningested DatabricksJob(s)
The client is generated by openapi-generator and then lightly postprocessed so that the models derive JsonSchema and some generator bugs are fixed.
TODO: Manual client 'fixes'
# Hey!! This uses GNU sed
# brew install gnu-sed
# Jobs API
openapi-generator generate -g rust -i openapi/jobs-2.1-aws.yaml -c openapi/config-jobs.yaml -o dbr_jobs
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_jobs/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_jobs/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_jobs/Cargo.toml
# Missing import?
gsed -r -i -e 's/(use reqwest;)/\1\nuse crate::models::ViewsToExport;/' dbr_jobs/src/apis/default_api.rs
# Git Credentials API
openapi-generator generate -g rust -i openapi/gitcredentials-2.0-aws.yaml -c openapi/config-git.yaml -o dbr_git_creds
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_git_creds/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_git_creds/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_git_creds/Cargo.toml
# Repos API
openapi-generator generate -g rust -i openapi/repos-2.0-aws.yaml -c openapi/config-repos.yaml -o dbr_repo
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_repo/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_repo/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_repo/Cargo.toml
# Secrets API
openapi-generator generate -g rust -i openapi/secrets-aws.yaml -c openapi/config-secrets.yaml -o dbr_secrets
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_secrets/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_secrets/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_secrets/Cargo.toml
Deriving CustomResource uses macros to generate another struct. For this example, the output struct name would be DatabricksJob:
#[derive(Clone, CustomResource, Debug, Default, Deserialize, PartialEq, Serialize, JsonSchema)]
#[kube(
    group = "com.dstancu.databricks",
    version = "v1",
    kind = "DatabricksJob",
    derive = "Default",
    namespaced
)]
pub struct DatabricksJobSpec {
    pub job: Job,
}
rust-analyzer shows squiggles when you use crds::databricks_job::DatabricksJob, but you may still want to look inside the generated code. To see what is generated, use cargo-expand:
rustup default nightly
cargo expand --bin databricks_kube
Want to add support for a new API? Provided it has an OpenAPI definition, these are the steps. Look for existing examples in the codebase:
- Download API definition into openapi/ and make a Rust generator configuration (feel free to copy the others and change name)
- Generate the SDK, add it to the Cargo workspace and dependencies for databricks-kube/
- Implement RestConfig<TSDKConfig> for your new client
- Define the new CRD Spec type (follow kube-rs tutorial)
- impl RemoteAPIResource<TAPIResource> for MyNewCRD
- impl StatusAPIResource<TStatusType> for MyNewCRD and specify TStatusType in your CRD
- Add the new resource to the context ensure CRDs condition
- Add the new resource to crdgen.rs
Tests must be run with a single thread since we use a stateful singleton to 'mock' the state of a remote API. Eventually it would be nice to have integration tests targeting Databricks.
$ cargo test -- --test-threads=1
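To see why a stateful singleton forces single-threaded tests, consider the minimal sketch below. It is a standard-library illustration of the pattern, not the project's actual mock: `mock_remote`, `mock_create`, `mock_get`, `mock_count`, and `mock_reset` are hypothetical names. Every test in the process shares the same map, so one test calling `mock_reset` while another is mid-assertion would produce spurious failures when tests run in parallel.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-wide fake "remote API" state: job name -> notebook path (hypothetical).
fn mock_remote() -> &'static Mutex<HashMap<String, String>> {
    static STATE: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
    STATE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Simulates creating (or overwriting) a remote job.
pub fn mock_create(name: &str, notebook_path: &str) {
    mock_remote()
        .lock()
        .unwrap()
        .insert(name.to_string(), notebook_path.to_string());
}

/// Simulates fetching a remote job by name.
pub fn mock_get(name: &str) -> Option<String> {
    mock_remote().lock().unwrap().get(name).cloned()
}

/// Number of jobs currently held by the fake remote.
pub fn mock_count() -> usize {
    mock_remote().lock().unwrap().len()
}

/// Clears all state; each test would call this first.
pub fn mock_reset() {
    mock_remote().lock().unwrap().clear();
}
```

The `Mutex` only protects individual operations from data races. It does not isolate one test's sequence of calls from another's, which is what `--test-threads=1` provides.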