terraform-google-dagster

Terraform module that provisions the GCP infrastructure required for a Kubernetes-based Dagster deployment.

Note: this module does not deploy the Dagster application itself, only the services it requires. A reference implementation may be found in the example-app directory.

Overview

The terraform-google-dagster module makes no assumptions about how your Dagster deployment should look, since this can vary widely, and it does not create a Dagster deployment itself. It creates the foundational components needed to run a Dagster cluster, which should plug easily into the Dagster Helm chart or your own Dagster Kubernetes resources.

The module will provision:

  • Service account: This will manage all of the resources associated with your application
  • Private network: Private network from which your resources connect to one another (specifically Kubernetes and Postgres)
  • CloudSQL Postgres Instance: A Cloud SQL Postgres instance
  • Kubernetes Cluster: Primary cluster from which you can run dagit, the dagster-daemon and user code deployments
  • Cloud Storage Bucket: Can be used as an IOManager, for log storage, asset materializations, or as a staging layer for data
  • Artifact Registry Docker Repository: This can be used for private code deployment images

Example

You can find an example deployment utilizing the official Dagster Helm chart inside of the example-app/ directory.
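
As a rough sketch of how the module might be called from a root configuration, the block below sets only the four inputs marked as required in the Inputs table. The module label `dagster`, the Git-based source address (derived from the repository name), and every value shown are assumptions for illustration, not taken from the example-app/ directory.

```hcl
# Hypothetical root configuration; the label "dagster" and all values are placeholders.
module "dagster" {
  # Assumed Git source based on the repository name; pin a ref or version in real use.
  source = "github.com/wandb/terraform-google-dagster"

  # The four inputs listed as required in the Inputs table.
  project_id = "my-gcp-project" # placeholder project ID
  region     = "us-central1"    # placeholder Google region
  namespace  = "dagster-dev"    # prefix applied to all resources
  domain     = "example.com"    # domain in which your Google Groups are defined
}
```

All other inputs fall back to the defaults listed in the Inputs table.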

Requirements

Name Version
terraform >= 1.5.0, < 2.0.0

Providers

No providers.

Modules

Name Source Version
cluster ./modules/cluster n/a
database ./modules/database n/a
networking ./modules/networking n/a
project_factory_project_services terraform-google-modules/project-factory/google//modules/project_services ~> 11.3
registry ./modules/registry n/a
service_account ./modules/service_account n/a
storage ./modules/storage n/a

Resources

No resources.

Inputs

Name Description Type Default Required
cloud_storage_bucket_location Location to create cloud storage bucket in. string "US" no
cloudsql_availability_type The availability type of the Cloud SQL instance. string "ZONAL" no
cloudsql_postgres_version The postgres version of the CloudSQL instance. string "POSTGRES_14" no
cloudsql_tier The machine type to use for the Cloud SQL instance. string "db-f1-micro" no
cluster_compute_machine_type Compute machine type to deploy cluster nodes on. string "e2-standard-2" no
cluster_monitoring_components Components to enable in the GKE monitoring stack. list(string) ["SYSTEM_COMPONENTS"] no
cluster_node_pool_max_node_count Max number of nodes cluster can scale up to. number 2 no
deletion_protection Indicates whether or not storage and databases have deletion protection enabled bool true no
domain The domain in which your Google Groups are defined. string n/a yes
namespace Namespace used as a prefix for all resources string n/a yes
project_id Project ID string n/a yes
region Google region string n/a yes
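
The optional inputs above can be overridden per environment. The following is a minimal sketch of a larger, more resilient variant; the module label, the source address, and the specific override values (such as the `REGIONAL` availability type and the custom Cloud SQL tier) are illustrative assumptions rather than recommendations from this repository.

```hcl
# Hypothetical variant that overrides several optional inputs; all values are placeholders.
module "dagster" {
  # Assumed Git source based on the repository name.
  source = "github.com/wandb/terraform-google-dagster"

  # Required inputs.
  project_id = "my-gcp-project"
  region     = "us-central1"
  namespace  = "dagster-prod"
  domain     = "example.com"

  # Optional inputs, overriding the defaults from the table above.
  cloudsql_availability_type       = "REGIONAL"         # default is "ZONAL"
  cloudsql_tier                    = "db-custom-2-7680" # default is "db-f1-micro"
  cluster_compute_machine_type     = "e2-standard-4"    # default is "e2-standard-2"
  cluster_node_pool_max_node_count = 4                  # default is 2
  cluster_monitoring_components    = ["SYSTEM_COMPONENTS"]

  # Keep deletion protection on for storage and databases (the default).
  deletion_protection = true
}
```

For a short-lived test environment, setting `deletion_protection = false` would presumably allow the storage bucket and database to be destroyed along with the rest of the stack.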

Outputs

Name Description
cloudsql_database Object containing connection parameters for provisioned CloudSQL database
cluster_ca_certificate Cluster certificate of provisioned Kubernetes cluster
cluster_endpoint Endpoint of provisioned Kubernetes cluster
cluster_id ID of provisioned Kubernetes cluster
network_name Name of provisioned VPC network
registry_image_path Docker image path of provisioned Artifact Registry
registry_image_pull_secret Name of Kubernetes secret containing Docker config with permissions to pull from private Artifact Registry repository
registry_location Location of provisioned Artifact Registry
registry_name Name of provisioned Artifact Registry
service_account Service account created to manage and authenticate services.
storage_bucket_name Name of provisioned Cloud Storage bucket
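
The cluster outputs above can be used to configure a Kubernetes provider for installing Dagster. The sketch below assumes a module block labelled `dagster` declared elsewhere in the same configuration, and it further assumes that `cluster_endpoint` is a bare host and that `cluster_ca_certificate` is base64-encoded, as is typical for GKE. Check the module's actual output format before relying on either assumption.

```hcl
# Access token for the identity Terraform is currently running as.
data "google_client_config" "current" {}

# Hypothetical wiring of the module outputs into the Kubernetes provider.
provider "kubernetes" {
  # Assumption: the output is a bare host or IP without a scheme.
  host = "https://${module.dagster.cluster_endpoint}"

  # Assumption: the output is base64-encoded PEM.
  cluster_ca_certificate = base64decode(module.dagster.cluster_ca_certificate)

  token = data.google_client_config.current.access_token
}

# Re-export values that a Helm values file or CI pipeline might need.
output "dagster_bucket" {
  value = module.dagster.storage_bucket_name
}

output "dagster_image_path" {
  value = module.dagster.registry_image_path
}
```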

Development

If you'd like to contribute to this repository, you'll need to install a few dependencies before committing. We use pre-commit to enforce project standards by running Terraform validations via git hooks. Specifically, we use the following hooks:

  • conventional-pre-commit: No additional dependencies needed for this
  • terraform_validate: No additional dependencies needed for this
  • terraform_fmt: No additional dependencies needed for this
  • terraform_docs: Installation instructions here
  • terraform_tflint: Installation instructions here

You'll also need to install pre-commit.

Once you have these dependencies installed you can execute the following:

pre-commit install
pre-commit install --hook-type commit-msg  # installs the hook for commit messages to enforce conventional commits
pre-commit run -a  # this will run pre-commit across all files in the project to validate installation

Now, whenever you create a git commit, these hooks will run and check that your changes adhere to the project standards. In general, we've followed the guidelines laid out in Terraform Best Practices, and we recommend following them when submitting contributions of your own.
