This repository contains a reference implementation of a web app that provides self-service capabilities for users to provision infrastructure in Google Cloud. The contract specified the use of a GitOps approach initiated from Backstage templates, with changes reconciled using Config Sync. The infrastructure is defined using a combination of Crossplane, Config Connector, and Terraform.
The reference implementation is split between two repositories:
- PHACDataHub/sci-portal-users contains the `User`, `Group`, and infrastructure definitions.
- PHACDataHub/sci-portal contains the rest of the reference implementation.
Users sign in using PHAC's designated Google authentication methods.
Important
Users must be added to the Backstage Catalog before they can log in. This is a known limitation documented in User Management.
Users can view the available templates on the Create... page. The templated deployments prompt the user for their team, administrative details, and cloud resource configuration.
To provision a resource, select a template, fill out the required information, and submit the form. The user is provided with a link to view the Pull Request that is created.
In a GitOps approach the repository serves as the source of truth. After bootstrapping the cluster to start Config Sync, we use GitOps to define the desired state of the cluster. Config Sync reconciles the current state in the cluster and the desired state in the repositories.
When a user creates a templated deployment from Backstage it creates a Pull Request in the PHACDataHub/sci-portal-users repository.
When the Pull Request is merged, each templated deployment instance appears in the Backstage Catalog and can provide helpful links. For example, the RAD Lab Data Science templates show a link to the Managed Notebooks in the Vertex AI Workbench.
Monitoring the current deployment status was not a prioritized feature. The next steps for development are documented in the Extensibility Report.
The templated deployments provision resources in a new isolated GCP project in alignment with the agency’s micro-segmentation security architecture.
Users can see the project Cost and % Budget in the Backstage Catalog. These values are updated daily.
Each project is actively monitored for consumption. Budget alert emails are sent when the budget reaches 25%, 50%, 75%, 90%, 95%, and 100%. Over-budget alert emails are sent for each percent between 100% and 120%.
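The alert schedule above can be sketched as a small function. This is an illustration only, not the Cloud Function in budget-alerts: the names `ALERT_THRESHOLDS`, `crossed_thresholds`, and `latest_threshold` are hypothetical, and the sketch assumes that "each percent between 100% and 120%" means 101% through 120% inclusive.

```python
from typing import List, Optional

# Hypothetical sketch of the alert schedule; not the budget-alerts Cloud Function.

# Budget alerts at 25, 50, 75, 90, 95, and 100 percent.
BUDGET_THRESHOLDS = [25, 50, 75, 90, 95, 100]
# Over-budget alerts for each percent above 100 (assumed: 101 through 120 inclusive).
OVER_BUDGET_THRESHOLDS = list(range(101, 121))
ALERT_THRESHOLDS = BUDGET_THRESHOLDS + OVER_BUDGET_THRESHOLDS


def crossed_thresholds(cost: float, budget: float) -> List[int]:
    """Return every alert threshold (in percent) that the cost has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare cost * 100 with threshold * budget to avoid a float division.
    return [t for t in ALERT_THRESHOLDS if cost * 100 >= t * budget]


def latest_threshold(cost: float, budget: float) -> Optional[int]:
    """Return the highest threshold reached, or None if the cost is below 25 percent."""
    crossed = crossed_thresholds(cost, budget)
    return crossed[-1] if crossed else None


if __name__ == "__main__":
    print(latest_threshold(cost=520.0, budget=1000.0))   # cost is 52% of budget
    print(latest_threshold(cost=1034.0, budget=1000.0))  # cost is 103.4% of budget
```

A real implementation would also need to remember which thresholds have already triggered an email so that each alert is sent once.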
A Looker Studio dashboard has been embedded on the Cost Dashboard page. This offers a flexible starting point that the team can refine to build meaningful FinOps reports that meet their needs. Each project is labeled with the cost centre and display name to support reporting grouped by cost centre.
The billing data is exported daily to BigQuery for additional analysis.
The overall solution is extensible. It supports adding Software Templates, displaying custom information and links in the Catalog, adding custom actions to the Catalog, extending the permissions model, and much more. This is documented in the Extensibility Report.
This repository contains the following directories:
Directory | Description |
---|---|
.devbox | This directory contains the Devbox configuration to install gcloud in an isolated shell for development. |
backstage | This directory contains Backstage, including custom plugins and template definitions. |
bootstrap | This directory contains the scripts and infrastructure definitions to deploy Google Kubernetes Engine (GKE), Crossplane, and the Crossplane providers. |
budget-alerts | This directory contains a Cloud Function that sends budget alert emails with GC Notify. |
root-sync | This directory contains Kubernetes manifests and Kustomizations reconciled by Config Sync. |
taskfiles | This directory contains Task definitions used by Task. |
templates | This directory contains Terraform modules managed and modified by the Data Science Portal team. |
tests | This directory contains chainsaw tests that verify the Crossplane Compositions create the expected Kubernetes resources. |
At this time there is only one non-production environment. The reference implementation is not intended for production. Some concerns for moving to production have been documented in the Extensibility Report.
We encourage creating at least one more non-production environment to develop and maintain the system with confidence.
The cluster must be deployed with Config Sync for GitOps, Crossplane for the control plane, and additional infrastructure before we can build and run Backstage. This is only required the first time the cluster starts. The process is documented in bootstrap/README.md.
Install `task` by following the documentation. To install it globally using Yarn, run:
yarn global add @go-task/cli
To verify the installation and list available tasks run:
task --list
This project uses more than one version of Node.js (v18 and v20). We recommend using a tool that manages multiple versions of Node.js like nvm (Node Version Manager) or NVM for Windows.
To use the version of Node.js defined in the `.nvmrc` file, run:
nvm use
Backstage uses Yarn v1. It can be installed globally using corepack:
corepack enable
corepack prepare [email protected] --activate
or installed globally using npm:
npm install --global [email protected]
Verify the installation by checking the Yarn version:
yarn -v
Deploy GCP infrastructure in Canadian regions. Use `northamerica-northeast1` (Montreal) or `northamerica-northeast2` (Toronto).
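The region constraint above could be enforced with a small check before a manifest is submitted. This is a minimal sketch, not part of the repository: `ALLOWED_REGIONS` and `is_canadian_location` are hypothetical names, and it assumes a zone is written as the region followed by a hyphen and a single letter.

```python
# Hypothetical helper; the repository may enforce regions differently.
ALLOWED_REGIONS = {
    "northamerica-northeast1": "Montreal",
    "northamerica-northeast2": "Toronto",
}


def is_canadian_location(location: str) -> bool:
    """Accept an allowed region, or a zone inside one (region + '-' + letter)."""
    if location in ALLOWED_REGIONS:
        return True
    # Split off a possible zone suffix such as "-a".
    region, _, zone_suffix = location.rpartition("-")
    return (
        region in ALLOWED_REGIONS
        and len(zone_suffix) == 1
        and zone_suffix.isalpha()
    )


if __name__ == "__main__":
    for candidate in ("northamerica-northeast1", "northamerica-northeast2-a", "us-central1"):
        print(candidate, is_canadian_location(candidate))
```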
Follow these principles to define infrastructure using Config Connector or Terraform as managed resources in your Crossplane compositions:
- Define infrastructure exclusively using Config Connector resources, when possible. The `config-connector` CLI tool may be helpful to export resources.
- When infrastructure cannot be defined using exclusively Config Connector resources, use Terraform.
- Do not mix tools (Config Connector, Terraform, or others).
- Isolate the Terraform state for each GCP Project (or equivalent). Create a `ProviderConfig` that configures the state file `backend` for each Project. Configure the `Workspace` to use the `ProviderConfig` using the `providerConfigRef`.
- Use Crossplane Usages to ensure resources are deleted in the expected order.
- If the Crossplane Composition requires conditional statements, use Composition Functions. It is a known limitation that Patch and Transforms do not support conditions.
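The state isolation principle above can be sketched by generating one `ProviderConfig` per project and pointing the `Workspace` at it. This is a sketch under assumptions, not the manifests used in this repository: it assumes the `tf.upbound.io/v1beta1` API group of provider-terraform, a GCS state backend with a per-project prefix, and hypothetical resource names, bucket, and module source. Credentials are omitted for brevity. Kubernetes accepts JSON manifests, so the output can be inspected or applied as-is.

```python
import json

# API group used by provider-terraform; verify it against the installed version.
TF_API_VERSION = "tf.upbound.io/v1beta1"


def provider_config_for(project_id: str, state_bucket: str) -> dict:
    """Build a ProviderConfig whose Terraform state backend is scoped to one project."""
    # Assumption: state is stored in a GCS bucket under a prefix unique to the project.
    configuration = (
        "terraform {\n"
        '  backend "gcs" {\n'
        f'    bucket = "{state_bucket}"\n'
        f'    prefix = "{project_id}"\n'
        "  }\n"
        "}\n"
    )
    return {
        "apiVersion": TF_API_VERSION,
        "kind": "ProviderConfig",
        "metadata": {"name": f"terraform-{project_id}"},  # hypothetical naming scheme
        "spec": {"configuration": configuration},
    }


def workspace_for(project_id: str, module_source: str, provider_config: dict) -> dict:
    """Build a Workspace that uses the project's ProviderConfig via providerConfigRef."""
    return {
        "apiVersion": TF_API_VERSION,
        "kind": "Workspace",
        "metadata": {"name": f"workspace-{project_id}"},  # hypothetical naming scheme
        "spec": {
            "providerConfigRef": {"name": provider_config["metadata"]["name"]},
            "forProvider": {"source": "Remote", "module": module_source},
        },
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    config = provider_config_for("example-project", "example-state-bucket")
    workspace = workspace_for(
        "example-project", "git::https://example.com/modules.git//example", config
    )
    print(json.dumps([config, workspace], indent=2))
```

Because each project gets its own backend prefix, deleting or corrupting one project's state does not affect any other project.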
If Terraform modules need to be modified, copy them to the templates/ directory. The RAD Lab Data Science and Gen AI modules have been copied and modified there.
We recommend the following approach to modifying `CompositeResourceDefinitions` and `Compositions` that are used in production:
- When making a breaking change to a `CompositeResourceDefinition`, follow the Crossplane documentation to create a new `version`.
- When making a breaking change to a `Composition`, follow the Crossplane documentation to use Composition Revisions with a manual update policy.
This approach enables the team to test changes to manifests with confidence, then progressively roll out and upgrade the remaining resources.
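A progressive rollout with a manual update policy can be sketched as pinning one composite resource at a time to a new Composition Revision. This sketch assumes the Crossplane v1.x field names `spec.compositionUpdatePolicy` and `spec.compositionRevisionRef`; the composite resource kind, API group, and revision names are hypothetical.

```python
import copy


def pin_to_revision(composite: dict, revision_name: str) -> dict:
    """Return a copy of a composite resource pinned to one Composition Revision."""
    pinned = copy.deepcopy(composite)
    spec = pinned.setdefault("spec", {})
    # Manual means Crossplane will not move the resource to newer revisions by itself.
    spec["compositionUpdatePolicy"] = "Manual"
    spec["compositionRevisionRef"] = {"name": revision_name}
    return pinned


if __name__ == "__main__":
    # Hypothetical composite resource; the apiVersion and kind are placeholders.
    composite = {
        "apiVersion": "example.org/v1alpha1",
        "kind": "XExampleProject",
        "metadata": {"name": "test-project"},
        "spec": {"compositionRevisionRef": {"name": "example-composition-abc123"}},
    }
    upgraded = pin_to_revision(composite, "example-composition-def456")
    print(upgraded["spec"])
```

Upgrading a single test resource first, then the rest, keeps a breaking Composition change from reaching every resource at once.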
Define infrastructure with confidence using tests. There are corresponding `chainsaw` tests in the tests/templates/ directory that can be used to apply manifests and assert the expected cluster resources are provisioned.
Follow the documentation to use the `backstage-cli` to update Backstage.
The Backstage installation was created from a template, then modified. To keep up to date with changes to the template follow the documentation and use the Backstage Upgrade Helper.
Review the Release Notes and documentation to upgrade Config Sync.
Review the release notes on docs.crossplane.io or GitHub, and follow the documentation to upgrade Crossplane.
Review the release notes and README to manually upgrade `provider-kubernetes` in root-sync/base/crossplane/project/kubernetes.yaml.
Review the release notes and documentation to manually upgrade `provider-terraform` in bootstrap/crossplane/templates/terrafrom/provider.yaml.
Warning
The RAD Lab Terraform modules require the `gcloud` CLI tool to be available where `terraform` is run. We configure `provider-terraform` to use a custom runtime image defined in bootstrap/crossplane/templates/terrafrom/build.
Update `PROVIDER_TERRAFORM_VERSION` in the Dockerfile.
Warning
The RAD Lab Terraform modules require the `gcloud` CLI tool to be available where `terraform` is run. We configure `provider-terraform` to use a custom runtime image defined in bootstrap/crossplane/templates/terrafrom/build.
Review the release notes and update `CLOUD_SDK_VERSION` in the Dockerfile.
Review the release notes and documentation to upgrade Config Connector.
If the Config Sync sync status appears stuck in the Google Cloud console, check the `root-reconciler` Pod logs in the `config-management-system` namespace.
These references will help troubleshoot Crossplane:
To view the logs when the Terraform Provider runs `plan` and `apply`:
- Find the `metadata.uid` for the `Workspace` managed resource.
- Open a shell on the `provider-terraform` Pod.
- View the log stored by UID. For example:
less /tf/b0709d67-7c59-4c6e-99b7-60dfb37e8f68/log_terraform.log
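The steps above can be sketched as a script that builds the corresponding `kubectl` commands. This is a sketch under assumptions: the function names are hypothetical, the Pod name and namespace are placeholders you must replace, and the resource name `workspaces.tf.upbound.io` assumes the provider-terraform API group. The script only prints the commands; it does not run them.

```python
import shlex
from typing import List


def terraform_log_path(workspace_uid: str) -> str:
    """Return the path of the Terraform log that the provider stores by Workspace UID."""
    return f"/tf/{workspace_uid}/log_terraform.log"


def uid_command(workspace_name: str) -> List[str]:
    """Build the command that prints the metadata.uid of a Workspace."""
    return [
        "kubectl", "get", "workspaces.tf.upbound.io", workspace_name,
        "-o", "jsonpath={.metadata.uid}",
    ]


def log_command(pod: str, namespace: str, workspace_uid: str) -> List[str]:
    """Build the command that prints the Terraform log from the provider Pod."""
    return [
        "kubectl", "exec", "--namespace", namespace, pod,
        "--", "cat", terraform_log_path(workspace_uid),
    ]


if __name__ == "__main__":
    # Placeholders: replace the Workspace name, Pod name, and namespace with real values.
    print(shlex.join(uid_command("example-workspace")))
    print(shlex.join(log_command(
        pod="provider-terraform-example-pod",
        namespace="crossplane-system",
        workspace_uid="b0709d67-7c59-4c6e-99b7-60dfb37e8f68",
    )))
```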
The `resources` in a Crossplane `Composition` must be managed resources. To create a Kubernetes resource, it must be wrapped in the `Object` managed resource. Crossplane will not create Kubernetes resources directly.
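As an illustration of that wrapping, the sketch below places a plain Kubernetes manifest inside an `Object` from provider-kubernetes. It assumes the `kubernetes.crossplane.io/v1alpha2` API version (older provider releases use a different version, so check the installed one); the function name, resource names, and ProviderConfig name are hypothetical.

```python
import json

# Assumed API version for provider-kubernetes Objects; verify against the installed provider.
OBJECT_API_VERSION = "kubernetes.crossplane.io/v1alpha2"


def wrap_in_object(manifest: dict, name: str, provider_config: str = "default") -> dict:
    """Wrap a Kubernetes manifest in an Object managed resource."""
    return {
        "apiVersion": OBJECT_API_VERSION,
        "kind": "Object",
        "metadata": {"name": name},
        "spec": {
            # The wrapped resource goes under forProvider.manifest.
            "forProvider": {"manifest": manifest},
            "providerConfigRef": {"name": provider_config},
        },
    }


if __name__ == "__main__":
    # Hypothetical Namespace to create through Crossplane.
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": "example-namespace"},
    }
    # An entry as it might appear in the resources list of a Composition.
    resource_entry = {
        "name": "namespace",
        "base": wrap_in_object(namespace, "example-namespace-object"),
    }
    print(json.dumps(resource_entry, indent=2))
```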