Data Analytics as a Service for the Government of Canada and external collaborators.
If your question does not appear in this document, please reach out to us on our Slack Support Channel.
- Anyone with a Statistics Canada (`@statcan.gc.ca`) email address can access the AAW.
The AAW includes tools that allow data science users to open almost any file. The AAW supports many commonly used file formats, including (but not limited to):
- csv
- xlsx
- json
- xml
- sas7bdat
- sqlite
- many others... just ask :-)
Use Case | CPU | RAM (GB) | GPU | Time (Hours/Week) | Weekly Cost | Monthly Cost | Annual Cost |
---|---|---|---|---|---|---|---|
CPU: Occasional Use | 2 | 8 | 0 | 8 | 1.1367 | 4.88781 | 59.1084 |
CPU: During Business Hours | 2 | 8 | 0 | 40 | 5.6835 | 24.43905 | 295.542 |
CPU: 24/7 | 2 | 8 | 0 | 168 | 23.8707 | 102.64401 | 1241.2764 |
Use Case | CPU | RAM (GB) | GPU | Time (Hours/Week) | Weekly Cost | Monthly Cost | Annual Cost |
---|---|---|---|---|---|---|---|
GPU: Occasional Use | 0 | 0 | 1 | 8 | 34.468 | 148.2124 | 1792.336 |
GPU: During Business Hours | 0 | 0 | 1 | 40 | 172.34 | 741.062 | 8961.68 |
GPU: 24/7 | 0 | 0 | 1 | 168 | 723.828 | 3112.4604 | 37639.056 |
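The figures in the two tables above appear to follow simple arithmetic: weekly cost is an hourly rate times hours per week, monthly cost is weekly cost times 4.3, and annual cost is weekly cost times 52. The sketch below reproduces that arithmetic. The hourly rates and multipliers are inferred from the table values rather than taken from an official price list, the tables do not state a currency, and the function name is hypothetical.

```python
import math

# Hourly rates inferred from the tables (weekly cost / hours per week).
CPU_HOURLY_RATE = 1.1367 / 8   # 2 CPU, 8 GB RAM, no GPU
GPU_HOURLY_RATE = 34.468 / 8   # 1 GPU

# Multipliers implied by the Monthly and Annually columns (an inference).
WEEKS_PER_MONTH = 4.3
WEEKS_PER_YEAR = 52


def estimate_costs(hourly_rate, hours_per_week):
    """Return weekly, monthly and annual cost estimates for one server."""
    weekly = hourly_rate * hours_per_week
    return {
        "weekly": weekly,
        "monthly": weekly * WEEKS_PER_MONTH,
        "annually": weekly * WEEKS_PER_YEAR,
    }


business_hours_cpu = estimate_costs(CPU_HOURLY_RATE, 40)
always_on_gpu = estimate_costs(GPU_HOURLY_RATE, 168)

for label, costs in [("CPU, 40 h/week", business_hours_cpu),
                     ("GPU, 168 h/week", always_on_gpu)]:
    print(label, {period: round(value, 4) for period, value in costs.items()})
```

Changing `hours_per_week` gives a quick estimate for usage patterns that fall between the rows shown in the tables.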
- FDI (F.A.I.R. Data Infrastructure) must be consulted before Protected B data can be loaded into MinIO. The FDI team owns an Azure Data Factory pipeline that moves data, typically from on-premises systems, to an Azure Storage Account; MinIO is our S3 gateway to that storage account.
- At the moment, no. We are currently looking into solutions for sharing data between the AAW and CAE (which supports Power BI).
Does using SAS entail different costs than the others? Are there a limited number of licenses or instances that can be run?
- SAS support is currently experimental and will rely on existing Statistics Canada SAS software licenses.
- Press the suspend server button (square to the left of the garbage can icon to delete a server). This will suspend the workspace to save on costs.
- Please keep in mind that all data not stored on persistent disks (the persistent workspace volume or persistent data volumes) is deleted when a workspace is suspended. In particular, data on non-persistent disks and data on the filesystem outside the workspace or data volumes (for instance, the `/tmp` folder) will be permanently lost.
- When suspended, the workspace and data volumes remain locked and cannot be deleted or attached to another server.
- To resume a suspended server, press the resume button (triangle icon to the left of the garbage can). When resumed, the server will have the same workspace and data volumes as before (with all data kept as-is if the volumes were persistent) and the exact same specification (CPU, RAM, GPU, and other settings). Anything stored outside the home directory and persistent data volumes (such as conda virtual environments) will be gone and will need to be recreated if necessary.
- Do we need to stop any running instances in our notebooks when we’re not using them? If so, how do we do this? I checked the documentation and there are no guidelines on how to stop running notebooks as a cost-saving measure.
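To make the suspend behaviour described above concrete, here is a minimal sketch that checks whether a file path sits under a persistent volume and would therefore be kept when a workspace is suspended. The mount points in `PERSISTENT_ROOTS`, the example paths, and the function name are assumptions for illustration only; check where your own workspace and data volumes are actually mounted.

```python
from pathlib import PurePosixPath

# Hypothetical mount points for the workspace volume and one data volume.
PERSISTENT_ROOTS = ("/home/jovyan", "/home/jovyan/data")


def survives_suspend(path, persistent_roots=PERSISTENT_ROOTS):
    """Return True if `path` lies under one of the persistent mount points.

    The comparison is purely lexical: symlinks and `..` segments are not resolved.
    """
    candidate = PurePosixPath(path)
    for root in persistent_roots:
        try:
            candidate.relative_to(root)
            return True
        except ValueError:
            continue
    return False


# Illustrative paths only.
for example in ("/home/jovyan/results.csv", "/tmp/scratch.csv", "/opt/conda/envs/demo"):
    status = "kept" if survives_suspend(example) else "lost on suspend"
    print(f"{example}: {status}")
```

A check like this can be run before suspending a server to spot intermediate files that should first be copied to a persistent volume.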
Are there any pre-loaded data (datasets) in AAW that we can access and use for both R and Python notebooks?
- Our JupyterLab images come with some example notebooks and data, which can be found in `/aaw-contrib-jupyter-notebooks/`.
- Our RStudio image also has some example notebooks and data, which can be found in `/aaw-contrib-r-notebooks/`.
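To see what ships in those folders, a short sketch such as the following lists the notebooks under a directory. The function name and the `.ipynb` suffix filter are assumptions for illustration (for the R folder you might pass a different suffix such as `.Rmd`), and the demonstration runs against a temporary directory so it works outside the AAW as well.

```python
import tempfile
from pathlib import Path


def list_example_notebooks(root="/aaw-contrib-jupyter-notebooks/", suffixes=(".ipynb",)):
    """Return sorted paths, relative to `root`, of files with the given suffixes.

    Returns an empty list if `root` does not exist.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*")
        if path.is_file() and path.suffix in suffixes
    )


# Demonstration against a throwaway directory with made-up file names.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "sub").mkdir()
(demo_root / "intro.ipynb").write_text("{}", encoding="utf-8")
(demo_root / "sub" / "model.ipynb").write_text("{}", encoding="utf-8")
(demo_root / "notes.txt").write_text("not a notebook", encoding="utf-8")

demo_listing = list_example_notebooks(demo_root)
print(demo_listing)
```

Inside a JupyterLab server, calling `list_example_notebooks()` with no arguments would look in the folder named above.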
We highly encourage you to watch our YouTube presentation given at Stratosphere:
A discussion about some of the security best practices in use by this platform:
The following is a list of all the general repositories related to the Advanced Analytics Workspace project.
Repository | Description | Visibility |
---|---|---|
aaw-argocd-applications | ArgoCD Applications | Private |
aaw-argocd-manifests | Manifests used for ArgoCD deployments | Public |
aaw-argoflow-azure | Kubeflow deployment powered by ArgoCD | Public |
aaw-kubeflow-containers | Containers to be used within Kubeflow | Public |
aaw-contrib-containers | Containers to be used for general purpose Data Science | Public |
aaw-contrib-jupyter-notebooks | Jupyter Notebooks to be used with the Advanced Analytics Workspace platform | Public |
aaw-contrib-r-notebooks | R Notebooks to be used with the Advanced Analytics Workspace platform | Public |
aaw-gatekeeper-constraints | Gatekeeper constraints built specifically for AAW | Private |
aaw-goofys-injector | Mount an S3 bucket, Data Lake, Blob Storage as a file system in a Notebook | Public |
aaw-inferenceservices-controller | Kubernetes controller for managing inference services | Public |
aaw-kubeflow-manifests | Kustomize installation manifests for Kubeflow | Public |
aaw-kubeflow-controller | Kubeflow controller which sets PodDefaults + Vault policies for each Profile detected | Public |
aaw-kubeflow-mlops | Kubeflow MLOps pipeline using GitHub Actions | Public |
aaw-kubeflow-opa-sync | Synchronize profile editors into the Open Policy Agent for use in MinIO Access Control | Public |
aaw-kubeflow-pipelines-secret-scanner | Scan all Kubeflow pipelines for exposed secrets | Public |
aaw-kubeflow-profiles | Kubeflow profile manifests stored in YAML | Private |
aaw-kubeflow-profiles-controller | Kubeflow profiles controller which allows for custom configuration for an individual profile | Public |
aaw-minio-credential-injector | Mutating webhook which adds MinIO credential annotations to notebook pods | Public |
aaw-network-policies | Kubernetes network policies for AAW | Private |
aaw-prob-notebook-controller | Kubernetes controller for managing Authorization Policies associated to Protected-B Notebooks | Public |
aaw-security-proposal | Proposal for the implementation of Protected B workloads in AAW | Public |
aaw-toleration-injector | Kubernetes toleration injector with support for GPUs and Node Pools | Public |
The following is a list of all the Terraform-related repositories for the Advanced Analytics Workspace project.
## Installs AAW Platform and Infrastructure
##
## └─── https://github.com/statcan/terraform-advanced-analytics-workspaces-infrastructure
##      ├─── https://github.com/statcan/aaw-dev-cc-00
##      ├─── https://github.com/statcan/aaw-prod-cc-00
##      │    ├── https://github.com/statcan/terraform-azure-statcan-aaw-environment
##      │    │   ├── https://github.com/statcan/terraform-statcan-aaw-network
##      │    │   └── https://github.com/statcan/terraform-azure-statcan-cloud-native-environment-infrastructure
##      │    │       ├── https://github.com/canada-ca-terraform-modules/terraform-azurerm-kubernetes-cluster
##      │    │       └── https://github.com/canada-ca-terraform-modules/terraform-azurerm-kubernetes-cluster-nodepool
##      │    └─── https://github.com/statcan/terraform-statcan-aaw-platform (see below)
##      └─── https://github.com/statcan/terraform-azure-statcan-aaw-region-environment
Component | Repository | Description |
---|---|---|
AAW | terraform-advanced-analytics-workspaces-infrastructure | Reference implementation for an Advanced Analytics Workspaces (AAW) infrastructure pipeline |
AAW | aaw-dev-cc-00 | Reference implementation for an Advanced Analytics Workspaces (AAW) development environment |
AAW | aaw-prod-cc-00 | Reference implementation for an Advanced Analytics Workspaces (AAW) production environment |
AAW | terraform-azure-statcan-aaw-environment | Terraform module of Advanced Analytics Workspaces (AAW) per-environment Azure configuration |
AAW | terraform-azure-statcan-aaw-network | Terraform module of Advanced Analytics Workspaces (AAW) networking |
AAW | terraform-azure-statcan-cloud-native-environment-infrastructure | Terraform module for Statistics Canada's Cloud Native Environment Azure Cloud Infrastructure |
AAW | terraform-azurerm-kubernetes-cluster | Terraform module for Azure Kubernetes Service (AKS) cluster |
AAW | terraform-azurerm-kubernetes-cluster-nodepool | Terraform module for Azure Kubernetes Service (AKS) nodepool |
AAW | terraform-azure-statcan-aaw-region-environment | Terraform module of Advanced Analytics Workspaces (AAW) per-region configuration of Azure |
AAW | terraform-statcan-aaw-platform | Terraform module for the Advanced Analytics Workspaces (AAW) platform |
## Statistics Canada's Cloud Native Platform (CNP)
##
## └─── https://github.com/statcan/terraform-statcan-aaw-platform
##      ├─── https://github.com/statcan/terraform-azure-statcan-cloud-native-platform-infrastructure
##      │    ├─── aad_pod_identity
##      │    ├─── cert_manager
##      │    ├─── vault
##      │    └─── velero
##      ├─── https://github.com/statcan/terraform-statcan-kubernetes-core-platform
##      │    ├─── aad_pod_identity
##      │    ├─── cert_manager
##      │    ├─── fluentd
##      │    ├─── gatekeeper
##      │    ├─── kubecost
##      │    ├─── prometheus
##      │    ├─── vault_agent
##      │    └─── velero
##      ├─── https://github.com/statcan/terraform-statcan-kubernetes-app-platform
##      │    ├─── istio operator
##      │    └─── istio gateway handling
##      └─── https://github.com/statcan/terraform-kubernetes-namespace
##           └─── daaas-system
Component | Repository | Description |
---|---|---|
CNS | terraform-azure-statcan-cloud-native-platform-infrastructure | Terraform module for Statistics Canada Azure Cloud Native Platform Infrastructure |
CNS | terraform-statcan-kubernetes-core-platform | Terraform module for Statistics Canada Core Kubernetes Platform |
CNS | terraform-statcan-kubernetes-app-platform | Terraform module for Statistics Canada Kubernetes Application Platform |
Repository | Description | Visibility |
---|---|---|
terraform-aaw-managed-databases | Terraform module for deployment of Azure Managed Databases | Private |
terraform-aaw-vault | Terraform module for configuring HashiCorp Vault | Private |
The following is a list of some of the collaborative work we made available to improve upstream projects.
Repository | Description | Visibility |
---|---|---|
boathouse | Manage Kubernetes storage mounts with Goofys | Public |
jupyter-apis | Golang replacement for the Kubeflow Jupyter Web APIs | Public |
jupyterlab-language-pack-fr_FR | JupyterLab fr-FR Language Pack | Public |
vault-plugin-secrets-minio | Vault plugin which will provision multi-user keys for MinIO | Public |
The following is a list of some of the forked projects where we have provided multilingual support and other UX-related enhancements.
Repository | Description | Visibility |
---|---|---|
kubeflow | Multilingual support for Kubeflow | Public |
kubeflow-pipelines | Multilingual support for Kubeflow Pipelines | Public |
minio | Multilingual support for MinIO | Public |
minio-console | Multilingual support for MinIO Console | Public |
rstudio | Multilingual support for RStudio | Public |