Terraform module that provisions AWS resources to run Metaflow in production.
This module consists of submodules that can be used separately as well:
- AWS Batch cluster to run Metaflow steps (metaflow-computation)
- blob storage and metadata database (metaflow-datastore)
- a service providing API to record and query past executions (metaflow-metadata-service)
- resources to deploy Metaflow flows on Step Functions processing (metaflow-step-functions)
- Metaflow UI (metaflow-ui)
You can use either this high-level module or the submodules individually. See each module's corresponding README.md for more details.
You can find a complete example that uses this module but also includes setting up VPC and other non-Metaflow-specific parts of infra in this repo.
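As a rough illustration of using the high-level module, the sketch below sets the inputs that the Inputs table marks as required, plus a few optional ones. The module label `metaflow`, the registry source address, and every ID, CIDR, and tag value are assumptions made for this example and are not taken from this README; check the source address and replace the placeholders with values from your own environment.

```hcl
# Minimal sketch of calling the high-level module.
# The label "metaflow", the source address, and all values are placeholders.
module "metaflow" {
  source = "outerbounds/metaflow/aws" # assumed registry address; verify before use

  # Inputs listed as required in the Inputs table
  enable_step_functions = true
  subnet1_id            = "subnet-aaaa1111" # hypothetical subnet ID
  subnet2_id            = "subnet-bbbb2222" # hypothetical subnet ID
  vpc_id                = "vpc-cccc3333"    # hypothetical VPC ID
  vpc_cidr_blocks       = ["10.0.0.0/16"]   # hypothetical VPC CIDR

  tags = {
    managed_by = "terraform" # hypothetical tag
  }

  # Optional inputs, set to non-default values purely for illustration
  batch_type                              = "spot"
  compute_environment_spot_bid_percentage = 80
  access_list_cidr_blocks                 = ["203.0.113.0/24"] # e.g. a VPN range
}
```

The VPC and subnets are expected to exist already; this module takes their IDs as inputs rather than creating them.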
To format documentation:

    pipx install pre-commit
    pre-commit install --install-hooks
    pre-commit run --all-files

Name | Source | Version |
---|---|---|
metaflow-common | ./modules/common | n/a |
metaflow-computation | ./modules/computation | n/a |
metaflow-datastore | ./modules/datastore | n/a |
metaflow-metadata-service | ./modules/metadata-service | n/a |
metaflow-step-functions | ./modules/step-functions | n/a |
metaflow-ui | ./modules/ui | n/a |
Name | Description | Type | Default | Required |
---|---|---|---|---|
access_list_cidr_blocks | List of CIDRs we want to grant access to our Metaflow Metadata Service. Usually this is our VPN's CIDR blocks. | list(string) | [] | no |
api_basic_auth | Enable basic auth for API Gateway? (requires key export) | bool | true | no |
batch_type | AWS Batch Compute Type ('ec2', 'fargate', 'spot') | string | "ec2" | no |
compute_environment_ami_id | The AMI ID to use for Batch Compute Environment EC2 instances. If not specified, defaults to the latest ECS optimised AMI. | string | null | no |
compute_environment_desired_vcpus | Desired Starting VCPUs for Batch Compute Environment [0-16] for EC2 Batch Compute Environment (ignored for Fargate) | number | 8 | no |
compute_environment_egress_cidr_blocks | CIDR blocks to which egress is allowed from the Batch Compute environment's security group | list(string) | [ | no |
compute_environment_instance_types | The instance types for the compute environment | list(string) | [ | no |
compute_environment_max_vcpus | Maximum VCPUs for Batch Compute Environment [16-96] | number | 64 | no |
compute_environment_min_vcpus | Minimum VCPUs for Batch Compute Environment [0-16] for EC2 Batch Compute Environment (ignored for Fargate) | number | 8 | no |
compute_environment_spot_bid_percentage | The maximum percentage of on-demand EC2 instance price to bid for spot instances when using the 'spot' AWS Batch Compute Type. | number | 100 | no |
compute_environment_user_data_base64 | Base64-encoded user data to use for Batch Compute Environment EC2 instances. | string | null | no |
db_instance_type | RDS instance type to launch for PostgreSQL database. | string | "db.t2.small" | no |
ecs_cluster_id | The ID of an existing ECS cluster to run services on. If no cluster ID is specified, a new cluster will be created. | string | null | no |
enable_custom_batch_container_registry | Provisions infrastructure for custom Amazon ECR container registry if enabled | bool | false | no |
enable_step_functions | Provisions infrastructure for step functions if enabled | bool | n/a | yes |
extra_ui_backend_env_vars | Additional environment variables for UI backend container | map(string) | {} | no |
extra_ui_static_env_vars | Additional environment variables for UI static app | map(string) | {} | no |
iam_partition | IAM Partition (Select aws-us-gov for AWS GovCloud, otherwise leave as is) | string | "aws" | no |
metadata_service_container_image | Container image for metadata service | string | "" | no |
postgres_engine_version | Postgres engine version to use for Metaflow database. | string | "11" | no |
resource_prefix | string prefix for all resources | string | "metaflow" | no |
resource_suffix | string suffix for all resources | string | "" | no |
subnet1_id | First subnet used for availability zone redundancy | string | n/a | yes |
subnet2_id | Second subnet used for availability zone redundancy | string | n/a | yes |
tags | aws tags | map(string) | n/a | yes |
ui_alb_internal | Defines whether the ALB for the UI is internal | bool | false | no |
ui_allow_list | List of CIDRs we want to grant access to our Metaflow UI Service. Usually this is our VPN's CIDR blocks. | list(string) | [] | no |
ui_certificate_arn | SSL certificate for UI. If no certificate arn is provided, HTTP will be used. | string | null | no |
ui_static_container_image | Container image for the UI frontend app | string | "" | no |
vpc_cidr_blocks | The VPC CIDR blocks to add to the access list of our Metadata Service API so that all internal communications are allowed | list(string) | n/a | yes |
vpc_id | The ID of the single VPC we stood up for all Metaflow resources to exist in. | string | n/a | yes |
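Some inputs above accept only a few values; `batch_type`, for example, is documented as one of 'ec2', 'fargate', or 'spot'. If you pass such a value through from your own root configuration, a validation rule can reject a bad value at plan time. The following is a sketch of a variable you might declare in the calling configuration. It is an assumption about how a caller could wrap the input, not a description of how this module validates it.

```hcl
# Hypothetical wrapper variable in the calling configuration.
# It mirrors the documented batch_type values; the module itself may or may not
# perform the same check.
variable "batch_type" {
  description = "AWS Batch Compute Type ('ec2', 'fargate', 'spot')"
  type        = string
  default     = "ec2"

  validation {
    # contains() returns true when the given value is present in the list
    condition     = contains(["ec2", "fargate", "spot"], var.batch_type)
    error_message = "The batch_type value must be one of ec2, fargate, or spot."
  }
}
```

The value can then be passed to the module as `batch_type = var.batch_type`.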
Name | Description |
---|---|
METAFLOW_BATCH_JOB_QUEUE | AWS Batch Job Queue ARN for Metaflow |
METAFLOW_DATASTORE_SYSROOT_S3 | Amazon S3 URL for Metaflow DataStore |
METAFLOW_DATATOOLS_S3ROOT | Amazon S3 URL for Metaflow DataTools |
METAFLOW_ECS_S3_ACCESS_IAM_ROLE | Role for AWS Batch to Access Amazon S3 |
METAFLOW_EVENTS_SFN_ACCESS_IAM_ROLE | IAM role for Amazon EventBridge to access AWS Step Functions. |
METAFLOW_SERVICE_INTERNAL_URL | URL for Metadata Service (Accessible in VPC) |
METAFLOW_SERVICE_URL | URL for Metadata Service (Accessible in VPC) |
METAFLOW_SFN_DYNAMO_DB_TABLE | AWS DynamoDB table name for tracking AWS Step Functions execution metadata. |
METAFLOW_SFN_IAM_ROLE | IAM role for AWS Step Functions to access AWS resources (AWS Batch, AWS DynamoDB). |
api_gateway_rest_api_id_key_id | API Gateway Key ID for Metadata Service. Fetch Key from AWS Console [METAFLOW_SERVICE_AUTH_KEY] |
batch_compute_environment_security_group_id | The ID of the security group attached to the Batch Compute environment. |
datastore_s3_bucket_kms_key_arn | The ARN of the KMS key used to encrypt the Metaflow datastore S3 bucket |
metadata_svc_ecs_task_role_arn | n/a |
metaflow_api_gateway_rest_api_id | The ID of the API Gateway REST API we'll use to accept MetaData service requests to forward to the Fargate API instance |
metaflow_batch_container_image | The ECR repo containing the metaflow batch image |
metaflow_profile_json | Metaflow profile JSON object that can be used to communicate with this Metaflow Stack. Store this in ~/.metaflow/config_[stack-name] and select with $ export METAFLOW_PROFILE=[stack-name] . |
metaflow_s3_bucket_arn | The ARN of the bucket we'll be using as blob storage |
metaflow_s3_bucket_name | The name of the bucket we'll be using as blob storage |
migration_function_arn | ARN of DB Migration Function |
ui_alb_arn | UI ALB ARN |
ui_alb_dns_name | UI ALB DNS name |
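The outputs above can be consumed from the calling configuration. The sketch below assumes the module was instantiated under the hypothetical label `metaflow`, that the `hashicorp/local` provider is available, and that `metaflow_profile_json` is already a JSON-encoded string. The file name and output name are placeholders chosen for this example.

```hcl
# Assumes a module block labelled "metaflow" exists in the same configuration.
# local_file comes from the hashicorp/local provider.
resource "local_file" "metaflow_profile" {
  # Assumed to be a JSON string; if the output turns out to be an object,
  # wrap it in jsonencode(...) instead.
  content  = module.metaflow.metaflow_profile_json
  filename = "${path.module}/metaflow_profile.json" # placeholder path
}

# Surface the UI load balancer address after `terraform apply`.
output "metaflow_ui_dns_name" {
  value = module.metaflow.ui_alb_dns_name
}
```

The generated file can then be copied to the location given in the `metaflow_profile_json` description and selected with `METAFLOW_PROFILE`.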