📄 Extensive documentation is available via our GitHub Pages Docs site.
📢 We maintain this project as a truly open-source project, on a best-effort basis. We welcome contributions from the community. Feel free to help us by answering issues, reviewing PRs, or maintaining and improving the project.
📢 v5 replaces Amazon Linux 2 with Amazon Linux 2023 as the default OS. Check the PR for more details and other changes.
📢 For contributions to older versions you can make a PR to the related branch, e.g. v4. We have no release process in place for older versions.
This Terraform module creates the required infrastructure needed to host GitHub Actions self-hosted, auto-scaling runners on AWS spot instances. It provides the required logic to handle the life cycle for scaling up and down using a set of AWS Lambda functions. Runners are scaled down to zero to avoid costs when no workflows are active.
- Scaling: Scale up and down based on GitHub events
- Sustainability: Scale down to zero when no jobs are running
- Security: Runners are created on-demand and terminated after use (ephemeral runners)
- Cost optimization: Runners are created on spot instances
- Tailored software, hardware and network configuration: Bring your own AMI, define the instance types and subnets to use.
- OS support: Linux (x64/arm64) and Windows
- Multi-Runner: Create multiple runner configurations with a single deployment
- GitHub cloud and GitHub Enterprise Server (GHES) support.
- Org and repo level runners. Enterprise level runners are not supported (yet).
Check out the detailed instructions in the Getting Started section of the docs. On a high level, the following steps are required to get started:
- Set up your AWS account
- Create and configure a GitHub App
- Download or build the required lambdas
- Deploy the module using Terraform
- Install the GitHub App to your organization or repositories and add your repositories to the runner group(s).
Check out the provided Terraform examples in the examples directory for different scenarios.
Please check the configuration section of the docs for major configuration options. See the Terraform module documentation for all available options.
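As a rough illustration of the "Deploy the module using Terraform" step, a root-module call might look like the sketch below. It is not taken from the examples directory. The module label, registry address, version constraint, resource IDs, zip paths, and the three keys inside `github_app` are assumptions made for illustration, so check the Terraform module documentation for the authoritative input types.

```hcl
# Hypothetical variables holding the GitHub App credentials.
variable "github_app_id" {
  type = string
}

variable "github_app_key_base64" {
  type      = string
  sensitive = true
}

variable "github_app_webhook_secret" {
  type      = string
  sensitive = true
}

module "github_runner" {
  # Assumption: registry address and version are placeholders; pin what you actually use.
  source  = "philips-labs/github-runner/aws"
  version = "~> 5.0"

  # Required inputs according to the inputs table (placeholder values).
  aws_region = "eu-west-1"
  vpc_id     = "vpc-0123456789abcdef0"
  subnet_ids = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]

  # Assumption: key names inside github_app are illustrative.
  # The key must be the base64-encoded .pem file, not its raw content.
  github_app = {
    id             = var.github_app_id
    key_base64     = var.github_app_key_base64
    webhook_secret = var.github_app_webhook_secret
  }

  # Locally downloaded or built lambda zips (placeholder paths).
  webhook_lambda_zip                = "lambdas-download/webhook.zip"
  runner_binaries_syncer_lambda_zip = "lambdas-download/runner-binaries-syncer.zip"
  runners_lambda_zip                = "lambdas-download/runners.zip"
}
```

All other inputs keep the defaults listed in the inputs table; the lambda zips can alternatively be referenced through `lambda_s3_bucket` and the matching `*_s3_key` inputs.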
This project is licensed under the MIT License - see the LICENSE file for details.
We welcome contributions; please check out the contribution guide. Be aware that we use pre-commit hooks to update the docs.
This module is part of the Philips Forest.
___ _
/ __\__ _ __ ___ ___| |_
/ _\/ _ \| '__/ _ \/ __| __|
/ / | (_) | | | __/\__ \ |_
\/ \___/|_| \___||___/\__|
Infrastructure
Talk to the forestkeepers in the runners-channel on Slack.
Terraform root module documentation
Name | Version |
---|---|
terraform | >= 1.3.0 |
aws | ~> 5.27 |
random | ~> 3.0 |
Name | Version |
---|---|
aws | 5.31.0 |
random | 3.6.0 |
Name | Source | Version |
---|---|---|
ami_housekeeper | ./modules/ami-housekeeper | n/a |
instance_termination_watcher | ./modules/termination-watcher | n/a |
runner_binaries | ./modules/runner-binaries-syncer | n/a |
runners | ./modules/runners | n/a |
ssm | ./modules/ssm | n/a |
webhook | ./modules/webhook | n/a |
Name | Type |
---|---|
aws_sqs_queue.queued_builds | resource |
aws_sqs_queue.queued_builds_dlq | resource |
aws_sqs_queue_policy.build_queue_dlq_policy | resource |
aws_sqs_queue_policy.build_queue_policy | resource |
random_string.random | resource |
aws_iam_policy_document.deny_unsecure_transport | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
ami_filter | Map of lists used to create the AMI filter for the action runner AMI. | map(list(string)) |
{ |
no |
ami_housekeeper_cleanup_config | Configuration for AMI cleanup. amiFilters - Filters to use when searching for AMIs to clean up. Default filter for images owned by the account and that are available. dryRun - If true, no AMIs will be deregistered. Default false. launchTemplateNames - Launch template names to use when searching for AMIs to clean up. Default no launch templates. maxItems - The maximum number of AMIs that will be queried for cleanup. Default no maximum. minimumDaysOld - Minimum number of days old an AMI must be to be considered for cleanup. Default 30. ssmParameterNames - SSM parameter names to use when searching for AMIs to clean up. This parameter should be set when using SSM to configure the AMI to use. Default no SSM parameters. |
object({ |
{} |
no |
ami_housekeeper_lambda_s3_key | S3 key for the AMI housekeeper lambda function. Required if using an S3 bucket to specify lambdas. | string |
null |
no |
ami_housekeeper_lambda_s3_object_version | S3 object version for the AMI housekeeper lambda function. Useful if S3 versioning is enabled on the source bucket. | string |
null |
no |
ami_housekeeper_lambda_schedule_expression | Schedule expression for the AMI housekeeper lambda. | string |
"rate(1 day)" |
no |
ami_housekeeper_lambda_timeout | Time out of the lambda in seconds. | number |
300 |
no |
ami_housekeeper_lambda_zip | File location of the lambda zip file. | string |
null |
no |
ami_id_ssm_parameter_name | Externally managed SSM parameter (of data type aws:ec2:image) that contains the AMI ID to launch runner instances from. Overrides ami_filter | string |
null |
no |
ami_kms_key_arn | Optional CMK Key ARN to be used to launch an instance from a shared encrypted AMI | string |
null |
no |
ami_owners | The list of owners used to select the AMI of action runner instances. | list(string) |
[ |
no |
associate_public_ipv4_address | Associate public IPv4 with the runner. Only tested with IPv4 | bool |
false |
no |
aws_partition | (optional) Partition in the ARN namespace to use if not 'aws' | string |
"aws" |
no |
aws_region | AWS region. | string |
n/a | yes |
block_device_mappings | The EC2 instance block device configuration. Takes the following keys: device_name, delete_on_termination, volume_type, volume_size, encrypted, iops, throughput, kms_key_id, snapshot_id. |
list(object({ |
[ |
no |
cloudwatch_config | (optional) Replaces the module's default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. | string |
null |
no |
create_service_linked_role_spot | (optional) create the service linked role for spot instances that is required by the scale-up lambda. | bool |
false |
no |
delay_webhook_event | The number of seconds the event accepted by the webhook is invisible on the queue before the scale up lambda will receive the event. | number |
30 |
no |
disable_runner_autoupdate | Disable the auto update of the GitHub runner agent. Be aware there is a grace period of 30 days, see also the GitHub article | bool |
false |
no |
enable_ami_housekeeper | Option to enable the lambda that cleans up old AMIs. | bool |
false |
no |
enable_cloudwatch_agent | Enables the cloudwatch agent on the ec2 runner instances. The runner uses a default config that can be overridden via cloudwatch_config. |
bool |
true |
no |
enable_ephemeral_runners | Enable ephemeral runners, runners will only be used once. | bool |
false |
no |
enable_event_rule_binaries_syncer | DEPRECATED: Replaced by state_event_rule_binaries_syncer. |
bool |
null |
no |
enable_fifo_build_queue | Enable a FIFO queue to keep the order of events received by the webhook. Recommended for repo level runners. | bool |
false |
no |
enable_jit_config | Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES, check first if the JIT config API is available. If you are upgrading from 3.x to 4.x, you can set enable_jit_config to false to avoid a breaking change when using your own AMI. |
bool |
null |
no |
enable_job_queued_check | Only scale if the job event received by the scale up lambda is in the queued state. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior. | bool |
null |
no |
enable_managed_runner_security_group | Enables creation of the default managed security group. Unmanaged security groups can be specified via runner_additional_security_group_ids. |
bool |
true |
no |
enable_metrics_control_plane | (Experimental) Enable or disable the metrics for the module. The feature can change or be renamed without a major release. | bool |
null |
no |
enable_organization_runners | Register runners at the organization level instead of the repo level | bool |
false |
no |
enable_runner_binaries_syncer | Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-built AMI. | bool |
true |
no |
enable_runner_detailed_monitoring | Should detailed monitoring be enabled for the runner. Set this to true if you want to use detailed monitoring. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html for details. | bool |
false |
no |
enable_runner_on_demand_failover_for_errors | Enable on-demand failover. For example, to fall back to on-demand when no spot capacity is available, the variable can be set to InsufficientInstanceCapacity. When not defined, the default behavior is to retry later. |
list(string) |
[] |
no |
enable_runner_workflow_job_labels_check_all | If set to true, all labels in the workflow job must match the GitHub labels (os, architecture and self-hosted). When false, any matching label will trigger the webhook. |
bool |
true |
no |
enable_ssm_on_runners | Enable to allow access to the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances. | bool |
false |
no |
enable_user_data_debug_logging_runner | Option to enable debug logging for user-data, this logs all secrets as well. | bool |
false |
no |
enable_userdata | Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI. | bool |
true |
no |
eventbridge | Enable the use of EventBridge by the module. By enabling this feature, events will be put on the EventBridge by the webhook instead of being dispatched directly to queues for scaling. enable: Enable the EventBridge feature. accept_events: List that can be used to only allow specific events to be put on the EventBridge. By default all events are accepted; an empty list will be interpreted as all events. |
object({ |
{} |
no |
ghes_ssl_verify | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | bool |
true |
no |
ghes_url | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | string |
null |
no |
github_app | GitHub app parameters, see your GitHub app. Ensure the key is the base64-encoded .pem file (the output of base64 app.private-key.pem, not the content of private-key.pem). |
object({ |
n/a | yes |
idle_config | List of time periods, defined as a cron expression, to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | list(object({ |
[] |
no |
instance_allocation_strategy | The allocation strategy for spot instances. AWS recommends using price-capacity-optimized; however, the AWS default is lowest-price. |
string |
"lowest-price" |
no |
instance_max_spot_price | Max price for spot instances per hour. This variable will be passed to the create fleet request as the max spot price for the fleet. | string |
null |
no |
instance_profile_path | The path that will be added to the instance_profile, if not set the environment name will be used. | string |
null |
no |
instance_target_capacity_type | Default lifecycle used for runner instances, can be either spot or on-demand. |
string |
"spot" |
no |
instance_termination_watcher | Configuration for the instance termination watcher. This feature is Beta; changes will not trigger a major release as long as it is in beta. enable: Enable or disable the spot termination watcher. features: Enable or disable features of the termination watcher. memory_size: Memory size limit in MB of the lambda. s3_key: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas. s3_object_version: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket. timeout: Time out of the lambda in seconds. zip: File location of the lambda zip file. |
object({ |
{} |
no |
instance_types | List of instance types for the action runner. Defaults are based on runner_os (al2023 for linux and Windows Server Core for win). | list(string) |
[ |
no |
job_queue_retention_in_seconds | The number of seconds the job is held in the queue before it is purged. | number |
86400 |
no |
job_retry | Experimental! Can be removed / changed without triggering a major release. Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the instances, a message will be published to a job retry queue. The job retry check lambda checks after a delay whether the job is still queued. If the job is still queued, the message will be published again on the scale-up (build) queue. Using this feature can impact the rate limit of the GitHub app. enable: Enable or disable the job retry feature. delay_in_seconds: The delay in seconds before the job retry check lambda will check the job status. delay_backoff: The backoff factor for the delay. lambda_memory_size: Memory size limit in MB for the job retry check lambda. lambda_timeout: Time out of the job retry check lambda in seconds. max_attempts: The maximum number of attempts to retry the job. |
object({ |
{} |
no |
key_name | Key pair name | string |
null |
no |
kms_key_arn | Optional CMK Key ARN to be used for Parameter Store. This key must be in the current account. | string |
null |
no |
lambda_architecture | AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86_64' functions. | string |
"arm64" |
no |
lambda_principals | (Optional) add extra principals to the role created for execution of the lambda, e.g. for local testing. | list(object({ |
[] |
no |
lambda_runtime | AWS Lambda runtime. | string |
"nodejs20.x" |
no |
lambda_s3_bucket | S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly. | string |
null |
no |
lambda_security_group_ids | List of security group IDs associated with the Lambda function. | list(string) |
[] |
no |
lambda_subnet_ids | List of subnets in which the Lambda functions will be launched. The subnets need to be subnets in the vpc_id. |
list(string) |
[] |
no |
lambda_tags | Map of tags that will be added to all the lambda function resources. Note these are additional tags to the default tags. | map(string) |
{} |
no |
lambda_tracing_mode | DEPRECATED: Replaced by tracing_config. |
string |
null |
no |
log_level | Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'. | string |
"info" |
no |
logging_kms_key_id | Specifies the kms key id to encrypt the logs with. | string |
null |
no |
logging_retention_in_days | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | number |
180 |
no |
matcher_config_parameter_store_tier | The tier of the parameter store for the matcher configuration. Valid values are Standard and Advanced. |
string |
"Standard" |
no |
metrics | Configuration for metrics created by the module, by default disabled to avoid additional costs. When metrics are enabled, all metrics are created unless explicitly configured otherwise. | object({ |
{} |
no |
metrics_namespace | The namespace for the metrics created by the module. Metrics will only be created if explicitly enabled. | string |
null |
no |
minimum_running_time_in_minutes | The minimum time an EC2 action runner should be running before being terminated, if not busy. | number |
null |
no |
pool_config | The configuration for updating the pool. The pool_size to adjust to by the events triggered by the schedule_expression. For example, you can configure a cron expression for weekdays to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use schedule_expression_timezone to override the schedule time zone (defaults to UTC). |
list(object({ |
[] |
no |
pool_lambda_memory_size | Memory size limit in MB for the pool lambda. | number |
512 |
no |
pool_lambda_reserved_concurrent_executions | Amount of reserved concurrent executions for the pool lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. | number |
1 |
no |
pool_lambda_timeout | Time out for the pool lambda in seconds. | number |
60 |
no |
pool_runner_owner | The pool will deploy runners to the GitHub org ID, set this value to the org to which you want the runners deployed. Repo level is not supported. | string |
null |
no |
prefix | The prefix used for naming resources | string |
"github-actions" |
no |
queue_encryption | Configure how data on queues managed by the module is encrypted at rest. Options are encrypted via SSE, not encrypted, and encrypted via KMS. By default, encryption via SSE is enabled. For more details, see the Terraform aws_sqs_queue resource https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue. |
object({ |
{ |
no |
redrive_build_queue | Set options to attach (optional) a dead letter queue to the build queue, the queue between the webhook and the scale up lambda. You have the following options. 1. Disable by setting enabled to false. 2. Enable by setting enabled to true and maxReceiveCount to the maximum number of retries. |
object({ |
{ |
no |
repository_white_list | List of GitHub repository full names (owner/repo_name) that will be allowed to use the GitHub app. Leave empty for no filtering. | list(string) |
[] |
no |
role_path | The path that will be added to role path for created roles, if not set the environment name will be used. | string |
null |
no |
role_permissions_boundary | Permissions boundary that will be added to the created roles. | string |
null |
no |
runner_additional_security_group_ids | (optional) List of additional security groups IDs to apply to the runner. | list(string) |
[] |
no |
runner_architecture | The platform architecture of the runner instance_type. | string |
"x64" |
no |
runner_as_root | Run the action runner under the root user. Variable runner_run_as will be ignored. |
bool |
false |
no |
runner_binaries_s3_logging_bucket | Bucket for action runner distribution bucket access logging. | string |
null |
no |
runner_binaries_s3_logging_bucket_prefix | Bucket prefix for action runner distribution bucket access logging. | string |
null |
no |
runner_binaries_s3_sse_configuration | Map containing server-side encryption configuration for runner-binaries S3 bucket. | any |
{ |
no |
runner_binaries_s3_versioning | Status of S3 versioning for runner-binaries S3 bucket. Once set to Enabled the change cannot be reverted via Terraform! | string |
"Disabled" |
no |
runner_binaries_syncer_lambda_memory_size | Memory size limit in MB for binary syncer lambda. | number |
256 |
no |
runner_binaries_syncer_lambda_timeout | Time out of the binaries sync lambda in seconds. | number |
300 |
no |
runner_binaries_syncer_lambda_zip | File location of the binaries sync lambda zip file. | string |
null |
no |
runner_boot_time_in_minutes | The minimum time for an EC2 runner to boot and register as a runner. | number |
5 |
no |
runner_credit_specification | The credit option for CPU usage of a T instance. Can be unset, "standard" or "unlimited". | string |
null |
no |
runner_ec2_tags | Map of tags that will be added to the launch template instance tag specifications. | map(string) |
{} |
no |
runner_egress_rules | List of egress rules for the GitHub runner instances. | list(object({ |
[ |
no |
runner_extra_labels | Extra (custom) labels for the runners (GitHub). Label checks on the webhook can be enforced by setting enable_runner_workflow_job_labels_check_all. GitHub read-only labels should not be provided. |
list(string) |
[] |
no |
runner_group_name | Name of the runner group. | string |
"Default" |
no |
runner_iam_role_managed_policy_arns | Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role | list(string) |
[] |
no |
runner_log_files | (optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. | list(object({ |
null |
no |
runner_metadata_options | Metadata options for the ec2 runner instances. By default, the module uses metadata tags for bootstrapping the runner, only disable instance_metadata_tags when using custom scripts for starting the runner. |
map(any) |
{ |
no |
runner_name_prefix | The prefix used for the GitHub runner name. The prefix will be used in the default start script to prefix the instance name when registering the runner in GitHub. The value is available via an EC2 tag 'ghr:runner_name_prefix'. | string |
"" |
no |
runner_os | The EC2 Operating System type to use for action runner instances (linux,windows). | string |
"linux" |
no |
runner_run_as | Run the GitHub actions agent as user. | string |
"ec2-user" |
no |
runners_ebs_optimized | Enable EBS optimization for the runner instances. | bool |
false |
no |
runners_lambda_s3_key | S3 key for runners lambda function. Required if using S3 bucket to specify lambdas. | string |
null |
no |
runners_lambda_s3_object_version | S3 object version for runners lambda function. Useful if S3 versioning is enabled on source bucket. | string |
null |
no |
runners_lambda_zip | File location of the lambda zip file for scaling runners. | string |
null |
no |
runners_maximum_count | The maximum number of runners that will be created. | number |
3 |
no |
runners_scale_down_lambda_memory_size | Memory size limit in MB for scale-down lambda. | number |
512 |
no |
runners_scale_down_lambda_timeout | Time out for the scale down lambda in seconds. | number |
60 |
no |
runners_scale_up_Lambda_memory_size | Memory size limit in MB for scale-up lambda. | number |
null |
no |
runners_scale_up_lambda_memory_size | Memory size limit in MB for scale-up lambda. | number |
512 |
no |
runners_scale_up_lambda_timeout | Time out for the scale up lambda in seconds. | number |
30 |
no |
runners_ssm_housekeeper | Configuration for the SSM housekeeper lambda. This lambda deletes token / JIT config from SSM. schedule_expression: is used to configure the schedule for the lambda. enabled: enable or disable the lambda trigger via the EventBridge. lambda_memory_size: lambda memory size limit. lambda_timeout: timeout for the lambda in seconds. config: configuration for the lambda function. Token path will be read by default from the module. |
object({ |
{ |
no |
scale_down_schedule_expression | Scheduler expression to check every x for scale down. | string |
"cron(*/5 * * * ? *)" |
no |
scale_up_reserved_concurrent_executions | Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. | number |
1 |
no |
ssm_paths | The root path used in SSM to store configuration and secrets. | object({ |
{} |
no |
state_event_rule_binaries_syncer | Option to disable EventBridge Lambda trigger for the binary syncer, useful to stop automatic updates of binary distribution | string |
"ENABLED" |
no |
subnet_ids | List of subnets in which the action runner instances will be launched. The subnets need to exist in the configured VPC (vpc_id), and must reside in different availability zones (see philips-labs#2904) |
list(string) |
n/a | yes |
syncer_lambda_s3_key | S3 key for syncer lambda function. Required if using an S3 bucket to specify lambdas. | string |
null |
no |
syncer_lambda_s3_object_version | S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket. | string |
null |
no |
tags | Map of tags that will be added to created resources. By default resources will be tagged with name and environment. | map(string) |
{} |
no |
tracing_config | Configuration for lambda tracing. | object({ |
{} |
no |
userdata_content | Alternative user-data content, replacing the templated one. By providing your own user_data you have to take care of installing all required software, including the action runner and registering the runner. Be aware that configuration parameters in SSM as well as tags are treated as internals. Changes will not trigger a breaking release. | string |
null |
no |
userdata_post_install | Script to be run after the GitHub Actions runner is installed on the EC2 instances | string |
"" |
no |
userdata_pre_install | Script to be run before the GitHub Actions runner is installed on the EC2 instances | string |
"" |
no |
userdata_template | Alternative user-data template file path, replacing the default template. By providing your own user_data you have to take care of installing all required software, including the action runner. Variables userdata_pre/post_install are ignored. | string |
null |
no |
vpc_id | The VPC for security groups of the action runners. | string |
n/a | yes |
webhook_lambda_apigateway_access_log_settings | Access log settings for webhook API gateway. | object({ |
null |
no |
webhook_lambda_memory_size | Memory size limit in MB for webhook lambda. | number |
256 |
no |
webhook_lambda_s3_key | S3 key for webhook lambda function. Required if using S3 bucket to specify lambdas. | string |
null |
no |
webhook_lambda_s3_object_version | S3 object version for webhook lambda function. Useful if S3 versioning is enabled on source bucket. | string |
null |
no |
webhook_lambda_timeout | Time out of the webhook lambda in seconds. | number |
10 |
no |
webhook_lambda_zip | File location of the webhook lambda zip file. | string |
null |
no |
Name | Description |
---|---|
binaries_syncer | n/a |
instance_termination_handler | n/a |
instance_termination_watcher | n/a |
queues | SQS queues. |
runners | n/a |
ssm_parameters | n/a |
webhook | n/a |
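The outputs above can be re-exported from the calling configuration, for example to look up the webhook details when configuring the GitHub App. This is a minimal sketch that assumes the module was instantiated under the hypothetical label `github_runner`; the structure of each output object is not described in the table, so the whole objects are passed through unchanged.

```hcl
# Re-export the webhook output of the runner module.
output "webhook" {
  description = "Webhook details exposed by the runner module."
  value       = module.github_runner.webhook
}

# Re-export the SQS queues output, e.g. for monitoring or alarms.
output "queues" {
  description = "SQS queues created by the runner module."
  value       = module.github_runner.queues
}
```

After `terraform apply`, running `terraform output webhook` prints the exported value.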