Add S3 versioning to managed schemas #291

Open
wants to merge 10 commits into
base: master
Changes from 8 commits
4 changes: 4 additions & 0 deletions CHANGELOG.md
Expand Up @@ -3,6 +3,10 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [7.8.0] - 2024-12-12
### Added
- Hive databases backed by S3 can now have versioning enabled.

## [7.7.0] - 2024-11-19
### Changed
- Updated the hms namespaces for metrics for both readwrite and readonly.
8 changes: 7 additions & 1 deletion README.md
Expand Up @@ -57,7 +57,13 @@ module "apiary" {
admin_roles = "role1_arn,role2_arn" //kms key management will be restricted to these roles.
client_roles = "role3_arn,role4_arn" //s3 bucket read/write and kms key usage will be restricted to these roles.
customer_accounts = "account_id1,account_id2" //this will override module level apiary_customer_accounts
}
},
{
schema_name = "db_s3_versioning_enabled",
s3_versioning_enabled = "Enabled", // Enabled/Disabled/Suspended. Once enabled, versioning can only be suspended, not disabled.
s3_versioning_expiration_days = 2, // Only applies when versioning is Enabled; defaults to 7.
s3_versioning_max_versions_retained = 1 // Key name must match the lookup in s3.tf.
},
]
apiary_customer_accounts = ["aws_account_no_1", "aws_account_no_2"]
# single policy with multiple conditions will use AND operator
5 changes: 3 additions & 2 deletions VARIABLES.md
Expand Up @@ -141,7 +141,8 @@
| hms\_ecs\_metrics\_readonly\_namespace | ECS readonly metrics namespace | `string` | `hmsreadonlylegacy` | no |
| hms\_ecs\_metrics\_readwrite\_namespace | ECS readwrite metrics namespace | `string` | `hmsreadwritelegacy` | no |
| hms\_k8s\_metrics\_readonly\_namespace | K8s readonly metrics namespace | `string` | `hms_readonly` | no |
| hms\_k8s\_metrics\_readwrite\_namespace | K8s readwrite metrics namespace | `string` | `hms_readwrite` | no |
| s3\_versioning\_expiration\_days | Number of days (TTL) before noncurrent object versions expire. The bucket must have versioning enabled. | `number` | `7` | no |
| s3\_versioning\_max\_versions\_retained | Number of noncurrent versions Amazon S3 will retain. Must be a non-zero positive integer. The bucket must have versioning enabled. | `number` | `3` | no |

### apiary_assume_roles

Expand Down Expand Up @@ -367,4 +368,4 @@ apiary_managed_schemas = [
producer_roles = "arn:aws:iam::000000000:role/role-1,arn:aws:iam::000000000:role/role-2"
}
]
```
```
36 changes: 36 additions & 0 deletions s3.tf
Expand Up @@ -74,6 +74,42 @@ resource "aws_s3_bucket" "apiary_data_bucket" {
}
}

resource "aws_s3_bucket_versioning" "apiary_data_bucket_versioning" {
  for_each = {
    for schema in local.schemas_info : "${schema["schema_name"]}" => schema
  }
  bucket = each.value["data_bucket"]
  versioning_configuration {
    status = lookup(each.value, "s3_versioning_enabled", "Disabled")
  }
}
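
For context, the AWS provider documents `Disabled` as a status intended for buckets that have never been versioned, which matches the README note that an enabled bucket can only be suspended. A minimal sketch of one way to avoid writing `Disabled` to every bucket, assuming schemas without the `s3_versioning_enabled` key should simply be left unmanaged (the `if` filter is an illustration, not part of this PR):

```hcl
# Hypothetical variant of the resource above: versioning is managed only for
# schemas that explicitly set "s3_versioning_enabled".
resource "aws_s3_bucket_versioning" "apiary_data_bucket_versioning" {
  for_each = {
    for schema in local.schemas_info : schema["schema_name"] => schema
    if lookup(schema, "s3_versioning_enabled", "") != ""
  }

  bucket = each.value["data_bucket"]

  versioning_configuration {
    # Expected values: "Enabled", "Suspended" or "Disabled".
    status = each.value["s3_versioning_enabled"]
  }
}
```

The trade-off is that removing the key from a schema later would destroy the Terraform resource rather than change the bucket's status, so suspending would still need an explicit `"Suspended"` value.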

resource "aws_s3_bucket_lifecycle_configuration" "apiary_data_bucket_versioning_lifecycle" {
  for_each = {
    for schema in local.schemas_info : "${schema["schema_name"]}" => schema
  }
  bucket = each.value["data_bucket"]
  # Rule enabled when expiration max days is set
  rule {
    id     = "expire-noncurrent-versions-days"
    status = lookup(each.value, "s3_versioning_expiration_days", "") != "" && lookup(each.value, "s3_versioning_max_versions_retained", "") == "" ? "Enabled" : "Disabled"
Contributor:

This doesn't take the variable's default value into account at all, does it? The default in the lookup here is `""`, so the rule is enabled or disabled purely based on what is in the schema map, and the default in the var is ignored. But the default value IS pulled in for the actual days or versions setting below. I'm not sure I understand: if we override it in the map, the default is ignored in both places. If we don't override it in the map, the rule is disabled. Is there no case in which the default value is used?

I guess it could be used in the second rule (max versions and days), but only for `expiration_days`, and only when `expiration_days` is not set in the map while max versions is.

I wonder if it is simpler to leave out the max versions rule completely. I'm not sure we have a use case for it anyway. It's an option in the API, but I don't think we want to use it.
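
A minimal sketch of the days-only variant suggested in this comment. It assumes the rule should exist only for schemas whose `s3_versioning_enabled` is `"Enabled"`, so the module-level default for days is actually used whenever a schema does not override it. The resource and local names are copied from the diff; the `if` filter is an assumption, not the PR's final code:

```hcl
# Hypothetical days-only lifecycle configuration.
resource "aws_s3_bucket_lifecycle_configuration" "apiary_data_bucket_versioning_lifecycle" {
  # Only schemas that enable versioning get a lifecycle configuration.
  for_each = {
    for schema in local.schemas_info : schema["schema_name"] => schema
    if lookup(schema, "s3_versioning_enabled", "") == "Enabled"
  }

  bucket = each.value["data_bucket"]

  rule {
    id     = "expire-noncurrent-versions-days"
    status = "Enabled"

    noncurrent_version_expiration {
      # The per-schema value wins; otherwise var.s3_versioning_expiration_days applies.
      noncurrent_days = tonumber(lookup(each.value, "s3_versioning_expiration_days", var.s3_versioning_expiration_days))
    }
  }
}
```

With this shape there is a single place where the default is read, which removes the mismatch between the `status` expression and the `noncurrent_days` expression described above.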

Contributor Author:

Okay, in that case let's support only days.


    noncurrent_version_expiration {
      noncurrent_days = tonumber(lookup(each.value, "s3_versioning_expiration_days", var.s3_versioning_expiration_days))
Collaborator:

Question: should we configure only the version expiration, or should we also configure the maximum number of versions to retain (1, 2, 3, and so on)? If only the expiration is configured, all older versions will be deleted once the number of days is reached.

Contributor Author:

Replied privately; the conclusion was to start with days only.

    }
  }
  # Rule enabled when expiration max days and versions are set
  rule {
    id     = "expire-noncurrent-versions-number-and-days"
    status = lookup(each.value, "s3_versioning_max_versions_retained", "") != "" ? "Enabled" : "Disabled"

    noncurrent_version_expiration {
      newer_noncurrent_versions = tonumber(lookup(each.value, "s3_versioning_max_versions_retained", var.s3_versioning_max_versions_retained))
      noncurrent_days           = tonumber(lookup(each.value, "s3_versioning_expiration_days", var.s3_versioning_expiration_days))
    }
  }
}

resource "aws_s3_bucket_inventory" "apiary_bucket" {
for_each = var.s3_enable_inventory == true ? {
for schema in local.schemas_info : "${schema["schema_name"]}" => schema
12 changes: 12 additions & 0 deletions variables.tf
Expand Up @@ -1095,6 +1095,18 @@ variable "ecs_requires_compatibilities" {
default = ["EC2", "FARGATE"]
}

variable "s3_versioning_expiration_days" {
  description = "Number of days (TTL) before noncurrent object versions expire. The bucket must have versioning enabled."
  type        = number
  default     = 7
}

variable "s3_versioning_max_versions_retained" {
  description = "Number of noncurrent versions Amazon S3 will retain. Must be a non-zero positive integer. The bucket must have versioning enabled."
  type        = number
  default     = 3
Contributor:

This should default to off. We don't want to force keeping a number of versions; we want all noncurrent versions to be deleted after 7 days by default. If both are set, S3 keeps up to the max versions regardless of the days, which means we would end up keeping some of the deleted data forever (at least, this is what I believe I saw when I tested: it always retained up to the max versions).
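
One way to express "off by default" is a `null` default, sketched below. Modelling "off" as `null` is an assumption of this example, and the description text is illustrative rather than taken from the PR:

```hcl
# Hypothetical: a null default means no fixed number of versions is retained.
variable "s3_versioning_max_versions_retained" {
  description = "Number of noncurrent versions Amazon S3 will retain. Leave unset to expire every noncurrent version after s3_versioning_expiration_days."
  type        = number
  default     = null
}
```

Terraform treats an argument set to `null` as omitted, so passing this variable to `newer_noncurrent_versions` would leave the argument unset unless a caller supplies a value.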

Contributor Author:

I added both options: days only, or days and max_versions.

My idea was that if a bucket creates many versions a day, we could reduce costs by setting max to 1. I'm not sure that will ever happen, so I can delete that part and add it back when it's needed.

}

variable "hms_ro_tolerations" {
description = <<EOF
Adds a list of tolerations for the HMS readonly pods. For example if you