
Bug on ingress/egress dry_run/enforced resources #20519

Open
maleksah opened this issue Nov 29, 2024 · 7 comments

@maleksah

maleksah commented Nov 29, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to a user, that user is claiming responsibility for the issue.
  • Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version & Provider Version(s)

Terraform v1.8.5
on darwin_arm64

  • provider registry.terraform.io/hashicorp/google v5.41.0
  • provider registry.terraform.io/hashicorp/google-beta v5.41.0

Affected Resource(s)

google_access_context_manager_ingress_policy
google_access_context_manager_egress_policy
google_access_context_manager_service_perimeter_dry_run_ingress_policy
google_access_context_manager_service_perimeter_dry_run_egress_policy

Terraform Configuration

This is the VPC SC perimeter config.
Here, I ignore all ingress/egress rules in dry_run and enforced modes.
Ingress/egress rules are managed in another Terraform stage.

resource "google_access_context_manager_service_perimeter" "this" {
  count                     = var.create_perimeter ? 1 : 0
  description               = var.perimeter_description
  parent                    = "accessPolicies/${var.access_policy_id}"
  name                      = local.perimeter
  title                     = var.perimeter_name
  perimeter_type            = "PERIMETER_TYPE_REGULAR"
  use_explicit_dry_run_spec = true

  dynamic "spec" {
    for_each = !var.enforced ? [1] : []
    content {
      restricted_services = module.dictionary.default_list_restricted_services
      resources           = []
      access_levels       = []
      vpc_accessible_services {
        enable_restriction = true
        allowed_services = concat([
          "RESTRICTED-SERVICES",
          ],
          var.allowed_services,
        )
      }
    }
  }

  # When we are in dry-run mode we don't apply restrictions to the enforced config.
  # The goal is to have the enforced config ready, so we can switch from dry-run to enforced without any downtime.
  dynamic "status" {
    for_each = !var.enforced ? [1] : []
    content {
      restricted_services = []
      resources           = []
      access_levels       = []
      # This is the default value.
      # If we put it explicitly, we have this change for each plan...
      #      vpc_accessible_services {
      #        enable_restriction = false
      #      }
    }
  }

  dynamic "status" {
    for_each = var.enforced ? [1] : []
    content {
      restricted_services = module.dictionary.default_list_restricted_services
      resources           = []
      access_levels       = []
      vpc_accessible_services {
        enable_restriction = true
        allowed_services = concat([
          "RESTRICTED-SERVICES",
          ],
          var.allowed_services,
        )
      }
    }
  }
  lifecycle {
    ignore_changes = [
      status[0].resources,
      status[0].ingress_policies,
      status[0].egress_policies,
      spec[0].resources,
      spec[0].ingress_policies,
      spec[0].egress_policies,
    ]
  }
}

Here is an example of the dry_run ingress policy

resource "google_access_context_manager_service_perimeter_dry_run_ingress_policy" "this" {
  for_each  = !var.enforced ? { for v in var.vpc_sc_ingress : join("-", v.ingress_from.identities) => v } : {}
  perimeter = local.perimeter
  ingress_from {
    identities = each.value.ingress_from.identities
    # sources {} # if left empty all sources
    dynamic "sources" {
      for_each = each.value.ingress_from.sources
      content {
        resource     = sources.value.resource
        access_level = sources.value.access_level
      }
    }
  }

  ingress_to {
    resources = each.value.ingress_to.resources
    dynamic "operations" {
      for_each = each.value.ingress_to.operations
      content {
        service_name = operations.value.service_name
        dynamic "method_selectors" {
          for_each = operations.value.service_name != "*" ? operations.value.methods : []
          content {
            method = method_selectors.value
          }
        }
      }
    }
  }

  lifecycle {
    create_before_destroy = true
  }
  depends_on = [
    google_access_context_manager_service_perimeter.this,
    google_access_context_manager_service_perimeter_resource.add_project_to_app_perimeter_vpc_sc,
    google_access_context_manager_service_perimeter_dry_run_resource.add_project_to_app_perimeter_vpc_sc,
  ]
}

Debug Output

No response

Expected Behavior

No response

Actual Behavior

When we update something on the VPC SC perimeter (for example, I add an API to the restricted services) and apply the Terraform, a subsequent plan on the ingress or egress policies shows Terraform trying to create them again, even though they already exist on the perimeter (they were created with the same Terraform).
If we do a terraform plan on the ingress/egress resources without changing the VPC SC perimeter, it works as expected.

The problem occurs only when we update the google_access_context_manager_service_perimeter resource (e.g. add or remove an API in the restricted services). When we then do a plan on the ingress/egress policies, Terraform tries to create the policies again, even though they were already created with the same Terraform.

Steps to reproduce

  1. Remove one API from the restricted services and do a terraform apply on google_access_context_manager_service_perimeter.
  2. Do a terraform plan on the other stage containing the ingress/egress policies.
     Terraform wants to create the ingress/egress policies even though they already exist on the perimeter (they were created before with the same Terraform).

Important Factoids

No response

References

No response

@maleksah maleksah added the bug label Nov 29, 2024
@github-actions github-actions bot added the forward/review and service/accesscontextmanager labels Nov 29, 2024
@maleksah maleksah changed the title from "Bu tg" to "Bug on ingress/egress dry_run/enforced resources" Nov 29, 2024
@ggtisc ggtisc self-assigned this Nov 29, 2024
@ggtisc
Collaborator

ggtisc commented Nov 29, 2024

Hi @maleksah!

I'm trying to replicate this issue, but since I don't have access to your locals and variables I can't get the same result as you. This is my current code based on yours:

resource "google_access_context_manager_access_policy" "acm_ap_20519" {
  parent = "organizations/1234567890"
  title  = "acm-ap-20519"
}

resource "google_access_context_manager_service_perimeter" "acm_service_perimeter_20519" {
  description               = "something"
  parent                    = "accessPolicies/${google_access_context_manager_access_policy.acm_ap_20519.name}"
  name                      = "accessPolicies/${google_access_context_manager_access_policy.acm_ap_20519.name}/servicePerimeters/storage_perimeter"
  title                     = "acm-service-perimeter-20519"
  perimeter_type            = "PERIMETER_TYPE_REGULAR"
  use_explicit_dry_run_spec = true

  spec {
    restricted_services = ["storage.googleapis.com"] # "bigtable.googleapis.com"
    resources           = []
    access_levels       = []
    
    vpc_accessible_services {
      enable_restriction  = true
      allowed_services    = ["bigquery.googleapis.com"]
    }
  }

  status {
    restricted_services = ["storage.googleapis.com"] # "bigtable.googleapis.com"
    resources           = []
    access_levels       = []
    
    vpc_accessible_services {
      enable_restriction = true
      allowed_services = ["bigquery.googleapis.com"]
    }
  }

  lifecycle {
    ignore_changes = [
      status[0].resources,
      status[0].ingress_policies,
      status[0].egress_policies,
      spec[0].resources,
      spec[0].ingress_policies,
      spec[0].egress_policies
    ]
  }
}

resource "google_access_context_manager_service_perimeter_dry_run_ingress_policy" "acm_sp_dry_run_ingress_policy" {
  perimeter = "${google_access_context_manager_service_perimeter.acm_service_perimeter_20519.name}"

  ingress_from {
    identities = ["user:[email protected]"]

    sources {
      resource     = "projects/1234567890"
      access_level = "*"
    }
  }

  ingress_to {
    resources = ["projects/1234567890"]

    operations {
      service_name = "bigquery.googleapis.com"

      method_selectors {
        method = "*"
      }
    }
  }

  lifecycle {
    create_before_destroy = true
  }

  depends_on = [
    google_access_context_manager_service_perimeter.acm_service_perimeter_20519
  ]
}

Could you share the missing information with us and confirm which Terraform version you are using?

Note that you can redact sensitive values, for example by replacing the org_id with 1234567890 or the identities with user:[email protected]; we just need to see whether there is something different in the values you are setting for these resources.

Note: @NickElliot, @roaks3 this looks similar to Sarah French's issue. I leave this note for your review in case it is helpful.

@maleksah
Author

maleksah commented Dec 2, 2024

Hello @ggtisc ,

Here is a simplified version of the code.

Terraform Version & Provider Version(s)

Terraform v1.8.5
on darwin_arm64

  • provider registry.terraform.io/hashicorp/google v5.41.0
  • provider registry.terraform.io/hashicorp/google-beta v5.41.0

Affected Resource(s)

google_access_context_manager_ingress_policy
google_access_context_manager_egress_policy
google_access_context_manager_service_perimeter_dry_run_ingress_policy
google_access_context_manager_service_perimeter_dry_run_egress_policy

Terraform Configuration

locals {
  access_policy_id    = "1234567890"
  perimeter           = "accessPolicies/${local.access_policy_id}/servicePerimeters/test_perimeter"
  restricted_services = [
    "storage.googleapis.com",
#    "artifactregistry.googleapis.com",
  ]
  enforced = false

  vpc_sc_ingress = [{
    ingress_from = {
      identities = [
        "user:[email protected]",
        "serviceAccount:[email protected]"
      ]
      sources = [

        {
          resource = null
          access_level = "*"
        }
      ]
    }
    ingress_to = {
      resources = ["*"]
      operations = [{
        service_name = "*"
      }]
    }

  }]

  vpc_sc_egress = [
    {
      egress_from = {
        identity_type = null
        identities    = [
          "user:[email protected]",
          "serviceAccount:[email protected]"
        ]
      }
      egress_to = [{
        resources = ["*"]
        operations = [{ service_name = "storage.googleapis.com", methods = ["google.storage.objects.get"] }]
      }]
    }
  ]
}

resource "google_access_context_manager_service_perimeter" "this" {
  description               = "VPC SC perimeter test"
  parent                    = "accessPolicies/${local.access_policy_id}"
  name                      = local.perimeter
  title                     = "Test Perimeter"
  perimeter_type            = "PERIMETER_TYPE_REGULAR"
  use_explicit_dry_run_spec = true

  dynamic "spec" {
    for_each = !local.enforced ? [1] : []
    content {
      restricted_services = local.restricted_services
      resources           = []
      access_levels       = []
      vpc_accessible_services {
        enable_restriction = true
        allowed_services   = ["RESTRICTED-SERVICES"]
      }
    }
  }

  dynamic "status" {
    for_each = local.enforced ? [1] : []
    content {
      restricted_services = local.restricted_services
      resources           = []
      access_levels       = []
      vpc_accessible_services {
        enable_restriction = true
        allowed_services   = ["RESTRICTED-SERVICES"]
      }
    }
  }
  lifecycle {
    ignore_changes = [
      status[0].resources,
      status[0].ingress_policies,
      status[0].egress_policies,
      spec[0].resources,
      spec[0].ingress_policies,
      spec[0].egress_policies,
    ]
  }
}


resource "google_access_context_manager_service_perimeter_dry_run_ingress_policy" "this" {
  for_each  = !local.enforced ? {for v in local.vpc_sc_ingress : join("-", v.ingress_from.identities) => v} : {}
  perimeter = google_access_context_manager_service_perimeter.this.name
  ingress_from {
    identities = each.value.ingress_from.identities
    # sources {} # if left empty all sources
    dynamic "sources" {
      for_each = each.value.ingress_from.sources
      content {
        resource     = sources.value.resource
        access_level = sources.value.access_level
      }
    }
  }

  ingress_to {
    resources = each.value.ingress_to.resources
    dynamic "operations" {
      for_each = each.value.ingress_to.operations
      content {
        service_name = operations.value.service_name
        dynamic "method_selectors" {
          for_each = operations.value.service_name != "*" ? operations.value.methods : []
          content {
            method = method_selectors.value
          }
        }
      }
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_access_context_manager_service_perimeter_dry_run_egress_policy" "this" {
  for_each = !local.enforced ? {
    for v in local.vpc_sc_egress :
    (length(v.egress_from.identities) > 0 ? join("-", v.egress_from.identities) : v.egress_from.identity_type) => v
  } : {}
  perimeter = google_access_context_manager_service_perimeter.this.name
  egress_from {
    identities    = each.value.egress_from.identities
    identity_type = each.value.egress_from.identity_type
  }

  dynamic "egress_to" {
    for_each = each.value.egress_to
    content {
      resources = egress_to.value.resources
      dynamic "operations" {
        for_each = egress_to.value.operations
        content {
          service_name = operations.value.service_name
          dynamic "method_selectors" {
            for_each = operations.value.service_name != "*" ? operations.value.methods : []
            content {
              method = method_selectors.value
            }
          }
        }
      }
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

Steps to reproduce

1- do "terraform apply" for this code
Screenshot 2024-12-02 at 14 44 25

2- do "terraform plan", all good no changes for google_access_context_manager_service_perimeter_dry_run_ingress_policy and google_access_context_manager_service_perimeter_dry_run_egress_policy
Screenshot 2024-12-02 at 14 49 50

3- add a service in the restricted services of the VPC SC perimeter with terraform.
For example, add "artifactregistry.googleapis.com" in the list of local.restricted_services and do "terraform apply"
Screenshot 2024-12-02 at 14 52 26

4- Then if I do terraform plan without any other change, terraform try to create google_access_context_manager_service_perimeter_dry_run_ingress_policy and google_access_context_manager_service_perimeter_dry_run_egress_policy, even if they already exists for the perimeter (created at step 1)
Screenshot 2024-12-02 at 14 56 08
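
For reference, after step 3 the restricted_services local from the configuration above looks like this (the change is just uncommenting the second service):

locals {
  restricted_services = [
    "storage.googleapis.com",
    "artifactregistry.googleapis.com", # added at step 3
  ]
}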

I observed that this does not work well when we have multiple identities in ingress/egress dry_run/enforced rules.
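
As an aside on the multiple-identities case: the for_each keys in the configuration above are built with join("-", ...) over the identities, so they depend on the order in which the identities are listed. A purely hypothetical sketch of an order-independent key (this does not address the provider behaviour described here):

locals {
  # Hypothetical illustration: sort() makes the map key stable even if the
  # identities are listed in a different order between runs.
  example_identities = [
    "serviceAccount:[email protected]",
    "user:[email protected]",
  ]
  stable_key = join("-", sort(local.example_identities))
}

output "stable_key" {
  value = local.stable_key
}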

This is the same behaviour for

  • google_access_context_manager_ingress_policy
  • google_access_context_manager_egress_policy
  • google_access_context_manager_service_perimeter_dry_run_ingress_policy
  • google_access_context_manager_service_perimeter_dry_run_egress_policy

Every time there is a change on the parent VPC SC perimeter (on the restricted services, for example), a plan done after that change shows Terraform trying to recreate the ingress/egress dry_run/enforced rules even though they already exist (we can see them in the UI).

This is really impacting our use of VPC SC in our company.

Can we have a fix ASAP please?

Thanks

@steffencircle

steffencircle commented Dec 4, 2024

Hi,

I was also able to reproduce this issue with the latest 6.12.0 provider version.

Steps to reproduce:

1.) Deploy a project, 20 service accounts, and a VPC-SC perimeter with an ingress rule for the service accounts from the project

variable "billing_account" {}
variable "org_id" {}
variable "access_context_manager_policy_id" {}

locals {
  restricted_services = ["storage.googleapis.com"]
}

resource "google_folder" "folder" {
  display_name        = "vpc-sc-test"
  parent              = "organizations/${var.org_id}"
  deletion_protection = false
}

resource "random_string" "project_suffix" {
  length  = 5
  special = false
  upper   = false
}

resource "google_project" "project" {
  name            = "VPC SC Test Project"
  project_id      = "vpc-sc-test-project-${random_string.project_suffix.result}"
  folder_id       = google_folder.folder.name
  billing_account = var.billing_account
  deletion_policy = "DELETE"
}

resource "random_string" "sa_suffix" {
  count   = 20
  length  = 6
  special = false
  upper   = false
}

resource "google_service_account" "sa" {
  count      = 20
  project    = google_project.project.project_id
  account_id = "service-account-${random_string.sa_suffix[count.index].result}"
}

resource "google_access_context_manager_service_perimeter" "service-perimeter" {
  parent = "accessPolicies/${var.access_context_manager_policy_id}"
  name   = "accessPolicies/${var.access_context_manager_policy_id}/servicePerimeters/vpc_sc_test_001"
  title  = "vpc_sc_test_001"
  status {
    restricted_services = local.restricted_services
    resources           = ["projects/${google_project.project.number}"]
  }

  lifecycle {
    ignore_changes = [status[0].ingress_policies, status[0].egress_policies]
  }
}

resource "google_access_context_manager_service_perimeter_ingress_policy" "sa_ingress" {

  perimeter = google_access_context_manager_service_perimeter.service-perimeter.name

  ingress_from {
    identities = [for i in range(20) : google_service_account.sa[i].member]
    sources {
      access_level = "*"
    }
  }
  ingress_to {
    resources = ["*"]
    operations {
      service_name = "*"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

This first apply works nicely and follow-up runs show no diff.

2.) Change the local.restricted_services parameter and add an additional service

...
locals {
  restricted_services = ["storage.googleapis.com","bigquery.googleapis.com"]
}
...
3.) Run terraform apply with the local changed. This will adjust the perimeter object.

4.) Without changing any code, run terraform plan again.
The plan "thinks" that the ingress rule is missing and plans to deploy it again.
A terraform apply will add the rule again, resulting in 2 ingress rules in the perimeter.

@maleksah
Author

maleksah commented Dec 4, 2024

I observed another random bug on the same resources:

  • google_access_context_manager_ingress_policy
  • google_access_context_manager_egress_policy
  • google_access_context_manager_service_perimeter_dry_run_ingress_policy
  • google_access_context_manager_service_perimeter_dry_run_egress_policy

I have multiple pipelines (more than 100) that are executed in parallel, adding VPC SC ingress/egress rules to the same VPC SC perimeter. Sometimes the terraform apply completes correctly but, when I go to the VPC SC perimeter, I don't see the ingress/egress rules that were added.

I think that adding ingress/egress rules in parallel on the same perimeter does not work well...

Can you check please?

This is very critical for my client.
We have doubts about the stability of the VPC SC Terraform resources, which can lead to production incidents.

Thanks for your help!

@Charlesleonius

Created GoogleCloudPlatform/magic-modules#12572 to address this

@Charlesleonius

(Quoting @maleksah's comment above about parallel pipelines adding ingress/egress rules to the same perimeter and the rules sometimes not appearing.)

If the resources are in different Terraform configurations, there is nothing stopping update calls from happening at the same time, so the state may not be updated in the other TF config when it tries to apply its own updates. We are working on adding etags, which will help prevent this case by failing updates that are not made against the latest version of the resources.
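
Until that lands, one possible interim option (a minimal sketch with placeholder identifiers, not something prescribed in this thread) is to manage the rules inline in the same configuration that owns the perimeter, so a single state performs all updates to the object and the cross-configuration race cannot occur; the trade-off is that you can no longer ignore_changes on ingress_policies/egress_policies and manage the rules from separate stages.

resource "google_access_context_manager_service_perimeter" "example" {
  parent = "accessPolicies/1234567890"
  name   = "accessPolicies/1234567890/servicePerimeters/example_perimeter"
  title  = "example_perimeter"

  status {
    restricted_services = ["storage.googleapis.com"]
    resources           = ["projects/1234567890"]

    # Rules are managed inline here instead of via the standalone
    # *_ingress_policy / *_egress_policy resources in another configuration.
    ingress_policies {
      ingress_from {
        identities = ["user:[email protected]"]
        sources {
          access_level = "*"
        }
      }
      ingress_to {
        resources = ["*"]
        operations {
          service_name = "*"
        }
      }
    }
  }
}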

@roaks3
Collaborator

roaks3 commented Dec 18, 2024

Assigned to @Charlesleonius because they appear to have a solution, but let me know if you still need support and I can pass this to the oncall.
