Plugin panics while destroying instance group manager #14516

Closed

@nicolaferraro

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

v1.2.9

Affected Resource(s)

  • google_compute_instance_group_manager

Terraform Configuration Files

Panic happens on terraform destroy -refresh=false when the resource is not found.

variable "project_id" {
  type = string
}

variable "region" {
  type = string
}

variable "availability_zone" {
  type = string
}

provider "google" {
  project = var.project_id
  region  = var.region
}

locals {
  vm_user_data = {
    users = [
      "default",
      {
        name          = "myself",
        gecos         = "myself",
        primary_group = "myself",
        sudo          = "ALL=(ALL) NOPASSWD:ALL",
        shell         = "/bin/bash",
        groups        = "users,adm",
        lock_passwd   = false,
      }
    ],
    package_upgrade = true,
    runcmd          = [
      ["echo", "--- booted ---"],
    ]
  }
  vm_user_data_with_cloud_config_directive = "#cloud-config\n${jsonencode(local.vm_user_data)}"
}

resource "google_project_service" "compute_api" {
  service                    = "compute.googleapis.com"
  disable_dependent_services = false
  disable_on_destroy         = false
}

resource "google_service_account" "myservice_agent" {
  account_id = "myserviceaccount"
}

resource "google_compute_network" "myservice" {
  name                            = "myvpc"
  auto_create_subnetworks         = true
  delete_default_routes_on_create = false
  routing_mode                    = "GLOBAL"

  depends_on = [
    google_project_service.compute_api,
  ]
}

resource "google_compute_instance_template" "myservice_agent" {
  name_prefix  = "agent-"
  machine_type = "e2-medium"
  metadata     = {
    user-data = local.vm_user_data_with_cloud_config_directive
  }

  disk {
    disk_size_gb = 16
    disk_type    = "pd-balanced"
    auto_delete  = "true"
    boot         = "true"
    source_image = "ubuntu-os-cloud/ubuntu-2204-lts"
  }

  service_account {
    email  = google_service_account.myservice_agent.email
    scopes = ["cloud-platform"]
  }

  network_interface {
    network = google_compute_network.myservice.name
    access_config {
      network_tier = "STANDARD"
    }
  }

  scheduling {
    automatic_restart   = "true"
    on_host_maintenance = "MIGRATE"
  }

  shielded_instance_config {
    enable_secure_boot          = "true"
    enable_vtpm                 = "true"
    enable_integrity_monitoring = "true"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_instance_group_manager" "myservice_agent" {
  name               = "mygroup"
  base_instance_name = "myinstance"
  zone               = var.availability_zone

  version {
    name              = "myversion"
    instance_template = google_compute_instance_template.myservice_agent.id
  }

  target_size        = 1
  wait_for_instances = "true"

  update_policy {
    max_surge_fixed       = 1
    max_unavailable_fixed = 1
    minimal_action        = "REPLACE"
    replacement_method    = "SUBSTITUTE"
    type                  = "PROACTIVE"
  }
}

Debug Output

https://gist.github.com/nicolaferraro/5fa20c6fe3058ea5ec37234c734a1a04

Panic Output

╷
│ Error: Plugin did not respond
│ 
│ The plugin encountered an error, and failed to respond to the plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may contain more details.
╵

Stack trace from the terraform-provider-google_v4.63.1_x5 plugin:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x100 pc=0x2270b29]

goroutine 219 [running]:
github.com/hashicorp/terraform-provider-google/google.waitForInstancesRefreshFunc.func1()
        github.com/hashicorp/terraform-provider-google/google/resource_compute_region_instance_group_manager.go:515 +0xa9
github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource.(*StateChangeConf).WaitForStateContext.func1()
        github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/resource/state.go:110 +0x207
created by github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource.(*StateChangeConf).WaitForStateContext
        github.com/hashicorp/terraform-plugin-sdk/[email protected]/helper/resource/state.go:83 +0x1d8

Error: The terraform-provider-google_v4.63.1_x5 plugin crashed!

This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.

2023-05-08T10:04:34.167+0200 [DEBUG] provider: plugin exited

Expected Behavior

Should neither panic nor fail, because the resource is already deleted.

Actual Behavior

Panic.

Steps to Reproduce

  1. terraform apply
  2. Delete the instance group manually from gcloud console (this may be deleted as part of a previous run, but the tf state is out of sync)
  3. terraform destroy -refresh=false
@megan07
Contributor

megan07 commented May 8, 2023

Hi @nicolaferraro, so sorry you're running into this issue! I think I see where the problem is, and I can put in a fix for this, but for documentation purposes I want to do my own stack-trace here.

The panic comes here because m, the InstanceManager, is nil.

To get to that point, we start in Delete, and computeRIGMWaitForInstanceStatus is called when wait_for_instances is set.

This brings us into that function, where the function getRegionalManager is passed to waitForInstancesRefreshFunc.

You'll see in waitForInstancesRefreshFunc that f is called; f is the function that was passed in, that is, getRegionalManager.

So we go there and see the call to Get the InstanceManager. This returns a 404 error because the resource no longer exists, and thus handleNotFoundError is called. When reading a resource, if we see that it has been deleted outside of Terraform, we typically just remove it from state (d.SetId("")) and return a nil error. Since getRegionalManager is used both for reading and for deleting, when we return to Delete the error is nil, but so is the InstanceManager.

I'll fix this so we check m == nil to account for this case, and if it is, we'll assume it doesn't exist and we can just move on.
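The failure mode and the proposed m == nil check can be illustrated with a minimal, self-contained Go sketch. This is not the provider's code: InstanceGroupManager, errNotFound, fetch, getManager, refreshUnsafe, refreshSafe and the state strings are all hypothetical stand-ins chosen for illustration. The only thing taken from the explanation above is the shape of the bug: a getter shared by read and delete swallows the 404 and returns a nil object together with a nil error, and the caller then dereferences the object.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceGroupManager is a hypothetical stand-in for the API object.
type InstanceGroupManager struct {
	IsStable bool
}

// errNotFound is a hypothetical stand-in for a 404 response.
var errNotFound = errors.New("404: not found")

// fetch simulates the API Get call.
func fetch(exists bool) (*InstanceGroupManager, error) {
	if !exists {
		return nil, errNotFound
	}
	return &InstanceGroupManager{IsStable: true}, nil
}

// getManager mimics a getter shared by read and delete: a 404 is treated
// as "already gone" and swallowed, so the caller receives (nil, nil).
func getManager(exists bool) (*InstanceGroupManager, error) {
	m, err := fetch(exists)
	if errors.Is(err, errNotFound) {
		return nil, nil
	}
	return m, err
}

func missingManager() (*InstanceGroupManager, error) { return getManager(false) }
func stableManager() (*InstanceGroupManager, error)  { return getManager(true) }

type getter func() (*InstanceGroupManager, error)

// refreshUnsafe checks only the error, then dereferences m.
func refreshUnsafe(get getter) (string, error) {
	m, err := get()
	if err != nil {
		return "", err
	}
	if m.IsStable { // nil pointer dereference when m == nil
		return "created", nil
	}
	return "creating", nil
}

// refreshSafe adds the m == nil check: a nil manager with a nil error
// is taken to mean the resource no longer exists.
func refreshSafe(get getter) (string, error) {
	m, err := get()
	if err != nil {
		return "", err
	}
	if m == nil {
		return "deleted", nil
	}
	if m.IsStable {
		return "created", nil
	}
	return "creating", nil
}

// panics reports whether f panicked.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(panics(func() { refreshUnsafe(missingManager) }))
	fmt.Println(refreshSafe(missingManager))
	fmt.Println(refreshSafe(stableManager))
}
```

In this sketch the nil check lives in the caller rather than in the getter, so the read path can keep treating a 404 as "remove from state" while the delete path decides for itself what a missing object means.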

It's not exactly the same problem as this issue, but it's similar in the sense that we need to be careful about how we reuse the handleNotFoundError function.

(and sorry, I realized after the fact that my stack trace above is with the regional instance manager, but it applies to the instance manager as well)

@github-actions

github-actions bot commented Jun 9, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 9, 2023
@github-actions github-actions bot added the forward/review and service/compute-managed labels Jan 14, 2025