Inventory Operator Crash Loop Due to nil pointer dereference After Rollout Restart on provider pdx.nb.akash.pub #268

Closed
devalpatel67 opened this issue Dec 3, 2024 · 0 comments · Fixed by akash-network/provider#258 or akash-network/provider#261

Description
The provider's inventory operator was reporting large, incorrect CPU counts (an issue that is already tracked separately). After an attempt to resolve this by cleaning up failed pods and performing a rollout restart, the operator-inventory pod entered a crash loop caused by a nil pointer dereference.

Steps which led to issue

  1. Clean up failed pods in the cluster using the following commands:
     kubectl get pods -A --field-selector status.phase=Failed
     kubectl delete pods -A --field-selector status.phase=Failed
  2. Restart the operator-inventory deployment to refresh the inventory:
     kubectl rollout restart deployment operator-inventory -n akash-services
  3. Observe the newly created pod for the operator-inventory deployment.

Observed Behavior

  • The new operator-inventory pod enters a crash loop.
  • The pod logs show the following panic error:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x25d43d4]

goroutine 164 [running]:
github.com/akash-network/provider/operator/inventory.(*rancher).run(0xc001796390, 0xc0002fc540)
    github.com/akash-network/provider/operator/inventory/rancher.go:226 +0x1ab4
github.com/akash-network/provider/operator/inventory.NewRancher.func1()
    github.com/akash-network/provider/operator/inventory/rancher.go:63 +0x1b
golang.org/x/sync/errgroup.(*Group).Go.func1()
    golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
    golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x96
