Panic on operator startup in a v1.27 cluster #1774

Closed
Xide opened this issue Sep 13, 2023 · 6 comments · Fixed by #1775
Xide commented Sep 13, 2023

The Minio operator panics when started on a v1.27.3 cluster.

Expected Behavior

Minio operator should start without error.

Current Behavior

The pod immediately crashes due to a segfault.

I0913 13:22:57.700458       1 controller.go:71] Starting MinIO Operator
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1550190]

goroutine 1 [running]:
k8s.io/client-go/discovery.convertAPIResource(...)
	k8s.io/[email protected]/discovery/aggregated_discovery.go:88
k8s.io/client-go/discovery.convertAPIGroup({{{0x0, 0x0}, {0x0, 0x0}}, {{0xc00005ce70, 0x15}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...})
	k8s.io/[email protected]/discovery/aggregated_discovery.go:69 +0x530
k8s.io/client-go/discovery.SplitGroupsAndResources({{{0xc0001dec18, 0x15}, {0xc00069e700, 0x1b}}, {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...})
	k8s.io/[email protected]/discovery/aggregated_discovery.go:35 +0x118
k8s.io/client-go/discovery.(*DiscoveryClient).downloadAPIs(0x2471980?)
	k8s.io/[email protected]/discovery/discovery_client.go:310 +0x445
k8s.io/client-go/discovery.(*DiscoveryClient).GroupsAndMaybeResources(0x155416e?)
	k8s.io/[email protected]/discovery/discovery_client.go:198 +0x50
k8s.io/client-go/discovery.ServerGroupsAndResources({0x41ad9e8, 0xc000590450})
	k8s.io/[email protected]/discovery/discovery_client.go:392 +0x4e
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources.func1()
	k8s.io/[email protected]/discovery/discovery_client.go:356 +0x1e
k8s.io/client-go/discovery.withRetries(0x2, 0xc000b52ec8)
	k8s.io/[email protected]/discovery/discovery_client.go:621 +0x69
k8s.io/client-go/discovery.(*DiscoveryClient).ServerGroupsAndResources(0x0?)
	k8s.io/[email protected]/discovery/discovery_client.go:355 +0x34
k8s.io/client-go/restmapper.GetAPIGroupResources({0x41ad9e8?, 0xc000590450?})
	k8s.io/[email protected]/restmapper/discovery.go:148 +0x36
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper.func1()
	sigs.k8s.io/[email protected]/pkg/client/apiutil/dynamicrestmapper.go:86 +0x1e
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).setStaticMapper(...)
	sigs.k8s.io/[email protected]/pkg/client/apiutil/dynamicrestmapper.go:119
sigs.k8s.io/controller-runtime/pkg/client/apiutil.NewDynamicRESTMapper(0xc00099ab40?, {0x0, 0x0, 0x2602ff9?})
	sigs.k8s.io/[email protected]/pkg/client/apiutil/dynamicrestmapper.go:99 +0x16b
sigs.k8s.io/controller-runtime/pkg/client.newClient(0xc00099ab40?, {0x0?, {0x0?, 0x0?}, {0x0?, 0x0?}})
	sigs.k8s.io/[email protected]/pkg/client/client.go:108 +0x1bc
sigs.k8s.io/controller-runtime/pkg/client.New(...)
	sigs.k8s.io/[email protected]/pkg/client/client.go:76
github.com/minio/operator/pkg/controller.StartOperator({0x0, 0x0})
	github.com/minio/operator/pkg/controller/controller.go:92 +0x205
main.startController(0x1?)
	github.com/minio/operator/cmd/operator/controller.go:37 +0x2b
github.com/minio/cli.HandleAction({0x20f2480?, 0x3f0a240?}, 0xa?)
	github.com/minio/[email protected]/app.go:492 +0x70
github.com/minio/cli.Command.Run({{0x25ebc57, 0xa}, {0x0, 0x0}, {0x54acc80, 0x1, 0x1}, {0x2620d3f, 0x1f}, {0x0, ...}, ...}, ...)
	github.com/minio/[email protected]/command.go:242 +0x965
github.com/minio/cli.(*App).Run(0xc0004536c0, {0xc00015a000, 0x2, 0x2})
	github.com/minio/[email protected]/app.go:260 +0xaa7
main.main()
	github.com/minio/operator/cmd/operator/main.go:127 +0x5e

Possible Solution

I believe this is related to this issue, which is already fixed in k8s.io/client-go >= 0.26.4.
In the issue linked above, the maintainers advise a client-go bump to v0.26.8.
However, the version currently used by the Minio operator is v0.26.1.

Steps to Reproduce (for bugs)

  1. (prerequisite) A 1.27 cluster, or a 1.26 cluster with the feature gate enabled (--feature-gates=AggregatedDiscoveryEndpoint=true)
  2. Install prometheus-adapter (this registers the v1beta1.metrics.k8s.io API service, which only returns empty values because no Prometheus upstream is configured)
  3. Install Minio operator 5.0.9 using Helm (default values)
  4. Observe the operator pods in CrashLoopBackOff

Context

Regression

No

Your Environment

  • Version used (minio-operator): 5.0.9
  • Environment name and version (e.g. kubernetes v1.17.2): Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
  • Server type and version: DELL R620
  • Operating System and version (uname -a): Linux ~hostname~ 6.3.8-100.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 15 01:51:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux. (Fedora)
  • Link to your deployment file: Official helm chart with default values

EDIT: reproduction details

@jiuker jiuker self-assigned this Sep 13, 2023
@jiuker jiuker added dependencies Pull requests that update a dependency file and removed triage labels Sep 13, 2023

jiuker commented Sep 13, 2023

@Xide If this feature gate is disabled, does it work?

Xide commented Sep 13, 2023

@jiuker I do not have a prod 1.26 cluster to test on, but using a 1.26.4 kind cluster with the flag disabled, everything works properly.

Commands:

kind create cluster --image kindest/node:v1.26.4
helm repo add minio https://operator.min.io/
helm repo update
helm install \
  --namespace minio-operator \
  --create-namespace \
  minio-operator minio/operator

Operator logs:

I0913 15:16:02.091209       1 controller.go:71] Starting MinIO Operator
I0913 15:16:02.104032       1 main-controller.go:278] Setting up event handlers
I0913 15:16:02.111674       1 main-controller.go:502] Using Kubernetes CSR Version: v1
I0913 15:16:02.111705       1 main-controller.go:522] STS Api server is not enabled, not starting
I0913 15:16:02.111743       1 leaderelection.go:248] attempting to acquire leader lease minio-operator/minio-operator-lock...
I0913 15:16:02.113600       1 main-controller.go:569] minio-operator-764c4c765-5l7gs: is the leader, removing any leader labels that I 'minio-operator-764c4c765-9spn7' might have

jiuker commented Sep 13, 2023

@Xide Thanks for your reply, it will help us a lot.

Xide commented Sep 13, 2023

Some notes for reproduction:

The bug does not occur on kind 1.27.3, nor on kind 1.26.4 with the feature flag enabled.
After some more research, I found that my cluster has an apiservices.apiregistration.k8s.io entry for v1beta1.metrics.k8s.io, but the endpoint doesn't work correctly and returns an empty value, which is propagated to client-go and causes a nil pointer dereference.

I can confirm that removing this api registration fixed the segfault on my main cluster. If you want to reproduce this for validation, installing a dummy prometheus-adapter with no upstream / invalid queries will help trigger the bug.

clouedoc commented Oct 2, 2023

For information, the bugfix is not yet in an upstream release.

As I'm writing this, the current release is 5.0.9.

If you want to install the Minio Operator and happen to have a buggy prometheus-adapter installation (or KEDA?), then you'll need to make a fork.

  1. Make a fork of this repo
  2. Add a random tag (git tag v9.9.9)
  3. Update the Makefile so that the image name points to your fork (ghcr.io/<your username>/operator)
  4. Run make docker
  5. Wait for it to compile
  6. Get the image name, then run docker push <your image name that starts with ghcr.io>
  7. Add a service account to your package (= Docker image) that has been pushed to GitHub
  8. Add the credentials of the service account to the minio-operator namespace
  9. Update the operator deployment to use your pushed image, which contains the latest commit and the bugfix

Good luck, fellow Kubernetes human operator...

jiuker commented Oct 2, 2023

Could you share the log? @clouedoc
