Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TODO: unclean shutdown of envtest e2e tests #429

Open
jiridanek opened this issue Oct 22, 2024 · 6 comments
Open

TODO: unclean shutdown of envtest e2e tests #429

jiridanek opened this issue Oct 22, 2024 · 6 comments

Comments

@jiridanek
Copy link
Member

jiridanek commented Oct 22, 2024

Harmless, but annoying

2024-10-22T13:38:47+02:00	INFO	Stopping and waiting for webhooks
2024-10-22T13:38:47+02:00	INFO	controller-runtime.webhook	Shutting down webhook server with timeout of 1 minute
2024-10-22T13:38:47+02:00	INFO	Stopping and waiting for HTTP servers
2024-10-22T13:38:47+02:00	INFO	Wait completed, proceeding to shutdown the manager
2024-10-22T13:38:47+02:00	DEBUG	controller-runtime.certwatcher	certificate event	{"event": "REMOVE        \"/var/folders/f1/3m518k5d34l72v_9nqyjzqm80000gn/T/envtest-serving-certs-2153001584/tls.key\""}
2024-10-22T13:38:47+02:00	ERROR	controller-runtime.certwatcher	error re-watching file	{"error": "fsnotify: watcher already closed"}
sigs.k8s.io/controller-runtime/pkg/certwatcher.(*CertWatcher).handleEvent
	/Users/jdanek/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/certwatcher/certwatcher.go:185
sigs.k8s.io/controller-runtime/pkg/certwatcher.(*CertWatcher).Watch
	/Users/jdanek/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/certwatcher/certwatcher.go:133
2024-10-22T13:38:47+02:00	ERROR	controller-runtime.certwatcher	error re-reading certificate	{"error": "open /var/folders/f1/3m518k5d34l72v_9nqyjzqm80000gn/T/envtest-serving-certs-2153001584/tls.crt: no such file or directory"}
sigs.k8s.io/controller-runtime/pkg/certwatcher.(*CertWatcher).handleEvent
	/Users/jdanek/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/certwatcher/certwatcher.go:190
sigs.k8s.io/controller-runtime/pkg/certwatcher.(*CertWatcher).Watch
	/Users/jdanek/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/certwatcher/certwatcher.go:133


Ran 2 of 2 Specs in 10.003 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 0 Skipped

This is something that people in general struggled with

and we have a TODO comment for it

https://github.com/opendatahub-io/kubeflow/blob/0743e0e444f236b75d90eb2053ac0e79ead5950f/components/odh-notebook-controller/controllers/suite_test.go#L189-L195

The solution seems to be to create a nested context for

subctx, cancel := context.WithCancel(ctx)

...

	// Start the manager
	go func() {
		defer GinkgoRecover()
		err := mgr.Start(ctx)
		Expect(err).ToNot(HaveOccurred(), "Failed to run manager")
	}()

...

// when we're done testing
cancel()

but it needs to be checked.

@jiridanek
Copy link
Member Author

^^ that nested context does not seem to be good enough, I still see unclean shutdowns

@jiridanek
Copy link
Member Author

jiridanek commented Nov 17, 2024

resolved by suppressing DEBUG logs

@jiridanek
Copy link
Member Author

I like the suggestion of using

// Set the GracefulShutdownTimeout to 0 to prevent errors:
// failed waiting for all runnables to end within grace period of 30s: context deadline exceeded
// Ref: https://github.com/kubernetes-sigs/controller-runtime/blob/e59161ee/pkg/manager/internal.go#L543-L548
cfg.GracefulShutdownTimeout = lo.ToPtr(time.Duration(0))

@jiridanek
Copy link
Member Author

The webhook timeout cannot be changed

idleConnsClosed := make(chan struct{})
go func() {
	<-ctx.Done()
	log.Info("Shutting down webhook server with timeout of 1 minute")

	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		// Error from closing listeners, or context timeout
		log.Error(err, "error shutting down the HTTP server")
	}
	close(idleConnsClosed)
}()

@jiridanek
Copy link
Member Author

One more advice, similarly to the previous, I found one other likely option in the struct docs

	BaseContext: func() context.Context {
		return ctx
	},

@jiridanek
Copy link
Member Author

Waiting for

err = mgr.Start(ctx)

to return (after ctx is cancelled) and only then calling

err := envTest.Stop()

seems to be effective against

2024-11-17T20:19:57+01:00       ERROR   controller-runtime.certwatcher  error re-reading certificate    {"error": "open /var/folders/f1/3m518k5d34l72v_9nqyjzqm80000gn/T/envtest-serving-certs-737480808/tls.crt: no such file or directory"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant