3.5.7 Server pods are crashing after upgrade #13140

Closed
3 of 4 tasks
vermaxik opened this issue Jun 3, 2024 · 12 comments · Fixed by #13166
Labels: area/api (Argo Server API), area/server, P1 (High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority), type/bug, type/regression (Regression from previous behavior (a specific type of bug))

vermaxik commented Jun 3, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

After upgrading from 3.5.5 to 3.5.7, the argo-server pods crashed with two types of errors:

Recovered from panic: runtime error: invalid memory address or nil pointer dereference

edited by agilgur5: reformatted to make it more readable

{"level":"error","msg":"
Recovered from panic: runtime error: invalid memory address or nil pointer dereference
goroutine 94056 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x5e
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.PanicLoggerUnaryServerInterceptor.func4.1()
	/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:22 +0x4e
panic({0x28f9d80?, 0x54d4f70?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
//                       extra new line added by agilgur5 for readability
modernc.org/sqlite/lib._sqlite3OsRead(...)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:16239
modernc.org/sqlite/lib._read32bits(0xc0007f8dc0, 0x7fc15b2006a0, 0xc0007f8dc0?, 0x7fc15ba00888)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:35366 +0x82
modernc.org/sqlite/lib._readJournalHdr(0xc0007f8dc0, 0x7fc15b200428, 0x0, 0xf278, 0x554ced0?, 0xc00103ba98?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:35847 +0x1c7
modernc.org/sqlite/lib._pager_playback(0xc0007f8dc0, 0x7fc15b200428, 0x0)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:37114 +0x437
modernc.org/sqlite/lib._sqlite3PagerRollback(0xc0007f8dc0?, 0x7fc15b200428)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:40864 +0x65
modernc.org/sqlite/lib._sqlite3BtreeRollback(0xc0007f8dc0, 0x7fc15b600228, 0x0, 0x1)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:51547 +0x125
modernc.org/sqlite/lib._sqlite3RollbackAll(0x196119b?, 0x7fc15b200028, 0x5b200028?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:164759 +0xd9
modernc.org/sqlite/lib._sqlite3VdbeHalt(0x7fc15ba007e0?, 0x7fc15af05c28)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:64500 +0x3bd
modernc.org/sqlite/lib._sqlite3VdbeExec(0xc0007f8dc0, 0x7fc15af05c28)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:77403 +0x545
modernc.org/sqlite/lib._sqlite3Step(0xc0007f8dc0?, 0x7fc15af05c28)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:67274 +0x6c
modernc.org/sqlite/lib.Xsqlite3_step(0x2876300?, 0x7fc15af05c28)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:67339 +0xab
//                       extra new line added by agilgur5 for readability
zombiezen.com/go/sqlite.(*Stmt).step(0xc000a01bc0)
	/go/pkg/mod/zombiezen.com/go/[email protected]/sqlite.go:799 +0xac
zombiezen.com/go/sqlite.(*Stmt).Step(0xc000a01bc0)
	/go/pkg/mod/zombiezen.com/go/[email protected]/sqlite.go:785 +0xb6
zombiezen.com/go/sqlite/sqlitex.exec(0xc000a01bc0, 0x3, 0xc00103c860)
	/go/pkg/mod/zombiezen.com/go/[email protected]/sqlitex/exec.go:293 +0x308
zombiezen.com/go/sqlite/sqlitex.Execute(0x2d769c0?, {0xc000bce820?, 0xc001258e10?}, 0x12?)
	/go/pkg/mod/zombiezen.com/go/[email protected]/sqlitex/exec.go:123 +0x45
//                       extra new line added by agilgur5 for readability
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).ListWorkflows(0xc000700a38, {0xc000589360?, 0x0?}, {0xc001258e10?, 0x0?}, {0x0?, 0x0?}, {{{0x0, 0x0}, {0x0, ...}}, ...})
	/go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:105 +0x2ab
github.com/argoproj/argo-workflows/v3/server/workflow.(*workflowServer).ListWorkflows(0xc000c3f620, {0x3d0fdd0, 0xc001713110}, 0xc00018b270)
	/go/src/github.com/argoproj/argo-workflows/server/workflow/workflow_server.go:202 +0x3d2
github.com/argoproj/argo-workflows/v3/pkg/apiclient/workflow._WorkflowService_ListWorkflows_Handler.func1({0x3d0fdd0, 0xc001713110}, {0x2c43be0?, 0xc00018b270})
	/go/src/github.com/argoproj/argo-workflows/pkg/apiclient/workflow/workflow.pb.go:1826 +0x72
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.RatelimitUnaryServerInterceptor.func5({0x3d0fdd0, 0xc001713110}, {0x2c43be0, 0xc00018b270}, 0xc0002f5760, 0xc0015b47b0)
	/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:65 +0x133
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc001713110?}, {0x2c43be0?, 0xc00018b270?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/argoproj/argo-workflows/v3/server/auth.(*gatekeeper).UnaryServerInterceptor.func1({0x3d0fdd0?, 0xc0013d18f0?}, {0x2c43be0, 0xc00018b270}, 0xc000819230?, 0xc0002f57a0)
	/go/src/github.com/argoproj/argo-workflows/server/auth/gatekeeper.go:98 +0x63
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc0013d18f0?}, {0x2c43be0?, 0xc00018b270?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/argoproj/argo-workflows/v3/util/grpc.glob..func1({0x3d0fdd0?, 0xc0013d18f0?}, {0x2c43be0?, 0xc00018b270?}, 0x2d8a829?, 0xf?)
	/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:45 +0x2a
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc0013d18f0?}, {0x2c43be0?, 0xc00018b270?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.PanicLoggerUnaryServerInterceptor.func4({0x3d0fdd0?, 0xc0013d18f0?}, {0x2c43be0?, 0xc00018b270?}, 0x0?, 0x0?)
	/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:26 +0x8c
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc0013d18f0?}, {0x2c43be0?, 0xc00018b270?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/grpc-ecosystem/go-grpc-middleware/logging/logrus.UnaryServerInterceptor.func1({0x3d0fdd0, 0xc0013d1740}, {0x2c43be0, 0xc00018b270}, 0xc0002f5760, 0xc0002f5820)
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/logging/logrus/server_interceptors.go:31 +0xf6
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc0013d1740?}, {0x2c43be0?, 0xc00018b270?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/grpc-ecosystem/go-grpc-prometheus.init.(*ServerMetrics).UnaryServerInterceptor.func3({0x3d0fdd0, 0xc0013d1740}, {0x2c43be0, 0xc00018b270}, 0x29d5e00?, 0xc0002f5860)
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:107 +0x83
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc0013d1740?}, {0x2c43be0?, 0xc00018b270?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6({0x3d0fdd0, 0xc0013d1740}, {0x2c43be0, 0xc00018b270}, 0xc0000429f8?, 0x28fb280?)
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34 +0xb5
github.com/argoproj/argo-workflows/v3/pkg/apiclient/workflow._WorkflowService_ListWorkflows_Handler({0x2ca24a0?, 0xc000c3f620}, {0x3d0fdd0, 0xc0013d1740}, 0xc0015c5b80, 0xc00093b710)
	/go/src/github.com/argoproj/argo-workflows/pkg/apiclient/workflow/workflow.pb.go:1828 +0x135
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00097e1e0, {0x3d0fdd0, 0xc0013d16b0}, {0x3d1cf40, 0xc0026611e0}, 0xc0005fbe60, 0xc00093bb00, 0x54f3fb0, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1343 +0xe03
google.golang.org/grpc.(*Server).handleStream(0xc00097e1e0, {0x3d1cf40, 0xc0026611e0}, 0xc0005fbe60)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1737 +0xc4c
google.golang.org/grpc.(*Server).serveStreams.func1.1()
	/go/pkg/mod/google.golang.org/[email protected]/server.go:986 +0x86
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 236
	/go/pkg/mod/google.golang.org/[email protected]/server.go:997 +0x145
//                       extra new line added by agilgur5 for readability
", "time":"2024-05-29T07:23:45.611Z"}

Recovered from panic: runtime error: index out of range [70437463654405] with length 64

edited by agilgur5: reformatted to make it more readable

{"level":"error","msg":"
Recovered from panic: runtime error: index out of range [70437463654405] with length 64
goroutine 69107 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x5e
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.PanicLoggerUnaryServerInterceptor.func4.1()
	/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:22 +0x4e
panic({0x2b8f600?, 0xc0018d4000?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
//                       extra new line added by agilgur5 for readability
modernc.org/memory.(*Allocator).UintptrFree(0xc001c857c8?, 0xc001c85808?)
	/go/pkg/mod/modernc.org/[email protected]/memory.go:201 +0x138
modernc.org/libc.Xfree(0x1902b00?, 0x5518be0?)
	/go/pkg/mod/modernc.org/[email protected]/mem.go:98 +0x8e
modernc.org/sqlite/lib._sqlite3MemFree(0x1903756?, 0xc0005cc760?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:16853 +0x17
modernc.org/sqlite/lib.Xsqlite3_free(0xc000870780?, 0x1927cd0?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:17828 +0xb8
modernc.org/sqlite/lib._pcache1ResizeHash(0x7f08ee2ce488?, 0x7f08ef7ff8a8?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:33108 +0x145
modernc.org/sqlite/lib._pcache1FetchStage2(0x1903756?, 0x7f08ef7ff8a8, 0x101, 0x1)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:33469 +0xb6
modernc.org/sqlite/lib._pcache1FetchNoMutex(0xc000870780?, 0xc001c85930?, 0x19268d5?, 0x0?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:33594 +0xac
modernc.org/sqlite/lib._pcache1FetchWithMutex(0x7f08edfdf0b0?, 0x7f08ef7ff8a8, 0x1c85960?, 0xc0?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:33607 +0x59
modernc.org/sqlite/lib._pcache1Fetch(0x19333c5?, 0x100?, 0x870780?, 0xc0?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:33624 +0x1f
modernc.org/sqlite/lib._sqlite3PcacheFetch(0xc0005cc760?, 0x7f08ef3fec28?, 0x100?, 0xc0?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:31810 +0x2a
modernc.org/sqlite/lib._getPageNormal(0xc000870780, 0x7f08ef3fec28, 0x101, 0x7f08efb00270, 0x1)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:39726 +0xa8
modernc.org/sqlite/lib._sqlite3PagerGet(...)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:39889
modernc.org/sqlite/lib._btreeGetPage(0xc000870780, 0x7f08eff2b028, 0x101, 0x7f08efb00248, 0x870780?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:49324 +0xa6
modernc.org/sqlite/lib._btreeGetUnusedPage(0xc000870780?, 0x7f08edcfc044?, 0x1916c94?, 0x7f08efb00248, 0x1c85ac0?)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:49452 +0x25
modernc.org/sqlite/lib._allocateBtreePage(0xc000870780, 0x7f08eff2b028, 0x7f08efb00248, 0x7f08efb00244, 0x100, 0x0)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:53819 +0x489
modernc.org/sqlite/lib._fillInCell(0xc000870780, 0x7f08edf7d0b0, 0x7f08edcf802c, 0x7f08efafff68, 0x7f08efb001a8)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:54224 +0x615
modernc.org/sqlite/lib._sqlite3BtreeInsert(0xc000870780, 0x7f08eef07140, 0x7f08efafff68, 0x0, 0x0)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:56763 +0x68e
modernc.org/sqlite/lib._sqlite3VdbeExec(0xc000870780, 0x7f08ef005428)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:75229 +0x98ab
modernc.org/sqlite/lib._sqlite3Step(0xc000870780?, 0x7f08ef005428)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:67274 +0x6c
modernc.org/sqlite/lib.Xsqlite3_step(0x2876300?, 0x7f08ef005428)
	/go/pkg/mod/modernc.org/[email protected]/lib/sqlite_linux_amd64.go:67339 +0xab
//                       extra new line added by agilgur5 for readability
zombiezen.com/go/sqlite.(*Stmt).step(0xc0007020c0)
	/go/pkg/mod/zombiezen.com/go/[email protected]/sqlite.go:799 +0xac
zombiezen.com/go/sqlite.(*Stmt).Step(0xc0007020c0)
	/go/pkg/mod/zombiezen.com/go/[email protected]/sqlite.go:785 +0xb6
zombiezen.com/go/sqlite/sqlitex.exec(0xc0007020c0, 0x3, 0xc001c86860)
	/go/pkg/mod/zombiezen.com/go/[email protected]/sqlitex/exec.go:293 +0x308
zombiezen.com/go/sqlite/sqlitex.Execute(0x2d769c0?, {0xc00100a4e0?, 0xc00147ec90?}, 0x12?)
	/go/pkg/mod/zombiezen.com/go/[email protected]/sqlitex/exec.go:123 +0x45
//                       extra new line added by agilgur5 for readability
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).ListWorkflows(0xc0009c3698, {0xc0006aea00?, 0x0?}, {0xc00147ec90?, 0x0?}, {0x0?, 0x0?}, {{{0x0, 0x0}, {0x0, ...}}, ...})
	/go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:105 +0x2ab
github.com/argoproj/argo-workflows/v3/server/workflow.(*workflowServer).ListWorkflows(0xc000921e60, {0x3d0fdd0, 0xc001306600}, 0xc0013e4e60)
	/go/src/github.com/argoproj/argo-workflows/server/workflow/workflow_server.go:202 +0x3d2
github.com/argoproj/argo-workflows/v3/pkg/apiclient/workflow._WorkflowService_ListWorkflows_Handler.func1({0x3d0fdd0, 0xc001306600}, {0x2c43be0?, 0xc0013e4e60})
	/go/src/github.com/argoproj/argo-workflows/pkg/apiclient/workflow/workflow.pb.go:1826 +0x72
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.RatelimitUnaryServerInterceptor.func5({0x3d0fdd0, 0xc001306600}, {0x2c43be0, 0xc0013e4e60}, 0xc001bd8120, 0xc001b84b88)
	/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:65 +0x133
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc001306600?}, {0x2c43be0?, 0xc0013e4e60?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/argoproj/argo-workflows/v3/server/auth.(*gatekeeper).UnaryServerInterceptor.func1({0x3d0fdd0?, 0xc000e6e2a0?}, {0x2c43be0, 0xc0013e4e60}, 0xc000bdd830?, 0xc001bd8140)
	/go/src/github.com/argoproj/argo-workflows/server/auth/gatekeeper.go:98 +0x63
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc000e6e2a0?}, {0x2c43be0?, 0xc0013e4e60?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/argoproj/argo-workflows/v3/util/grpc.glob..func1({0x3d0fdd0?, 0xc000e6e2a0?}, {0x2c43be0?, 0xc0013e4e60?}, 0x2d8a829?, 0xf?)
	/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:45 +0x2a
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc000e6e2a0?}, {0x2c43be0?, 0xc0013e4e60?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.PanicLoggerUnaryServerInterceptor.func4({0x3d0fdd0?, 0xc000e6e2a0?}, {0x2c43be0?, 0xc0013e4e60?}, 0x0?, 0x0?)
	/go/src/github.com/argoproj/argo-workflows/util/grpc/interceptor.go:26 +0x8c
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc000e6e2a0?}, {0x2c43be0?, 0xc0013e4e60?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/grpc-ecosystem/go-grpc-middleware/logging/logrus.UnaryServerInterceptor.func1({0x3d0fdd0, 0xc001843f50}, {0x2c43be0, 0xc0013e4e60}, 0xc001bd8120, 0xc001bd81a0)
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/logging/logrus/server_interceptors.go:31 +0xf6
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc001843f50?}, {0x2c43be0?, 0xc0013e4e60?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/grpc-ecosystem/go-grpc-prometheus.init.(*ServerMetrics).UnaryServerInterceptor.func3({0x3d0fdd0, 0xc001843f50}, {0x2c43be0, 0xc0013e4e60}, 0x29d5e00?, 0xc001bd81c0)
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:107 +0x83
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6.1.1({0x3d0fdd0?, 0xc001843f50?}, {0x2c43be0?, 0xc0013e4e60?})
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x37
github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).newGRPCServer.ChainUnaryServer.func6({0x3d0fdd0, 0xc001843f50}, {0x2c43be0, 0xc0013e4e60}, 0xc0009609f8?, 0x28fb280?)
	/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34 +0xb5
github.com/argoproj/argo-workflows/v3/pkg/apiclient/workflow._WorkflowService_ListWorkflows_Handler({0x2ca24a0?, 0xc000921e60}, {0x3d0fdd0, 0xc001843f50}, 0xc000d34800, 0xc000706ab0)
	/go/src/github.com/argoproj/argo-workflows/pkg/apiclient/workflow/workflow.pb.go:1828 +0x135
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001521e0, {0x3d0fdd0, 0xc001843e60}, {0x3d1cf40, 0xc000740b60}, 0xc00139f560, 0xc0007076e0, 0x54f3fb0, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1343 +0xe03
google.golang.org/grpc.(*Server).handleStream(0xc0001521e0, {0x3d1cf40, 0xc000740b60}, 0xc00139f560)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1737 +0xc4c
google.golang.org/grpc.(*Server).serveStreams.func1.1()
	/go/pkg/mod/google.golang.org/[email protected]/server.go:986 +0x86
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 296
	/go/pkg/mod/google.golang.org/[email protected]/server.go:997 +0x145
//                       extra new line added by agilgur5 for readability
","time":"2024-05-29T07:47:53.387Z"}
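Both traces bottom out in modernc.org/sqlite while (*SQLiteStore).ListWorkflows is running a query through sqlitex.Execute. The failures (a nil dereference while reading the rollback journal, an out-of-range index inside the memory allocator) look like corrupted in-memory state rather than a bad query. One plausible cause, which is an assumption and not a diagnosis confirmed in this thread, is a single SQLite connection being used by several goroutines at once. SQLite connection handles in Go wrappers like this one are generally meant to be used by one goroutine at a time. The sketch below shows the usual remedy, serializing every use of the connection with a sync.Mutex. `fakeConn` and `lockedStore` are hypothetical stand-ins, not Argo's actual types.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fakeConn is a hypothetical stand-in for one SQLite connection: it keeps
// shared scratch state and must not be used by two goroutines at once.
type fakeConn struct {
	rows    map[string]string // workflow name -> phase
	scratch []string          // buffer reused across calls
}

// query lists workflow names via the shared scratch buffer and returns a copy.
func (c *fakeConn) query() []string {
	c.scratch = c.scratch[:0]
	for name := range c.rows {
		c.scratch = append(c.scratch, name)
	}
	sort.Strings(c.scratch)
	out := make([]string, len(c.scratch))
	copy(out, c.scratch)
	return out
}

// lockedStore (hypothetical) guards every use of the connection with one
// mutex, so writers and concurrent list calls never overlap.
type lockedStore struct {
	mu   sync.Mutex
	conn *fakeConn
}

func newLockedStore() *lockedStore {
	return &lockedStore{conn: &fakeConn{rows: map[string]string{}}}
}

// Upsert records a workflow's phase, like an informer handling an update.
func (s *lockedStore) Upsert(name, phase string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.conn.rows[name] = phase
}

// ListWorkflows returns the sorted workflow names under the same lock.
func (s *lockedStore) ListWorkflows() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.conn.query()
}

func main() {
	s := newLockedStore()
	var wg sync.WaitGroup

	// One writer goroutine, standing in for workflow updates.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 100; i++ {
			s.Upsert(fmt.Sprintf("wf-%03d", i), "Running")
		}
	}()

	// Many readers, standing in for concurrent ListWorkflows API calls.
	for r := 0; r < 20; r++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 50; i++ {
				_ = s.ListWorkflows()
			}
		}()
	}

	wg.Wait()
	fmt.Println(len(s.ListWorkflows()))
}
```

Running it with `go run -race` is a quick way to confirm the wrapper is race-free. Removing the Lock/Unlock calls should make the race detector report the concurrent access. A single mutex is the simplest option. A pool with one connection per goroutine is the usual alternative, but each SQLite `:memory:` connection opens its own private database unless a shared cache is configured.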

[Screenshot, 2024-06-03 at 10:29:29]

edited by Joibel and agilgur5: the part below was split into a separate issue; see #13149

and the argo controllers are flooded with:

{"level":"warning","msg":"Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \"workflow-6gndn\": the object has been modified; please apply your changes to the latest version and try again Conflict","namespace":"customer-workflows","time":"2024-05-29T11:47:22.010Z","workflow":"workflow-6gndn"}

[Screenshot: Logz.io, 2024-06-03 10:26:18]

Version

3.5.7

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

We have a dynamic load: on average 1-1.5 pods per second, at most 50 running in parallel, and up to 100 in Pending status. Workflows are persisted in a Postgres database.

Here is part of our config:

parallelism: 50
resourceRateLimit:
  limit: 100
  burst: 3
mainContainer:
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 0.3
      memory: 256M
    requests:
      cpu: 0.1
      memory: 128M
executor:
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 0.3
      memory: 256M
    requests:
      cpu: 0.1
      memory: 128M
artifactRepository:
  archiveLogs: true
  s3:
    bucket: argo-wf
    endpoint: s3.eu-west-1.amazonaws.com
    keyFormat: "{{workflow.labels.customer-id}}/{{workflow.name}}/{{pod.name}}"
    region: eu-west-1
persistence:
  archive: true
  nodeStatusOffLoad: true
  connectionPool:
    connMaxLifetime: 300s
  postgresql:
    host: argo-rds
    port: 5432
    database: argo_production
    tableName: argo_workflows
    userNameSecret:
      key: username
      name: db-secret
    passwordSecret:
      key: password
      name: db-secret

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

vermaxik added the type/bug and type/regression labels Jun 3, 2024
Member

Joibel commented Jun 3, 2024

@vermaxik, is the second part, with the "Operation cannot be fulfilled on" error logs, coming from your workflow-controller pod rather than your argo-server? If so, that should be a separate issue, as it's almost certainly a different problem.

Link for a full stack trace from argo-server for the invalid memory address panic: https://cloud-native.slack.com/archives/C01QW9QSSSK/p1717081687871169

Can you also add the stack trace for the index out of range panic here, please?

Joibel self-assigned this Jun 3, 2024
Author

vermaxik commented Jun 3, 2024

@Joibel thank you. Yes, it's coming from the controller, but it seems related to this: it stopped after the rollback 🤔

The stack trace is in the message above (click to expand); the raw log line is identical to the index out of range trace already quoted in full there.

The index out of range errors are much less frequent than the invalid memory address errors.

@Joibel
Member

Joibel commented Jun 3, 2024

Ah, sorry, missed the stack trace above. Thank you. It might give a clue as to what's going wrong in both cases, even if it is rarer.

@Joibel Joibel added area/controller Controller issues, panics area/server labels Jun 3, 2024
@reisei

reisei commented Jun 3, 2024

We were able to reproduce the issue in our QA environment. With parallelism set to 20, there were 100 workflows in the queue. The problem occurred right after the first workflows were scheduled: the workflows-server pods started restarting, and we were able to catch some of the stack traces:

google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc0012cf040, 0x1?)
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:636 +0x145 fp=0xc0012d7f00 sp=0xc0012d7df0 pc=0xf84325
google.golang.org/grpc.(*Server).serveStreams(0xc000852000, {0x3d1cf40?, 0xc0012cf040})
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:979 +0x1c2 fp=0xc0012d7f80 sp=0xc0012d7f00 pc=0xfd5702
google.golang.org/grpc.(*Server).handleRawConn.func1()
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:920 +0x45 fp=0xc0012d7fe0 sp=0xc0012d7f80 pc=0xfd4f65
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0012d7fe8 sp=0xc0012d7fe0 pc=0x4712e1
created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 243
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:919 +0x185

goroutine 236 [select, 1 minutes]:
runtime.gopark(0xc001557f00?, 0x2?, 0x9?, 0x0?, 0xc001557ed4?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00164ad80 sp=0xc00164ad60 pc=0x43e26e
runtime.selectgo(0xc00164af00, 0xc001557ed0, 0xf8b416?, 0x0, 0xc001274000?, 0x1)
    /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc00164aea0 sp=0xc00164ad80 pc=0x44e6a5
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc000650820, 0x1)
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:418 +0x113 fp=0xc00164af30 sp=0xc00164aea0 pc=0xf6a273
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc00062a4d0)
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:552 +0x86 fp=0xc00164af90 sp=0xc00164af30 pc=0xf6a986
google.golang.org/grpc/internal/transport.newHTTP2Client.func6()
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_client.go:451 +0x85 fp=0xc00164afe0 sp=0xc00164af90 pc=0xf73f25
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00164afe8 sp=0xc00164afe0 pc=0x4712e1
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 192
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_client.go:449 +0x2433

goroutine 249 [select, 1 minutes]:
runtime.gopark(0xc000d36770?, 0x4?, 0x60?, 0xaf?, 0xc000d366c0?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000d36528 sp=0xc000d36508 pc=0x43e26e
runtime.selectgo(0xc000d36770, 0xc000d366b8, 0x0?, 0x0, 0x0?, 0x1)
    /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000d36648 sp=0xc000d36528 pc=0x44e6a5
google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc0012cf520)
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:1152 +0x225 fp=0xc000d367c8 sp=0xc000d36648 pc=0xf88485
google.golang.org/grpc/internal/transport.NewServerTransport.func4()
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x25 fp=0xc000d367e0 sp=0xc000d367c8 pc=0xf810c5
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000d367e8 sp=0xc000d367e0 pc=0x4712e1
created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 247
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x1b0e

goroutine 250 [IO wait, 1 minutes]:
runtime.gopark(0x45d964b800?, 0xb?, 0x0?, 0x0?, 0x15?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000e4ea20 sp=0xc000e4ea00 pc=0x43e26e
runtime.netpollblock(0x4c5158?, 0x407de6?, 0x0?)
    /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000e4ea58 sp=0xc000e4ea20 pc=0x436cf7
internal/poll.runtime_pollWait(0x7fefd1f83fb0, 0x72)
    /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000e4ea78 sp=0xc000e4ea58 pc=0x46b905
internal/poll.(*pollDesc).wait(0xc00077e700?, 0xc001310000?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000e4eaa0 sp=0xc000e4ea78 pc=0x4e2ec7
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00077e700, {0xc001310000, 0x8000, 0x8000})
    /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000e4eb38 sp=0xc000e4eaa0 pc=0x4e41ba
net.(*netFD).Read(0xc00077e700, {0xc001310000?, 0x8000?, 0x8000?})
    /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000e4eb80 sp=0xc000e4eb38 pc=0x5ec9a5
net.(*conn).Read(0xc00007a2f0, {0xc001310000?, 0x0?, 0x0?})
    /usr/local/go/src/net/net.go:179 +0x45 fp=0xc000e4ebc8 sp=0xc000e4eb80 pc=0x5fe585
net.(*TCPConn).Read(0x0?, {0xc001310000?, 0x10401?, 0x1040100000000?})
    <autogenerated>:1 +0x25 fp=0xc000e4ebf8 sp=0xc000e4ebc8 pc=0x60f8c5
github.com/soheilhy/cmux.(*bufferedReader).Read(0xc0005406a0, {0xc001310000, 0x0?, 0x8000})
    /go/pkg/mod/github.com/soheilhy/cmux@v0.1.5/buffer.go:53 +0x12f fp=0xc000e4ec48 sp=0xc000e4ebf8 pc=0x1f8812f
github.com/soheilhy/cmux.(*MuxConn).Read(0x1010401?, {0xc001310000?, 0x410665?, 0x1010401?})
    /go/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:297 +0x1e fp=0xc000e4ec78 sp=0xc000e4ec48 pc=0x1f8965e
bufio.(*Reader).Read(0xc00097a6c0, {0xc000bc0580, 0x9, 0x30?})
    /usr/local/go/src/bufio/bufio.go:244 +0x197 fp=0xc000e4ecb0 sp=0xc000e4ec78 pc=0x696c77
io.ReadAtLeast({0x3ce05c0, 0xc00097a6c0}, {0xc000bc0580, 0x9, 0x9}, 0x9)
    /usr/local/go/src/io/io.go:335 +0x90 fp=0xc000e4ecf8 sp=0xc000e4ecb0 pc=0x4b9cf0
io.ReadFull(...)
    /usr/local/go/src/io/io.go:354
golang.org/x/net/http2.readFrameHeader({0xc000bc0580, 0x9, 0xc001287ec0?}, {0x3ce05c0?, 0xc00097a6c0?})
    /go/pkg/mod/golang.org/x/net@v0.23.0/http2/frame.go:237 +0x65 fp=0xc000e4ed48 sp=0xc000e4ecf8 pc=0x779945
golang.org/x/net/http2.(*Framer).ReadFrame(0xc000bc0540)
    /go/pkg/mod/golang.org/x/net@v0.23.0/http2/frame.go:498 +0x85 fp=0xc000e4edf0 sp=0xc000e4ed48 pc=0x77a085
google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc0012cf520, 0x1?)
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:636 +0x145 fp=0xc000e4ef00 sp=0xc000e4edf0 pc=0xf84325
google.golang.org/grpc.(*Server).serveStreams(0xc000852000, {0x3d1cf40?, 0xc0012cf520})
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:979 +0x1c2 fp=0xc000e4ef80 sp=0xc000e4ef00 pc=0xfd5702
google.golang.org/grpc.(*Server).handleRawConn.func1()
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:920 +0x45 fp=0xc000e4efe0 sp=0xc000e4ef80 pc=0xfd4f65
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000e4efe8 sp=0xc000e4efe0 pc=0x4712e1
created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 247
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:919 +0x185

goroutine 239 [select, 1 minutes]:
runtime.gopark(0xc001587f00?, 0x2?, 0x9?, 0x0?, 0xc001587ed4?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc001683d80 sp=0xc001683d60 pc=0x43e26e
runtime.selectgo(0xc001683f00, 0xc001587ed0, 0xf8b416?, 0x0, 0xc0014a8000?, 0x1)
    /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc001683ea0 sp=0xc001683d80 pc=0x44e6a5
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc000650aa0, 0x1)
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:418 +0x113 fp=0xc001683f30 sp=0xc001683ea0 pc=0xf6a273
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc00062a620)
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:552 +0x86 fp=0xc001683f90 sp=0xc001683f30 pc=0xf6a986
google.golang.org/grpc/internal/transport.newHTTP2Client.func6()
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_client.go:451 +0x85 fp=0xc001683fe0 sp=0xc001683f90 pc=0xf73f25
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc001683fe8 sp=0xc001683fe0 pc=0x4712e1
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 226
    /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_client.go:449 +0x2433
Stream closed EOF for argo/argo-workflows-server-9fcd56c9d-5qnkv (argo-server)

and

nil pointer error
E0603 13:25:27.980203       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 161 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x28f9d80?, 0x54d4f70})
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0014880c0?})
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x28f9d80?, 0x54d4f70?})
    /usr/local/go/src/runtime/panic.go:914 +0x21f
modernc.org/sqlite/lib._sqlite3VdbeExec(0xc0004dcb40, 0x7fcfac904028)
    /go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:73083 +0xca5
modernc.org/sqlite/lib._sqlite3Step(0xc0004dcb40?, 0x7fcfac904028)
    /go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:67274 +0x6c
modernc.org/sqlite/lib.Xsqlite3_step(0xc000708500?, 0x7fcfac904028)
    /go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:67339 +0xab
zombiezen.com/go/sqlite.(*Stmt).step(0xc0010517a0)
    /go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlite.go:799 +0xac
zombiezen.com/go/sqlite.(*Stmt).Step(0xc0010517a0)
    /go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlite.go:785 +0xb6
zombiezen.com/go/sqlite/sqlitex.exec(0xc0010517a0, 0x3, 0xc002269708)
    /go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlitex/exec.go:293 +0x308
zombiezen.com/go/sqlite/sqlitex.Execute(0xc00102f000?, {0x2e62f90?, 0x2d49?}, 0x2f?)
    /go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlitex/exec.go:123 +0x45
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).upsertWorkflow(0xc0007d0ff0, 0xc000deed80)
    /go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:237 +0x5bb
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).Update(0xc0007d0ff0, {0x2d47e20?, 0xc000deed80?})
    /go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:176 +0x9e
k8s.io/client-go/tools/cache.(*Reflector).watchHandler(0xc0001bbb20, {0x0?, 0x0?, 0x5519740?}, {0x3cf17d0?, 0xc000ed54c0}, 0xc002269df0, 0xc001051800, 0xc000103380)
    /go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:506 +0x92e
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0001bbb20, 0xc000103380)
    /go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:429 +0x656
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
    /go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:221 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?)
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:155 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0004cc000?, {0x3ce28a0, 0xc000830d70}, 0x1, 0xc000103380)
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:156 +0xaf
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0001bbb20, 0xc000103380)
    /go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:220 +0x1cd
github.com/argoproj/argo-workflows/v3/server/workflow.(*workflowServer).Run(0x3d0fe08?, 0xc000778b90?)
    /go/src/github.com/argoproj/argo-workflows/server/workflow/workflow_server.go:86 +0x1c
created by github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).Run in goroutine 1
    /go/src/github.com/argoproj/argo-workflows/server/apiserver/argoserver.go:269 +0x1409
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x1972f85]

goroutine 161 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0014880c0?})
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x28f9d80?, 0x54d4f70?})
    /usr/local/go/src/runtime/panic.go:914 +0x21f
modernc.org/sqlite/lib._sqlite3VdbeExec(0xc0004dcb40, 0x7fcfac904028)
    /go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:73083 +0xca5
modernc.org/sqlite/lib._sqlite3Step(0xc0004dcb40?, 0x7fcfac904028)
    /go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:67274 +0x6c
modernc.org/sqlite/lib.Xsqlite3_step(0xc000708500?, 0x7fcfac904028)
    /go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:67339 +0xab
zombiezen.com/go/sqlite.(*Stmt).step(0xc0010517a0)
    /go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlite.go:799 +0xac
zombiezen.com/go/sqlite.(*Stmt).Step(0xc0010517a0)
    /go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlite.go:785 +0xb6
zombiezen.com/go/sqlite/sqlitex.exec(0xc0010517a0, 0x3, 0xc002269708)
    /go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlitex/exec.go:293 +0x308
zombiezen.com/go/sqlite/sqlitex.Execute(0xc00102f000?, {0x2e62f90?, 0x2d49?}, 0x2f?)
    /go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlitex/exec.go:123 +0x45
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).upsertWorkflow(0xc0007d0ff0, 0xc000deed80)
    /go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:237 +0x5bb
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).Update(0xc0007d0ff0, {0x2d47e20?, 0xc000deed80?})
    /go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:176 +0x9e
k8s.io/client-go/tools/cache.(*Reflector).watchHandler(0xc0001bbb20, {0x0?, 0x0?, 0x5519740?}, {0x3cf17d0?, 0xc000ed54c0}, 0xc002269df0, 0xc001051800, 0xc000103380)
    /go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:506 +0x92e
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0001bbb20, 0xc000103380)
    /go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:429 +0x656
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
    /go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:221 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?)
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:155 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0004cc000?, {0x3ce28a0, 0xc000830d70}, 0x1, 0xc000103380)
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:156 +0xaf
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0001bbb20, 0xc000103380)
    /go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:220 +0x1cd
github.com/argoproj/argo-workflows/v3/server/workflow.(*workflowServer).Run(0x3d0fe08?, 0xc000778b90?)
    /go/src/github.com/argoproj/argo-workflows/server/workflow/workflow_server.go:86 +0x1c
created by github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).Run in goroutine 1
    /go/src/github.com/argoproj/argo-workflows/server/apiserver/argoserver.go:269 +0x1409
Stream closed EOF for argo/argo-workflows-server-9fcd56c9d-5qnkv (argo-server)

@agilgur5
Contributor

agilgur5 commented Jun 3, 2024

As mentioned in Slack, the nil pointer error is due to the SQLite change from #13021 / #12736. So is the index out of bounds error; both go through SQLite in their stack traces.

cc @jiachengxu who wrote the PRs

Also, as Alan asked on Slack, can you identify what action caused the Server to panic? Were you using the UI, the CLI, or the API, and which page, command, or method made it panic?
I recognize you might have a lot of users, so that may be difficult to isolate, but if you can, it would help with debugging.

@Joibel thank you, yes it's coming from controller, but somehow related to this, it's dropped after rollback 🤔

As I wrote on Slack, it would be due to a different PR as this one only impacts the Server. Also likely a different version, since you upgraded/rolled back across two patches and 3.5.6 is a more likely culprit for Controller errors than 3.5.7.
If you could isolate which version is causing the Controller errors that would help as well. EDIT: Split into #13149

@agilgur5 agilgur5 added the P1 High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority label Jun 3, 2024
@agilgur5 agilgur5 changed the title Argo Server pods are crashing after upgrading to version 3.5.7 3.5.7 Server pods are crashing after upgrade Jun 3, 2024
@jiachengxu
Member

It looks like the nil pointer error comes from https://github.com/argoproj/argo-workflows/blob/main/server/workflow/store/sqlite_store.go#L105, where s.conn is nil.
So the issue might be that the database connection is not initialized before the server starts serving list requests.
Maybe a readiness probe that checks whether the DB connection is initialized before accepting traffic would help.

@agilgur5
Contributor

agilgur5 commented Jun 4, 2024

Hmm but the SQLiteStore is created before even the HTTP server is, so if it responded to any traffic I would think the connection already exists (and this isn't a remote DB so the connection can't drop and need reconnecting) 🤔

@Joibel
Member

Joibel commented Jun 4, 2024

Indeed. The discussion in Slack has continued from here, but without any real insight into what is going wrong.

@agilgur5
Contributor

agilgur5 commented Jun 4, 2024

Yea I just saw and read through the Slack thread (first message since I'm awake / in my TZ). I see you said roughly the same thing as me there. 👍

Also likely a different version, since you upgraded/rolled back across two patches and 3.5.6 is a more likely culprit for Controller errors than 3.5.7.
If you could isolate which version is causing the Controller errors that would help as well.

It was just confirmed on Slack that the increased Controller error rate comes from 3.5.6. EDIT: split into #13149

@agilgur5
Contributor

agilgur5 commented Jun 4, 2024

From OP:

github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).ListWorkflows(0xc000700a38, {0xc000589360?, 0x0?}, {0xc001258e10?, 0x0?}, {0x0?, 0x0?}, {{{0x0, 0x0}, {0x0, ...}}, ...})
  /go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:105 +0x2ab

i.e.

err = sqlitex.Execute(s.conn, query, &sqlitex.ExecOptions{

From @reisei's comment above:

github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).upsertWorkflow(0xc0007d0ff0, 0xc000deed80)
   /go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:237 +0x5bb

i.e.

err = sqlitex.Execute(s.conn, insertWorkflowQuery,

Notably these are different API calls.

Also, I'm not a Go expert, but I think the line number only marks the start of the statement, so the panic could come from anywhere in the rest of that statement, which references variables other than s.conn, e.g. the ExecOptions.

@agilgur5 agilgur5 added this to the v3.5.x patches milestone Jun 5, 2024
@Joibel Joibel removed the area/controller Controller issues, panics label Jun 6, 2024
Joibel added a commit to Joibel/argo-workflows that referenced this issue Jun 11, 2024
[zombiezen/go-sqlite](https://github.com/zombiezen/go-sqlite/blob/main/doc.go#L32) is not
thread safe when used through a single connection. The current code
demonstrably races: run the server with `-race` while a few workflows
are running, then `argo list` via the server a few times, and the race
detector will report it.

This change doesn't attempt to move to a multiple connection model;
it's a minimal change to stop the server crashing all the time by
guarding use of the SQL connection with a mutex.

Fixes argoproj#13154 and argoproj#13140

Signed-off-by: Alan Clucas <[email protected]>
@scany1211

scany1211 commented Jun 24, 2024

I had the same issue with 3.5.7. Is this fix available in a released version now?

@agilgur5
Contributor

agilgur5 commented Jun 24, 2024

Yes, in 3.5.8; see the changelog: #13206

@agilgur5 agilgur5 added the area/api Argo Server API label Sep 21, 2024
@argoproj argoproj locked as resolved and limited conversation to collaborators Sep 21, 2024
6 participants