-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(server): mutex calls to sqlitex #13166
Conversation
[zombiezen/go-sqlite] (https://github.com/zombiezen/go-sqlite/blob/main/doc.go#L32) is not thread safe when used through a single connection. The current code is provably racing (run the server with `-race` and a few workflows being run) and it will tell you this if you `argo list` via the server a few times. This change doesn't attempt to move to a multiple connection model, it's a minimal change to stop the server crashing all the time, by mutexing the use of the sql connection. Fixes argoproj#13154 and argoproj#13140 Signed-off-by: Alan Clucas <[email protected]>
cc @jiachengxu for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me, thanks for the work @Joibel!
@agilgur5, @tczhao, @isubasinghe, @terrytangyuan - could someone take a look at this one so we can prep a 3.5.8 with it in? |
good catch @Joibel |
Nice work figuring out the root cause @Joibel! Will you or @jiachengxu be taking up the full fix with thread pooling?
Would like to get a fix for #13177 in as well and then release it ASAP |
(cherry picked from commit 0ca0c0f)
Wrote up a follow-up issue to track this in #13638 |
Fixes #13154
Fixes #13140
Motivation
zombiezen/go-sqlite is not thread safe when used through a single connection. The current code is provably racing (run the server with
-race
and a few workflows being run) and it will tell you this if youargo list
via the server a few times. Unless it crashes first.Modifications
This change doesn't attempt to move to a multiple connection model, it's a minimal change to stop the server crashing all the time, by mutexing the use of the sql connection.
Verification
Running v3.5.7
Inject 200
example/dag-diamond.yaml
s into your cluster. Runargo list -s <server>
to list the workflows. It will crash within a few attempts, usuallyWith this change it didn't crash in 200 attempts at
argo list
.-race
build of the server doesn't complain any more either.