v3.5.0: Server OOM killed when fetching archived workflows #12872
Labels
area/api
P1
solution/duplicate
solution/outdated
type/bug
type/regression
Pre-requisites
:latest
What happened/what did you expect to happen?
When the Workflow Archive feature is enabled, Argo Workflows starts querying the database while trying to fetch all of the workflows.
If a high number of workflows is persisted in the database (I experienced this with more than 100,000 workflows in the argo_archived_workflows table), the application seems to struggle, and the RAM it needs to run can increase considerably. Sometimes this is preceded by the attached log, which points to a pretty clear issue: there is no limit set on the query.
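To illustrate why a missing limit matters, here is a minimal sketch of how a list query might be built from a limit value. The helper name buildArchiveListQuery and the column names are hypothetical and illustrative only; this is not the query Argo Workflows actually runs. The point is that when the limit is 0 no LIMIT clause is emitted, so the database returns every matching row and the server has to hold all of them in memory.

```go
package main

import "fmt"

// baseQuery lists archived workflows for one namespace. The column names are
// illustrative; this is not the query Argo Workflows actually runs.
const baseQuery = "SELECT name, namespace, uid FROM argo_archived_workflows WHERE namespace = $1"

// buildArchiveListQuery is a hypothetical helper that appends a LIMIT clause
// only when limit > 0. With limit == 0 the query is unbounded, so every
// archived workflow in the namespace is returned at once.
func buildArchiveListQuery(limit int) string {
	if limit > 0 {
		return fmt.Sprintf("%s LIMIT %d", baseQuery, limit)
	}
	return baseQuery
}

func main() {
	// A limit of 0 produces a query with no LIMIT clause.
	fmt.Println(buildArchiveListQuery(0))
	// A positive limit bounds the number of rows returned.
	fmt.Println(buildArchiveListQuery(50))
}
```

With more than 100,000 rows in the table, the unbounded form loads all of them in a single request.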
While the workflowArchiveServer interface seems to have some logic to handle limits, that logic seems to have been disabled by the following snippet at
server/workflow/workflow_server.go#228
Here the limit is hard-coded to 0, which means no limit is set at all.
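One possible mitigation, sketched under assumptions: never let a request reach the database without a bound, and treat a non-positive or oversized limit as a server-side cap. The names effectiveLimit and maxArchiveListLimit and the value 500 are hypothetical; this is not the fix adopted in Argo Workflows.

```go
package main

import "fmt"

// maxArchiveListLimit is a hypothetical server-side cap; Argo Workflows does
// not necessarily define such a constant.
const maxArchiveListLimit = 500

// effectiveLimit is a hypothetical guard: a non-positive limit (such as a
// hard-coded 0) or a limit above the cap is replaced by the cap, so the
// resulting query is always bounded.
func effectiveLimit(requested, maxLimit int) int {
	if requested <= 0 || requested > maxLimit {
		return maxLimit
	}
	return requested
}

func main() {
	fmt.Println(effectiveLimit(0, maxArchiveListLimit))         // 500: 0 no longer means "unbounded"
	fmt.Println(effectiveLimit(20, maxArchiveListLimit))        // 20: a small explicit limit is kept
	fmt.Println(effectiveLimit(1_000_000, maxArchiveListLimit)) // 500: an oversized limit is clamped
}
```

The trade-off is that a capped request no longer returns the full list in one response, so clients that need every archived workflow would have to paginate.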
Requests
Version
3.5.0
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Not relevant
Logs from the workflow controller
Logs from in your workflow's wait container