/api/v1/workflows/ has slow response time when there are many archived workflows in 3.6.0 #13948

Open
3 of 4 tasks
DingGGu opened this issue Nov 27, 2024 · 1 comment

DingGGu commented Nov 27, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

After upgrading from 3.5.8 to 3.6.0 (also reproduced on latest),
requests to http://argo-server/workflows/?&limit=50 take far too long to respond.

I found the following slow query entry in the argo-server log:

Query:          SELECT count(*) as total FROM "argo_archived_workflows" WHERE (("clustername" = $1 AND "instanceid" = $2) AND not exists (select 1 from argo_archived_workflows_labels where clustername = argo_archived_workflows.clustername and uid = argo_archived_workflows.uid and name = 'workflows.argoproj.io/controller-instanceid'))
Arguments:      []interface {}{"default", ""}
Error:          upper: slow query
Time taken:     363.58917s
<...>

time="2024-11-27T10:52:09.899Z" level=info duration=6m3.617456665s method=GET path=/api/v1/workflows/ size=33993 status=0
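
As a quick sanity check, the request duration in the access log can be converted to seconds and compared with the "Time taken" of the slow query. The sketch below is only illustrative: `go_duration_to_seconds` is a hypothetical helper written for this comment, not part of Argo, and it handles only the h/m/s units that appear in these two log lines.

```python
import re

def go_duration_to_seconds(text: str) -> float:
    """Convert a Go-style duration such as '6m3.617456665s' to seconds.

    Hypothetical helper: supports only the h, m and s units.
    """
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", text)
    # Reject anything the pattern did not consume completely (e.g. "100ms").
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(n) * units[u] for n, u in parts)

request_seconds = go_duration_to_seconds("6m3.617456665s")  # access log "duration"
query_seconds = go_duration_to_seconds("363.58917s")        # slow query "Time taken"
# Time the request spent outside the slow count query.
overhead_seconds = request_seconds - query_seconds
print(f"request={request_seconds:.3f}s query={query_seconds:.3f}s "
      f"overhead={overhead_seconds:.3f}s")
```

The two values differ by only a few hundredths of a second, so practically the whole request time is spent in this one count query.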

The count query against argo_archived_workflows appears to be what slows the request down after the upgrade.

The database's memory usage has also increased significantly:
image

There are 2 million archived workflows in the database:

# SELECT count(*) as total FROM "argo_archived_workflows" WHERE (("clustername" = 'default' AND "instanceid" = '') AND not exists (select 1 from argo_archived_workflows_labels where clustername = argo_archived_workflows.clustername and uid = argo_archived_workflows.uid and name = 'workflows.argoproj.io/controller-instanceid'))
postgres-# ;
  total
---------
 2040568
(1 row)
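
To make the shape of this query easier to see, here is a minimal sketch that runs the same count against a tiny in-memory database. It uses SQLite as a stand-in for PostgreSQL, and the two table definitions are an assumption reduced to the columns the query touches (the real Argo schema has more columns and its own indexes); the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema: only the columns referenced by the count query.
conn.executescript("""
CREATE TABLE argo_archived_workflows (clustername TEXT, instanceid TEXT, uid TEXT);
CREATE TABLE argo_archived_workflows_labels (clustername TEXT, uid TEXT, name TEXT, value TEXT);
""")
conn.executemany(
    "INSERT INTO argo_archived_workflows VALUES (?, ?, ?)",
    [("default", "", "wf-1"),   # no labels at all: counted
     ("default", "", "wf-2"),   # has the controller-instanceid label: excluded
     ("default", "", "wf-3"),   # has an unrelated label only: counted
     ("other", "", "wf-4")],    # different cluster: excluded
)
conn.executemany(
    "INSERT INTO argo_archived_workflows_labels VALUES (?, ?, ?, ?)",
    [("default", "wf-2", "workflows.argoproj.io/controller-instanceid", "x"),
     ("default", "wf-3", "some-other-label", "y")],
)

# Same predicate as the logged query, with ? placeholders instead of $1/$2.
COUNT_QUERY = """
SELECT count(*) AS total FROM argo_archived_workflows
WHERE ((clustername = ? AND instanceid = ?)
  AND NOT EXISTS (
    SELECT 1 FROM argo_archived_workflows_labels
    WHERE clustername = argo_archived_workflows.clustername
      AND uid = argo_archived_workflows.uid
      AND name = 'workflows.argoproj.io/controller-instanceid'))
"""
total = conn.execute(COUNT_QUERY, ("default", "")).fetchone()[0]
print(total)
```

The count has no LIMIT, so the limit=50 in the request does not reduce its work: every archived workflow matching the cluster and instance ID has to be checked against the labels table.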

In our Helm values.yaml, we need to set the archive TTL to 400 days:

controller:
  persistence:
    archive: true
    archiveTTL: 400d

Version(s)

a1f67794be68c2bcdfb8900e20ce18b0ab9115ebfe0adf1f1f5100eb6bd7604c

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

.

Logs from the workflow controller

.

Logs from your workflow's wait container

.

DingGGu commented Nov 27, 2024

This may be related to #13601, but the query here is not exactly the same as the one in that issue.
