
3.5: Further optimize Archive List API call / DB query #13601

Open
agilgur5 opened this issue Sep 14, 2024 · 24 comments · May be fixed by #13819
Labels
area/api Argo Server API
area/workflow-archive
P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important
solution/suggested A solution to the bug has been suggested. Someone needs to implement it.
type/bug
type/regression Regression from previous behavior (a specific type of bug)

Comments

@agilgur5
Contributor

agilgur5 commented Sep 14, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

This is a follow-up to #13295 and #13566. We can further improve query performance, seemingly by another order of magnitude, by adding a composite ("union") index:

CREATE INDEX argo_archived_workflows_i5 ON argo_archived_workflows (clustername, startedat);

This can be added as a migration after this line in migration.go.

We should also test the performance of this index with and without the subquery. If the subquery is no longer necessary, then we can revert #13566 along with this change.
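
A rough way to do that comparison is to run both query shapes under EXPLAIN ANALYZE against a DB that has the new index (sketch only; the filter values and selected columns are illustrative, not the exact generated SQL):

-- plain query: the (clustername, startedat) index should satisfy the ORDER BY directly
EXPLAIN ANALYZE
SELECT name, namespace, uid, phase, startedat, finishedat
FROM argo_archived_workflows
WHERE clustername = 'default' AND instanceid = '' AND namespace = 'argo'
ORDER BY startedat DESC
LIMIT 20;

-- subquery form from #13566: the inner query selects only uid, the outer query fetches the rows
EXPLAIN ANALYZE
SELECT name, namespace, uid, phase, startedat, finishedat
FROM argo_archived_workflows
WHERE clustername = 'default'
  AND uid IN (
    SELECT uid
    FROM argo_archived_workflows
    WHERE clustername = 'default' AND instanceid = '' AND namespace = 'argo'
    ORDER BY startedat DESC
    LIMIT 20);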

More context

Quoting myself in #13566 (review):

Note that this query seems to be further optimizable per #13295 (comment), so we may want to open a new issue to add further optimizations, but this at least fixes the performance issues and gets us to a much more reasonable baseline

Quoting @Danny5487401 from #13563 (comment)

@agilgur5 three methods i tested:

  1. nice implementation by subquery, according to fix(api): optimise archived list query. Fixes #13295 #13566

[...]

it drops from 37s 768ms to 1s 981ms

  2. force index (argo_archived_workflows_i4)
SELECT
    [...]
FROM `argo_archived_workflows` force index (argo_archived_workflows_i4)
WHERE [...]
ORDER BY `startedat` DESC
LIMIT 20;

it drops from 37s 768ms to 138 ms

  3. union index (clustername, startedat) to use Backward index scan instead of filesort
CREATE INDEX argo_archived_workflows_i5 ON argo_archived_workflows (clustername, startedat);

it drops from 37s to 140 ms

And then my response:

It sounds like the force index and new index are both faster than the subquery, at least on MySQL 8.0, I wonder if we should use those instead of the subquery. force index is potentially not the most resilient, but the new index seems like a "why not?" to me
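
As a sanity check on the "why not?" option, a plain EXPLAIN on MySQL 8.0 after creating the new index should show the optimizer picking it without FORCE INDEX (sketch only; filter values are illustrative):

EXPLAIN
SELECT name, namespace, uid, phase, startedat, finishedat
FROM `argo_archived_workflows`
WHERE `clustername` = 'default' AND `instanceid` = '' AND `namespace` = 'argo'
ORDER BY `startedat` DESC
LIMIT 20;
-- expected: key = argo_archived_workflows_i5, with "Backward index scan" in Extra instead of "Using filesort"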

We should also add a simplified/minimal "performance regression test" to ensure that, well, the performance doesn't regress, per #13566 (comment):

Version(s)

v3.5, v3.5.10, 729ac17

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

n/a - this is API focused

Logs from the workflow controller

n/a - this is API focused

Logs from your workflow's wait container

n/a - this is API focused
@agilgur5 agilgur5 added type/bug type/regression Regression from previous behavior (a specific type of bug) area/api Argo Server API area/workflow-archive P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important labels Sep 14, 2024
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Sep 14, 2024
@agilgur5 agilgur5 added the solution/suggested A solution to the bug has been suggested. Someone needs to implement it. label Sep 14, 2024
@kodieg

kodieg commented Sep 26, 2024

I tested fetching the list of workflows using the new version v3.5.11, but still observed quite long times (~20s). We're using a PostgreSQL database.

I tested converting json to jsonb and observed times of approximately 9s.

Finally, I tested the generated-columns variant proposed here and it worked super fast.

Is it possible to pursue the generated-columns optimization for Postgres?

@agilgur5
Contributor Author

agilgur5 commented Sep 26, 2024

You should try the union index as written above from #13563 (comment).

Your attempts suggest that your DB engine is for some reason still parsing the JSON of every row instead of just the single returned row. What version of Postgres are you on?

The generated columns can cause a very long migration time per #13295 (comment) and require a newer Postgres, so they are not suitable for a patch upgrade. They can also slow down certain queries, since they increase the size of the table and its rows. Lastly, they are unsustainable in the long run per #13295 (comment), as we'd have to keep adding more columns as users want to filter or query on more fields.

Generated columns are also a band-aid and don't solve the root cause of the performance issue, which is that the DB is extracting JSON in every row despite it not being used in the WHERE filter. The subquery and the different indices actually attempt to fix that root cause.

@kodieg

kodieg commented Sep 26, 2024

I already tested this with the union index added.

We're using PostgreSQL 12.4.

Maybe my use case is a bit different. We're fetching information about workflows using /api/v1/workflows/argo-workflows, so we want to fetch multiple rows (not a single row). This can take many seconds even when fetching 1000 workflows. In this case, with a limit and almost no filtering, parsing the JSON is slow; even a query that fetches all the data is slow. Just migrating to jsonb helps a lot.

Maybe a way to improve the current situation would be to migrate the workflow column to the jsonb type, or to add a new jsonb column with its contents limited to what we want to be available for filtering and fetching in this query (because these workflow JSONs are huge).
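
Roughly, the second idea could look like this on Postgres (hypothetical column name; just a sketch of the shape, not a tested migration):

ALTER TABLE argo_archived_workflows ADD COLUMN workflow_summary jsonb;
-- populate it with just the fields the list query extracts (labels, annotations, progress, ...)
UPDATE argo_archived_workflows
SET workflow_summary = jsonb_build_object(
  'labels',      workflow::jsonb -> 'metadata' -> 'labels',
  'annotations', workflow::jsonb -> 'metadata' -> 'annotations',
  'progress',    workflow::jsonb -> 'status' -> 'progress');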

@agilgur5
Contributor Author

agilgur5 commented Sep 26, 2024

I already tested this with union index added.

Can you clarify the performance of that? You didn't mention it in your first comment.

This can take over long seconds even if we want to fetch 1000 of workflows.

Oh, well if you're intentionally not paginating, it's rather unsurprising that it's slow. Then the indexing is not the main issue, but rather the actual retrieval.
Unpaginated queries weren't the original purpose of this issue, though I suppose that too is a 3.5 regression due to the JSON retrieval.

@agilgur5
Contributor Author

Even migration to jsonb helps a lot.

JSONB is only supported by Postgres, but we could migrate Postgres to that if the migration time is minimal. You'd have to test that and then PR the migration and query differences, for Postgres only.
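
For Postgres, the migration itself would presumably be a single statement along these lines (untested sketch; the list query would also need updating to drop the ::JSON casts):

ALTER TABLE argo_archived_workflows
  ALTER COLUMN workflow TYPE jsonb USING workflow::jsonb;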

or add a new jsonb column with its contents limited to what we want to be available for filtering and fetching in this query

This will have similar sustainability issues, as the fields stored will change over time, which causes unexpected problems in the future; adding fields will also only populate new rows and not old rows. So that approach seems quite suboptimal.

@kodieg

kodieg commented Sep 27, 2024

I tested the migration times on a small Postgres instance. However, I think it will be faster than the text-to-json migration that may already have been applied.

For 100k workflows it took 96 seconds (on a pretty small and slow instance).

Currently, we're cleaning up this table every few hours, leaving us without history.

@agilgur5
Contributor Author

Currently, we're cleaning up this table every few hours leaving us without history.

You should just use API requests with smaller limits and keep your history.

@MasonM
Contributor

MasonM commented Oct 2, 2024

I mentioned in the contributor meeting that I'd look into this. To evaluate these kinds of optimizations, you need a ton of rows in argo_archived_workflows, but there doesn't seem to be a way of generating that with the dev environment. Have you all been using real-world DB dumps? I don't have access to a dump with that kind of data, unfortunately.

I think there should be a script (probably under hack/) to generate dummy data and insert that into the DB. The schema for argo_archived_workflows is pretty straightforward, so that should be easy without introducing any new tools/libraries. I can work on that, unless anybody has any better ideas.
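
For reference, the rows themselves are simple to fabricate. Using only the columns that appear in the queries in this thread, a single dummy insert might look like this on Postgres (sketch only; the real table may have additional columns and constraints, and the workflow JSON is trimmed to a minimal document):

-- gen_random_uuid() is built in on Postgres 13+; older versions need the pgcrypto extension
INSERT INTO argo_archived_workflows
  (uid, clustername, instanceid, name, namespace, phase, startedat, finishedat, workflow)
VALUES
  (gen_random_uuid()::text, 'default', '', 'fake-wf-000001', 'argo', 'Succeeded',
   now() - interval '1 hour', now(),
   '{"metadata": {"name": "fake-wf-000001", "namespace": "argo"}, "spec": {}, "status": {"phase": "Succeeded"}}');
-- label-selector tests would also need matching rows in argo_archived_workflows_labels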

@Joibel
Member

Joibel commented Oct 2, 2024

I don't have better ideas - I agree that would be useful though.

@agilgur5
Contributor Author

agilgur5 commented Oct 2, 2024

To evaluate these kinds of optimizations, you need a ton of rows in argo_archived_workflows, but there doesn't seem to be a way of generating that with the dev environment

Indeed that is a challenge of testing this and one reason why higher scale performance regressions are hard to catch early -- both dev and test would need to operate with such large numbers.

Have you all been using real-world DB dumps?

No, we've actually been relying on user feedback to test much of the time.

I think there should be a script (probably under hack/) to generate dummy data and insert that into the DB.

Yea you can probably get this working with faker or similar tooling

so that should be easy without introducing any new tools/libraries

The JSON blob is the primary column needing an optimization, and its schema is the entire CR, so that part would be difficult without tooling 😅

MasonM added a commit to MasonM/argo-workflows that referenced this issue Oct 6, 2024
This adds a small CLI tool for developers to use when working with the
DB locally. It provides two functions:
1. Explicitly migrate the DB (which is normally only done when the
   workflow controller starts) to help test migrations:

     ```
     $ go run ./hack/db migrate
     INFO[0000] Migrating database schema                     clusterName=default dbType=postgres
     ```
2. Insert randomly-generated archived workflows into the DB, which is
   intended to help test query performance (e.g.
   argoproj#13601)

    ```
    $ time go run ./hack/db/main.go fake-archived-workflows --rows 100000
    Using seed 2600415997737113470
    Clusters: [cx5j6rrzm5vnkfqqr4d]
    Namespaces: [bs98pxsfhs9z v95848 hq76xbj4kq7vrdp49vm ghzj6vcrth 262wb8w2b8v2wd2p2p9]
    Inserted 100000 rows

    real    17m56.554s
    user    1m53.833s
    sys     0m43.581s
    ```

  This is obviously not as efficient as it could be, but it's only
  intended to be run on an adhoc basis when manually testing query
  performance. To make it fast, we'd probably have to switch to direct
  SQL inserts, which would couple this script to the logic in
  `persist/sqldb/workflow_archive.go`.

Signed-off-by: Mason Malone <[email protected]>

@MasonM
Contributor

MasonM commented Oct 6, 2024

Okay, I entered a draft PR with a simple CLI tool to populate the DB with randomly-generated workflows: #13715

Unfortunately, I can't reproduce the performance issues reported in #13566. @ryancurrah said he had 93,135 workflows in #13566 (comment), so I used the tool to populate the DB with 100,000 randomly-generated workflows.

$  go run ./hack/db/main.go fake-archived-workflows --rows 100000 --seed 2600415997737113470
Using seed 2600415997737113470
Clusters: [cx5j6rrzm5vnkfqqr4d]
Namespaces: [bs98pxsfhs9z v95848 hq76xbj4kq7vrdp49vm ghzj6vcrth 262wb8w2b8v2wd2p2p9]
Inserted 100000 rows

Every workflow is ~5900 bytes after storage (edited) and associated with the same cluster ("cx5j6rrzm5vnkfqqr4d"). The tool generated 5 namespaces that are randomly assigned to each workflow (i.e. each namespace has ~20,000 workflows associated with it). Then, I tried executing a simple variation of the "before" query from that issue:

EXPLAIN ANALYZE SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow::JSON)->'spec'->>'suspend' AS suspend,
       coalesce((workflow::JSON)->'status'->>'message', '') AS message,
       coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE (("clustername" = 'cx5j6rrzm5vnkfqqr4d'
        AND "instanceid" = '')
       AND "namespace" = 'bs98pxsfhs9z'
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/phase'
            AND value = 'Succeeded')
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/workflow-archiving-status'
            AND value = 'Persisted'))
ORDER BY "startedat" DESC
LIMIT 1;

but it consistently executes in <2ms:

Limit  (cost=1.13..60.41 rows=1 width=318) (actual time=1.067..1.068 rows=1 loops=1)
  ->  Nested Loop  (cost=1.13..100778.36 rows=1700 width=318) (actual time=1.065..1.066 rows=1 loops=1)
        Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows_labels_1.uid)::text)
        ->  Nested Loop  (cost=0.71..89644.36 rows=3364 width=1617) (actual time=0.071..0.071 rows=1 loops=1)
              ->  Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows  (cost=0.29..24325.29 rows=19917 width=1580) (actual time=0.026..0.027 rows=1 loops=1)
                    Filter: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'bs98pxsfhs9z'::text))
                    Rows Removed by Filter: 2
              ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels  (cost=0.42..3.28 rows=1 width=57) (actual time=0.040..0.040 rows=1 loops=1)
                    Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
                    Filter: ((value)::text = 'Succeeded'::text)
        ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels argo_archived_workflows_labels_1  (cost=0.42..3.28 rows=1 width=57) (actual time=0.016..0.016 rows=1 loops=1)
              Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text))
              Filter: ((value)::text = 'Persisted'::text)
Planning Time: 1.916 ms
Execution Time: 1.226 ms

The query plan shows it's doing an index scan on the 19,917 rows associated with the namespace bs98pxsfhs9z, which is what it should be doing. I'm not sure why @ryancurrah got such different results. Maybe the average size of the workflows in his DB is very large? If so, that would indicate TOAST issues. You can use this query to check:

select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;

Results for me:

          avg          |          avg          
-----------------------+-----------------------
 5907.8400000000000000 | 1488.8100000000000000

@agilgur5
Contributor Author

agilgur5 commented Oct 7, 2024

Every workflow is ~1466 bytes

Maybe the average size of the workflows in his DB is larger than 2KB?

Yea, I wouldn't be surprised by that; after including the status, workflows can be very large. See also node status offloading and #7121.

@MasonM
Contributor

MasonM commented Oct 7, 2024

I edited my previous comment because I think I was wrong about the workflow size being ~1466 bytes. That was the value returned by select avg(pg_column_size(workflow)) from argo_archived_workflows, which I assumed included TOAST'd values, but it appears it doesn't (relevant StackOverflow). The actual size is ~5900 bytes per select avg(octet_length(workflow::text)) from argo_archived_workflows.

@kodieg

kodieg commented Oct 7, 2024

These 100,000 workflows were from a single namespace. I calculated the average and max size of the workflow column as well:

argo=# select avg(octet_length(workflow::text)) from argo_archived_workflows;
        avg
--------------------
 18599.709432882786

argo=# select max(octet_length(workflow::text)) from argo_archived_workflows;
  max
-------
 65875

Also, the explain indicates that you are retrieving a single row. That works instantly for me as well, but retrieving 1000 rows takes much more time:

argo=# \timing

SELECT name,
namespace,
UID,
phase,
startedat,
finishedat,
coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
(workflow::JSON)->'spec'->>'suspend' AS suspend,
coalesce((workflow::JSON)->'status'->>'message', '') AS message,
coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE (("clustername" = 'default'
AND "instanceid" = '')
AND "namespace" = 'argo-workflows'
AND EXISTS
(SELECT 1
FROM argo_archived_workflows_labels
WHERE clustername = argo_archived_workflows.clustername
AND UID = argo_archived_workflows.uid
AND name = 'workflows.argoproj.io/phase'
AND value = 'Succeeded') AND EXISTS (SELECT 1
FROM argo_archived_workflows_labels
WHERE clustername = argo_archived_workflows.clustername
AND UID = argo_archived_workflows.uid
AND name = 'workflows.argoproj.io/workflow-archiving-status'
AND value = 'Persisted'))
ORDER BY "startedat" DESC
LIMIT 1000;

Time: 61666.463 ms (01:01.666)


With JSONB it takes less than 5 seconds:

SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       coalesce((workflow::JSONB)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow::JSONB)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow::JSONB)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow::JSONB)->'spec'->>'suspend' AS suspend,
       coalesce((workflow::JSONB)->'status'->>'message', '') AS message,
       coalesce((workflow::JSONB)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow::JSONB)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE (("clustername" = 'default'
        AND "instanceid" = '')
       AND "namespace" = 'argo-workflows'
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/phase'
            AND value = 'Succeeded') AND EXISTS (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/workflow-archiving-status'
            AND value = 'Persisted'))
ORDER BY "startedat" DESC
LIMIT 1000;

Fetching a single row, though, takes 60ms:

SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow::JSON)->'spec'->>'suspend' AS suspend,
       coalesce((workflow::JSON)->'status'->>'message', '') AS message,
       coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE (("clustername" = 'default'
        AND "instanceid" = '')
       AND "namespace" = 'argo-workflows'
       AND EXISTS
         (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/phase'
            AND value = 'Succeeded') AND EXISTS (SELECT 1
          FROM argo_archived_workflows_labels
          WHERE clustername = argo_archived_workflows.clustername
            AND UID = argo_archived_workflows.uid
            AND name = 'workflows.argoproj.io/workflow-archiving-status'
            AND value = 'Persisted'))
ORDER BY "startedat" DESC
LIMIT 1;

Time: 61.373 ms

Below is the execution plan for fetching a single row:

                                                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.12..693.65 rows=1 width=343) (actual time=1.253..1.254 rows=1 loops=1)
   ->  Nested Loop  (cost=1.12..65098.89 rows=94 width=343) (actual time=1.252..1.253 rows=1 loops=1)
         Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows_labels_1.uid)::text)
         ->  Nested Loop  (cost=0.71..60559.85 rows=1096 width=182) (actual time=0.039..0.039 rows=1 loops=1)
               ->  Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows  (cost=0.29..2133.01 rows=14172 width=145) (actual time=0.018..0.018 rows=1 loops=1)
                     Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
               ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels  (cost=0.42..4.12 rows=1 width=45) (actual time=0.018..0.018 rows=1 loops=1)
                     Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
                     Filter: ((value)::text = 'Succeeded'::text)
         ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels argo_archived_workflows_labels_1  (cost=0.42..4.12 rows=1 width=45) (actual time=0.008..0.008 rows=1 loops=1)
               Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows.uid)::text) AND ((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text))
               Filter: ((value)::text = 'Persisted'::text)
 Planning Time: 0.981 ms
 Execution Time: 1.316 ms
(14 rows)

Time: 58.798 ms

And for fetching 1000 rows:

 Limit  (cost=5133.53..5133.77 rows=94 width=343) (actual time=58700.688..58700.913 rows=1000 loops=1)
   ->  Sort  (cost=5133.53..5133.77 rows=94 width=343) (actual time=58700.687..58700.843 rows=1000 loops=1)
         Sort Key: argo_archived_workflows.startedat DESC
         Sort Method: top-N heapsort  Memory: 2080kB
         ->  Nested Loop  (cost=2490.79..5130.45 rows=94 width=343) (actual time=29.798..58665.875 rows=13214 loops=1)
               Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows.uid)::text)
               ->  Hash Join  (cost=2490.38..4741.70 rows=94 width=90) (actual time=18.732..67.268 rows=13214 loops=1)
                     Hash Cond: ((argo_archived_workflows_labels_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
                     ->  Bitmap Heap Scan on argo_archived_workflows_labels argo_archived_workflows_labels_1  (cost=180.87..2429.01 rows=1215 width=45) (actual time=3.345..27.844 rows=13983 loops=1)
                           Recheck Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
                           Filter: ((clustername)::text = 'default'::text)
                           Heap Blocks: exact=2321
                           ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..180.57 rows=1215 width=0) (actual time=2.986..2.986 rows=14183 loops=1)
                                 Index Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
                     ->  Hash  (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.378..15.379 rows=13214 loops=1)
                           Buckets: 16384 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 1122kB
                           ->  Bitmap Heap Scan on argo_archived_workflows_labels  (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.275..11.281 rows=13214 loops=1)
                                 Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
                                 Filter: ((clustername)::text = 'default'::text)
                                 Heap Blocks: exact=2283
                                 ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..163.38 rows=1096 width=0) (actual time=2.924..2.924 rows=13414 loops=1)
                                       Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
               ->  Index Scan using argo_archived_workflows_pkey on argo_archived_workflows  (cost=0.41..4.05 rows=1 width=145) (actual time=0.011..0.011 rows=1 loops=13214)
                     Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows_labels_1.uid)::text))
                     Filter: (((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
 Planning Time: 0.959 ms
 Execution Time: 58701.536 ms
(27 rows)

Time: 58733.573 ms (00:58.734)

I started to dig a little bit more and found that even fetching a single row takes that long if I do it with some offset.

                                                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5133.34..5133.34 rows=1 width=365) (actual time=60255.876..60255.880 rows=1 loops=1)
   ->  Sort  (cost=5133.31..5133.55 rows=94 width=365) (actual time=60255.872..60255.877 rows=12 loops=1)
         Sort Key: argo_archived_workflows.startedat DESC
         Sort Method: top-N heapsort  Memory: 50kB
         ->  Nested Loop  (cost=2490.79..5131.16 rows=94 width=365) (actual time=29.639..60230.904 rows=13029 loops=1)
               Join Filter: ((argo_archived_workflows_labels.uid)::text = (argo_archived_workflows.uid)::text)
               ->  Hash Join  (cost=2490.38..4741.70 rows=94 width=90) (actual time=19.018..63.536 rows=13029 loops=1)
                     Hash Cond: ((argo_archived_workflows_labels_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
                     ->  Bitmap Heap Scan on argo_archived_workflows_labels argo_archived_workflows_labels_1  (cost=180.87..2429.01 rows=1215 width=45) (actual time=3.701..26.502 rows=13798 loops=1)
                           Recheck Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
                           Filter: ((clustername)::text = 'default'::text)
                           Heap Blocks: exact=2291
                           ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..180.57 rows=1215 width=0) (actual time=3.346..3.347 rows=14198 loops=1)
                                 Index Cond: (((name)::text = 'workflows.argoproj.io/workflow-archiving-status'::text) AND ((value)::text = 'Persisted'::text))
                     ->  Hash  (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.308..15.309 rows=13029 loops=1)
                           Buckets: 16384 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 1108kB
                           ->  Bitmap Heap Scan on argo_archived_workflows_labels  (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.407..11.517 rows=13029 loops=1)
                                 Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
                                 Filter: ((clustername)::text = 'default'::text)
                                 Heap Blocks: exact=2253
                                 ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..163.38 rows=1096 width=0) (actual time=2.962..2.963 rows=13429 loops=1)
                                       Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
               ->  Index Scan using argo_archived_workflows_pkey on argo_archived_workflows  (cost=0.41..4.05 rows=1 width=145) (actual time=0.010..0.010 rows=1 loops=13029)
                     Index Cond: (((clustername)::text = 'default'::text) AND ((uid)::text = (argo_archived_workflows_labels_1.uid)::text))
                     Filter: (((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
 Planning Time: 0.977 ms
 Execution Time: 60255.951 ms
(27 rows)

Time: 60343.720 ms (01:00.344)

@kodieg

kodieg commented Oct 7, 2024

I noticed that you used the old query (as I did as well, with EXISTS), which supposedly parses all of the JSON. I tested with the new query:

EXPLAIN ANALYZE SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow::JSON)->'status'->>'progress', '') AS progress,
       coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow::JSON)->'spec'->>'suspend' AS suspend,
       coalesce((workflow::JSON)->'status'->>'message', '') AS message,
       coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM "argo_archived_workflows"
WHERE "clustername" = 'default'
  AND UID IN
    (SELECT UID
     FROM "argo_archived_workflows"
     WHERE (("clustername" = 'default'
             AND "instanceid" = '')
            AND "namespace" = 'argo-workflows'
            AND EXISTS
              (SELECT 1
               FROM argo_archived_workflows_labels
               WHERE clustername = argo_archived_workflows.clustername
                 AND UID = argo_archived_workflows.uid
                 AND name = 'workflows.argoproj.io/phase'
                 AND value = 'Succeeded')
           )
     ORDER BY "startedat" DESC
     LIMIT 1000);

The JSON version takes 5 seconds:

                                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=4276.07..6182.55 rows=1000 width=375) (actual time=47.982..5625.164 rows=1000 loops=1)
   Hash Cond: ((argo_archived_workflows.uid)::text = ("ANY_subquery".uid)::text)
   ->  Seq Scan on argo_archived_workflows  (cost=0.00..1778.15 rows=14172 width=137) (actual time=0.008..7.680 rows=13812 loops=1)
         Filter: ((clustername)::text = 'default'::text)
   ->  Hash  (cost=4263.57..4263.57 rows=1000 width=37) (actual time=36.108..36.112 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 76kB
         ->  Subquery Scan on "ANY_subquery"  (cost=4251.07..4263.57 rows=1000 width=37) (actual time=35.608..35.879 rows=1000 loops=1)
               ->  Limit  (cost=4251.07..4253.57 rows=1000 width=45) (actual time=35.607..35.738 rows=1000 loops=1)
                     ->  Sort  (cost=4251.07..4253.81 rows=1096 width=45) (actual time=35.606..35.666 rows=1000 loops=1)
                           Sort Key: argo_archived_workflows_1.startedat DESC
                           Sort Method: top-N heapsort  Memory: 205kB
                           ->  Hash Join  (cost=2309.51..4195.73 rows=1096 width=45) (actual time=15.081..28.178 rows=13043 loops=1)
                                 Hash Cond: ((argo_archived_workflows_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
                                 ->  Seq Scan on argo_archived_workflows argo_archived_workflows_1  (cost=0.00..1849.01 rows=14172 width=53) (actual time=0.003..7.618 rows=13812 loops=1)
                                       Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
                                 ->  Hash  (cost=2295.81..2295.81 rows=1096 width=45) (actual time=15.071..15.072 rows=13043 loops=1)
                                       Buckets: 16384 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 1109kB
                                       ->  Bitmap Heap Scan on argo_archived_workflows_labels  (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.226..11.272 rows=13043 loops=1)
                                             Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
                                             Filter: ((clustername)::text = 'default'::text)
                                             Heap Blocks: exact=2255
                                             ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..163.38 rows=1096 width=0) (actual time=2.883..2.883 rows=13443 loops=1)
                                                   Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
 Planning Time: 0.641 ms
 Execution Time: 5625.530 ms
(25 rows)

Time: 5714.274 ms (00:05.714)

while the JSONB version takes 500ms:

                                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=4276.07..6142.55 rows=1000 width=375) (actual time=36.683..494.291 rows=1000 loops=1)
   Hash Cond: ((argo_archived_workflows.uid)::text = ("ANY_subquery".uid)::text)
   ->  Seq Scan on argo_archived_workflows  (cost=0.00..1778.15 rows=14172 width=137) (actual time=0.008..5.076 rows=13812 loops=1)
         Filter: ((clustername)::text = 'default'::text)
   ->  Hash  (cost=4263.57..4263.57 rows=1000 width=37) (actual time=35.711..35.716 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 76kB
         ->  Subquery Scan on "ANY_subquery"  (cost=4251.07..4263.57 rows=1000 width=37) (actual time=35.226..35.483 rows=1000 loops=1)
               ->  Limit  (cost=4251.07..4253.57 rows=1000 width=45) (actual time=35.225..35.355 rows=1000 loops=1)
                     ->  Sort  (cost=4251.07..4253.81 rows=1096 width=45) (actual time=35.224..35.284 rows=1000 loops=1)
                           Sort Key: argo_archived_workflows_1.startedat DESC
                           Sort Method: top-N heapsort  Memory: 205kB
                           ->  Hash Join  (cost=2309.51..4195.73 rows=1096 width=45) (actual time=14.685..27.776 rows=13043 loops=1)
                                 Hash Cond: ((argo_archived_workflows_1.uid)::text = (argo_archived_workflows_labels.uid)::text)
                                 ->  Seq Scan on argo_archived_workflows argo_archived_workflows_1  (cost=0.00..1849.01 rows=14172 width=53) (actual time=0.003..7.707 rows=13812 loops=1)
                                       Filter: (((clustername)::text = 'default'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'argo-workflows'::text))
                                 ->  Hash  (cost=2295.81..2295.81 rows=1096 width=45) (actual time=14.676..14.678 rows=13043 loops=1)
                                       Buckets: 16384 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 1109kB
                                       ->  Bitmap Heap Scan on argo_archived_workflows_labels  (cost=163.65..2295.81 rows=1096 width=45) (actual time=3.230..11.149 rows=13043 loops=1)
                                             Recheck Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
                                             Filter: ((clustername)::text = 'default'::text)
                                             Heap Blocks: exact=2255
                                             ->  Bitmap Index Scan on argo_archived_workflows_labels_i1  (cost=0.00..163.38 rows=1096 width=0) (actual time=2.884..2.885 rows=13443 loops=1)
                                                   Index Cond: (((name)::text = 'workflows.argoproj.io/phase'::text) AND ((value)::text = 'Succeeded'::text))
 Planning Time: 0.596 ms
 Execution Time: 494.462 ms
(25 rows)

Time: 535.514 ms

I tested fetching 10, 100, and 1000 rows; the times increase almost linearly, but the JSONB version is consistently 10x faster.

@MasonM
Contributor

MasonM commented Oct 8, 2024

@kodieg Thanks for the details!

Also explain indicates that you are retrieving a single row. It works instantly as well for me, but retrieving 1000 of rows takes much more time

The queries in the PR description of #13566 all had LIMIT 1, and the screenshot shows the query taking 11495ms before the change. If it's nearly instant for you, that means you can't reproduce it either. Of course, there are still other performance issues here, and fixing those might end up fixing the original issue, but we can't be sure if we can't reproduce it.

In any case, I tried to replicate your setup by creating a large workflow to use as a template: https://gist.github.com/MasonM/15cfb3c4f497324ac67d1a1a334d65db

Then I ran the following command to insert 100,000 randomized workflows using that template, all of which are associated with a single namespace (bs98pxsfhs9z):

$ time go run ./hack/db/main.go fake-archived-workflows --rows 100000 --namespaces 1 --seed 2600415997737113470 --template '@large-workflow.yaml'
Using seed 2600415997737113470
Clusters: [cx5j6rrzm5vnkfqqr4d]
Namespaces: [bs98pxsfhs9z]
Inserted 100000 rows

real    19m7.916s
user    2m47.567s
sys     0m48.239s

The average workflow size is now ~19KB:

postgres=# select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;
        avg         |          avg          
--------------------+-----------------------
 18900.846520000000 | 1707.9891100000000000
(1 row)

After that, I ran the query from my original comment (the "before" query), which completes in <4ms, so I still can't reproduce the original issue. If I change LIMIT 1 to LIMIT 1000, the execution time jumps to ~450ms, which I'd consider still pretty decent, since it's rare someone would need to list 1000 workflows at once. However, the "new" query from #13566 is much slower for 1,000 rows, taking ~1250ms:

postgres=# EXPLAIN ANALYZE SELECT name,
postgres-#        namespace,
postgres-#        UID,
postgres-#        phase,
postgres-#        startedat,
postgres-#        finishedat,
postgres-#        coalesce((workflow::JSON)->'metadata'->>'labels', '{}') AS labels,
postgres-#        coalesce((workflow::JSON)->'metadata'->>'annotations', '{}') AS annotations,
postgres-#        coalesce((workflow::JSON)->'status'->>'progress', '') AS progress,
postgres-#        coalesce((workflow::JSON)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
postgres-#        (workflow::JSON)->'spec'->>'suspend' AS suspend,
postgres-#        coalesce((workflow::JSON)->'status'->>'message', '') AS message,
postgres-#        coalesce((workflow::JSON)->'status'->>'estimatedDuration', '0') AS estimatedduration,
postgres-#        coalesce((workflow::JSON)->'status'->>'resourcesDuration', '{}') AS resourcesduration
postgres-# FROM "argo_archived_workflows"
postgres-# WHERE "clustername" = 'cx5j6rrzm5vnkfqqr4d'
postgres-#   AND UID IN
postgres-#     (SELECT UID
postgres(#      FROM "argo_archived_workflows"
postgres(#      WHERE (("clustername" = 'cx5j6rrzm5vnkfqqr4d'
postgres(#              AND "instanceid" = '')
postgres(#             AND "namespace" = 'bs98pxsfhs9z'
postgres(#             AND EXISTS
postgres(#               (SELECT 1
postgres(#                FROM argo_archived_workflows_labels
postgres(#                WHERE clustername = argo_archived_workflows.clustername
postgres(#                  AND UID = argo_archived_workflows.uid
postgres(#                  AND name = 'workflows.argoproj.io/phase'
postgres(#                  AND value = 'Succeeded')
postgres(#            )
postgres(#      ORDER BY "startedat" DESC
postgres(#      LIMIT 1000);
                                                                                                                   QUERY PLAN                                                                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=9212.87..16488.55 rows=1000 width=349) (actual time=33.046..1213.150 rows=1000 loops=1)
   ->  HashAggregate  (cost=9212.45..9222.45 rows=1000 width=37) (actual time=31.731..32.037 rows=1000 loops=1)
         Group Key: ("ANY_subquery".uid)::text
         ->  Subquery Scan on "ANY_subquery"  (cost=1000.74..9209.95 rows=1000 width=37) (actual time=11.299..31.039 rows=1000 loops=1)
               ->  Limit  (cost=1000.74..9199.95 rows=1000 width=45) (actual time=11.293..30.882 rows=1000 loops=1)
                     ->  Gather Merge  (cost=1000.74..136353.38 rows=16508 width=45) (actual time=11.290..30.783 rows=1000 loops=1)
                           Workers Planned: 2
                           Workers Launched: 2
                           ->  Nested Loop  (cost=0.71..133447.92 rows=6878 width=45) (actual time=0.222..27.746 rows=767 loops=3)
                                 ->  Parallel Index Scan Backward using argo_archived_workflows_i4 on argo_archived_workflows argo_archived_workflows_1  (cost=0.29..102561.01 rows=41667 width=65) (actual time=0.051..7.306 rows=2262 loops=3)
                                       Filter: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((instanceid)::text = ''::text) AND ((namespace)::text = 'bs98pxsfhs9z'::text))
                                 ->  Index Scan using argo_archived_workflows_labels_pkey on argo_archived_workflows_labels  (cost=0.42..0.74 rows=1 width=57) (actual time=0.009..0.009 rows=0 loops=6787)
                                       Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = (argo_archived_workflows_1.uid)::text) AND ((name)::text = 'workflows.argoproj.io/phase'::text))
                                       Filter: ((value)::text = 'Succeeded'::text)
                                       Rows Removed by Filter: 1
   ->  Index Scan using argo_archived_workflows_pkey on argo_archived_workflows  (cost=0.42..7.29 rows=1 width=1801) (actual time=0.005..0.005 rows=1 loops=1000)
         Index Cond: (((clustername)::text = 'cx5j6rrzm5vnkfqqr4d'::text) AND ((uid)::text = ("ANY_subquery".uid)::text))
 Planning Time: 2.211 ms
 Execution Time: 1213.592 ms
(19 rows)

My best guess is that the performance is highly label-dependent. Could you please run the following query so we can get some stats?

select count(*) total_labels, avg(labels.count) average_labels_per_workflow, avg(labels.avg) average_label_size from (select count(*), avg(octet_length(name)) from argo_archived_workflows_labels group by uid) labels;

@agilgur5
Contributor Author

agilgur5 commented Oct 8, 2024

My best guess is that the performance is highly label-dependent

all of which are associated with a single namespace (bs98pxsfhs9z):

The namespace is also possibly a confounding variable. The DB version is another (more optimizations, engines, etc in newer versions).

However, the "new" query from #13566 is much slower for 1,000 rows, taking ~1250ms:

Interesting. I'd expect it to lose some performance since it adds a subquery, but I did not expect a contradictory result.
That being said, the original goal of this issue (per its description) is to replace that with a union index anyway.

Also, hopefully @ryancurrah will see this soon and add some reproducible color.
The EXPLAIN logs in the original issue might help as well.

@ryancurrah
Contributor

In our deployment we only use one namespace for workflows. The workflows have around 10 labels; 7 of the labels are ones we added and the rest are from Argo.

Result of:

select avg(octet_length(workflow::text)), avg(pg_column_size(workflow)) from argo_archived_workflows;

"avg" "avg-2"
69368.494313104989 12034.487900223380

Result of:

select count(*) total_labels, avg(labels.count) average_labels_per_workflow, avg(labels.avg) average_label_size from (select count(*), avg(octet_length(name)) from argo_archived_workflows_labels group by uid) labels;

"total_labels" "average_labels_per_workflow" "average_label_size"
107440 12.6245904690990320 21.3524800539094153

Hope this helps.

MasonM added a commit to MasonM/argo-workflows that referenced this issue Oct 16, 2024
This adds basic benchmarks for listing and counting archived workflows
so we can evaluate potential optimizations, e.g. argoproj#13601

```
$ make BenchmarkWorkflowArchive
GIT_COMMIT=f218fc540367840d013a99642590d3509560de51 GIT_BRANCH=feat-postgresql-jsonb GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=true VERSION=latest
KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/amd64
RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false  STATIC_FILES=false ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true
go test --tags api,cli,cron,executor,examples,corefunctional,functional,plugins ./test/e2e -run='BenchmarkWorkflowArchive' -benchmem -bench .
WARN[0000] Non-transient error: <nil>
WARN[0000] Non-transient error: <nil>
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
BenchmarkWorkflowArchive/ListWorkflows-12                      6         167109091 ns/op          527468 B/op       8614 allocs/op
--- BENCH: BenchmarkWorkflowArchive/ListWorkflows-12
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
    workflow_archive_test.go:27: Found 100 workflows
BenchmarkWorkflowArchive/CountWorkflows-12                    31          36799882 ns/op            9022 B/op        212 allocs/op
--- BENCH: BenchmarkWorkflowArchive/CountWorkflows-12
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
    workflow_archive_test.go:37: Found 100756 workflows
        ... [output truncated]
PASS
ok      github.com/argoproj/argo-workflows/v3/test/e2e  3.392s
```

Signed-off-by: Mason Malone <[email protected]>
@MasonM
Contributor

MasonM commented Oct 17, 2024

Okay, I spent a lot of time looking into this, and I'm pretty sure the main problem is a TOAST issue (as I alluded to in #13601 (comment)) introduced in #12912. That PR changed the query used to list workflows from select workflow from argo_archived_workflows... to

select name, namespace, uid, phase, startedat, finishedat, coalesce((workflow::json)->'metadata'->>'labels', '{}') as labels, coalesce((workflow::json)->'metadata'->>'annotations', '{}') as annotations, coalesce((workflow::json)->'status'->>'progress', '') as progress from argo_archived_workflows ...

While this change reduces the amount of data that needs to be transferred, it still significantly reduces performance for users with a large number of workflows due to TOAST. The article Jsonb: few more stories about the performance has a good explanation:

At the first glance there is no significant differense between them, but this impression is wrong. In the first case data is detoasted just once, everything is fine. Unfortunately, in the second case data is detoasted twice for both keys. And it’s like that not only for jsonb, but for all composite data types in PostgreSQL. In general, it’s not a big issue, but if our document is quite big and we’re trying to extract too many keys, we can end up with huge overhead.

In this case, the workflow column is being detoasted three times. If the average workflow is small, the performance impact is negligible, but it can become very high if you are storing 50KB+ workflows.
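
A rough way to see that overhead in isolation is to time a plain fetch of the column against the multi-key extraction over the same rows (illustrative only; not the exact generated queries):

-- detoasts each workflow value once
EXPLAIN ANALYZE SELECT workflow FROM argo_archived_workflows LIMIT 1000;

-- detoasts and re-parses the JSON once per extracted key
EXPLAIN ANALYZE SELECT
  coalesce((workflow::json)->'metadata'->>'labels', '{}'),
  coalesce((workflow::json)->'metadata'->>'annotations', '{}'),
  coalesce((workflow::json)->'status'->>'progress', '')
FROM argo_archived_workflows LIMIT 1000;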

Reverting #12912 solves this, which I verified with the following steps:

  1. Use test: simple DB CLI for local development #13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template:

    $ time go run ./hack/db fake-archived-workflows  --template "@very-large-workflow.yaml" --rows 100000
    Using seed 1935828722624432788
    Clusters: [default]
    Namespaces: [argo]
    Inserted 100000 rows
    
    real    18m35.316s
    user    3m2.447s
    sys     0m44.972s
    
  2. Run the benchmarks using test: basic benchmarks for workflow archive DB operations #13767:

    make BenchmarkWorkflowArchive > postgres_before_10000_workflows.txt
    
  3. Checkout this branch: main...MasonM:argo-workflows:revert-12912

  4. Re-run benchmarks:

    make BenchmarkWorkflowArchive > postgres_revert_10000_workflows.txt
    
  5. Compare results using benchstat:

    $ benchstat postgres_before_10000_workflows.txt postgres_revert_10000_workflows.txt
    goos: linux
    goarch: amd64
    pkg: github.com/argoproj/argo-workflows/v3/test/e2e
    cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                         │ postgres_before_10000_workflows.txt │  postgres_revert_10000_workflows.txt  │
                                                         │               sec/op                │    sec/op     vs base                 │
    WorkflowArchive/ListWorkflows-12                                             184.47m ± ∞ ¹   47.90m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                         184.23m ± ∞ ¹   46.70m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                             13.35m ± ∞ ¹   12.99m ± ∞ ¹        ~ (p=1.000 n=1) ²
    geomean                                                                       76.84m         30.74m        -59.99%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05
    

However, reverting #12912 could re-introduce the memory issues mentioned in that PR, and will probably worsen query performance for MySQL (though I haven't tested that).

Migrating the workflow column to be of type jsonb is the biggest thing I could identify to improve performance. It doesn't eliminate the detoast overhead, but JSON operators are so much faster on jsonb that it doesn't matter. The only downside is it takes a long time to migrate if you have a lot of workflows. I'll enter a followup PR with more details, but I figured I'd get my findings out first. EDIT: Here's the PR: #13779
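
For reference, the core of that migration is a single DDL statement along these lines (the exact change is in #13779, and the same statement shows up in the migration log below):

```
-- Convert the workflow column from json to jsonb. This rewrites every row,
-- so it can take a while on large archives (~70s for 100,000 rows in the run below).
ALTER TABLE argo_archived_workflows
    ALTER COLUMN workflow SET DATA TYPE jsonb USING workflow::jsonb;
```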

MasonM added a commit to MasonM/argo-workflows that referenced this issue Oct 17, 2024
…j#13601

As explained in
argoproj#13601 (comment),
I believe argoproj#12912
introduced a performance regression when listing workflows for
PostgreSQL users. Reverting that PR could re-introduce the memory issues
mentioned in the PR description, so instead this mitigates the impact by
converting the `workflow` column to be of type `jsonb`.

Initially `workflow` was of type `text`, and was migrated to `json` in
argoproj#2152. I'm not sure why
`jsonb` wasn't chosen, but [based on this comment in the linked
issue](argoproj#2133 (comment)),
I think it was simply an oversight. Here's the relevant docs
(https://www.postgresql.org/docs/current/datatype-json.html):

> The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.
>
> Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
>
> In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

I'm pretty sure we don't care about key order or whitespace. We do care
somewhat about insertion speed, but archived workflows are read much
more frequently than written, so a slight reduction in write speed that
gives a large improvement in read speed is a good tradeoff.
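
A quick psql illustration of the difference described above:

```
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- {"b": 1, "a": 2, "a": 3}    (input text preserved verbatim, duplicate keys kept)

SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- {"a": 3, "b": 1}            (keys reordered, only the last duplicate kept)
```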

Here's steps to test this:
1. Use argoproj#13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template:

     ```
     $ time go run ./hack/db fake-archived-workflows  --template "@very-large-workflow.yaml" --rows 100000
     Using seed 1935828722624432788
     Clusters: [default]
     Namespaces: [argo]
     Inserted 100000 rows

     real    18m35.316s
     user    3m2.447s
     sys     0m44.972s
     ```
2. Run the benchmarks using argoproj#13767:
    ```
    make BenchmarkWorkflowArchive > postgres_before_10000_workflows.txt
    ```
3. Run the migration using the DB CLI:
    ```
    $ time go run ./hack/db migrate
    INFO[0000] Migrating database schema                     clusterName=default dbType=postgres
    INFO[0000] applying database change                      change="alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb" changeSchemaVersion=60
    2024/10/17 18:07:42     Session ID:     00001
            Query:          alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb
            Stack:
                    fmt.(*pp).handleMethods@/usr/local/go/src/fmt/print.go:673
                    fmt.(*pp).printArg@/usr/local/go/src/fmt/print.go:756
                    fmt.(*pp).doPrint@/usr/local/go/src/fmt/print.go:1208
                    fmt.Append@/usr/local/go/src/fmt/print.go:289
                    log.(*Logger).Print.func1@/usr/local/go/src/log/log.go:261
                    log.(*Logger).output@/usr/local/go/src/log/log.go:238
                    log.(*Logger).Print@/usr/local/go/src/log/log.go:260
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.ansiSQLChange.apply@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/ansi_sql_change.go:11
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:295
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:284
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.Exec@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:273
                    main.NewMigrateCommand.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:50
                    github.com/spf13/cobra.(*Command).execute@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:985
                    github.com/spf13/cobra.(*Command).ExecuteC@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117
                    github.com/spf13/cobra.(*Command).Execute@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041
                    main.main@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:39
                    runtime.main@/usr/local/go/src/runtime/proc.go:272
                    runtime.goexit@/usr/local/go/src/runtime/asm_amd64.s:1700
            Rows affected:  0
            Error:          upper: slow query
            Time taken:     69.12755s
            Context:        context.Background

    real    1m10.087s
    user    0m1.541s
    sys     0m0.410s
    ```
4. Re-run the benchmarks:
    ```
    make BenchmarkWorkflowArchive > postgres_after_10000_workflows.txt
    ```
5. Compare results using [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat):
    ```
    $ benchstat postgres_before_10000_workflows3.txt postgres_after_10000_workflows2.txt
    goos: linux
    goarch: amd64
    pkg: github.com/argoproj/argo-workflows/v3/test/e2e
    cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                         │ postgres_before_10000_workflows3.txt │  postgres_after_10000_workflows2.txt  │
                                                         │                sec/op                │    sec/op     vs base                 │
    WorkflowArchive/ListWorkflows-12                                              183.83m ± ∞ ¹   24.69m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                          192.71m ± ∞ ¹   25.87m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                              13.04m ± ∞ ¹   11.75m ± ∞ ¹        ~ (p=1.000 n=1) ²
    geomean                                                                        77.31m         19.58m        -74.68%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05

                                                         │ postgres_before_10000_workflows3.txt │  postgres_after_10000_workflows2.txt  │
                                                         │                 B/op                 │     B/op       vs base                │
    WorkflowArchive/ListWorkflows-12                                              497.2Ki ± ∞ ¹   497.5Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                          503.1Ki ± ∞ ¹   503.9Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                             8.972Ki ± ∞ ¹   8.899Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    geomean                                                                       130.9Ki         130.7Ki        -0.20%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05

                                                         │ postgres_before_10000_workflows3.txt │ postgres_after_10000_workflows2.txt  │
                                                         │              allocs/op               │  allocs/op    vs base                │
    WorkflowArchive/ListWorkflows-12                                               8.373k ± ∞ ¹   8.370k ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                           8.410k ± ∞ ¹   8.406k ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                               212.0 ± ∞ ¹    212.0 ± ∞ ¹       ~ (p=1.000 n=1) ³
    geomean                                                                        2.462k         2.462k        -0.03%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05
    ³ all samples are equal
    ```

Signed-off-by: Mason Malone <[email protected]>
@kodieg

kodieg commented Oct 18, 2024

One option I found on Stack Overflow: https://stackoverflow.com/questions/65820380/improving-performance-of-postgres-jsonb-queries-combined-with-relational-queries

I did quick tests of a few variations of queries:

EXPLAIN ANALYZE
WITH tab as (SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       (workflow::jsonb)->'metadata' as metadata,
       (workflow::jsonb)->'status' as status,
       (workflow::jsonb)->'spec' as spec
FROM "argo_archived_workflows"
LIMIT 1000
)
SELECT namespace, UID, phase, startedat, finishedat,
       coalesce(metadata->>'labels', '{}') AS labels,
       coalesce(metadata->>'annotations', '{}') AS annotations,
       coalesce(status->>'progress', '') AS progress,
       coalesce(metadata->>'creationTimestamp', '') AS creationtimestamp,
       spec->>'suspend' AS suspend,
       coalesce(status->>'message', '') AS message,
       coalesce(status->>'estimatedDuration', '0') AS estimatedduration,
       coalesce(status->>'resourcesDuration', '{}') AS resourcesduration
FROM tab                                                                                                                                                                                                                                                                                                          

and

EXPLAIN ANALYZE
WITH tab as (SELECT name,
       namespace,
       UID,
       phase,
       startedat,
       finishedat,
       (workflow::jsonb) as workflow
FROM "argo_archived_workflows"
LIMIT 1000
)
SELECT namespace, UID, phase, startedat, finishedat,
       coalesce((workflow)->'metadata'->>'labels', '{}') AS labels,
       coalesce((workflow)->'metadata'->>'annotations', '{}') AS annotations,
       coalesce((workflow)->'status'->>'progress', '') AS progress,
       coalesce((workflow)->'metadata'->>'creationTimestamp', '') AS creationtimestamp,
       (workflow)->'spec'->>'suspend' AS suspend,
       coalesce((workflow)->'status'->>'message', '') AS message,
       coalesce((workflow)->'status'->>'estimatedDuration', '0') AS estimatedduration,
       coalesce((workflow)->'status'->>'resourcesDuration', '{}') AS resourcesduration
FROM tab

The first one worked better with a jsonb column; the second helped mitigate the case where the column is still of type json.

@MasonM could you maybe add those queries to your benchmarks?

@MasonM
Contributor

MasonM commented Oct 19, 2024

@kodieg Thanks! The first query (with the CTE) is indeed a lot faster when I test it on PostgreSQL (~40% faster with json and ~60% faster with jsonb). I can enter a PR with more details once #13767 is merged (trying to juggle all these branches is getting to be a bit of a hassle).

@tooptoop4
Contributor

#9516 might help

@MasonM
Contributor

MasonM commented Oct 26, 2024

As promised, I entered #13819 with the optimization @kodieg found, as well as optimizations for MySQL.

@agilgur5
Contributor Author

agilgur5 commented Oct 26, 2024

Awesome work Mason 🚀 ❤️

And thanks @kodieg for helping find a more optimal query!
