Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: simple DB CLI for local development #13715

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

MasonM
Copy link
Contributor

@MasonM MasonM commented Oct 6, 2024

Motivation

This adds a small CLI tool for developers to use when working with the DB locally. It provides two functions:

  1. Explicitly migrate the DB (which is normally only done when the workflow controller starts) to help test migrations:

    $ go run ./hack/db migrate
    INFO[0000] Migrating database schema                     clusterName=default dbType=postgres
    INFO[0000] applying database change                      change="CREATE INDEX argo_archived_workflows_i5 ON argo_archived_workflows (clustername, startedat)" changeSchemaVersion=60
    
  2. Insert randomly-generated archived workflows into the DB, which is intended to help test query performance (e.g. 3.5: Further optimize Archive List API call / DB query #13601)

    $ time go run ./hack/db/main.go fake-archived-workflows --rows 100000
    Using seed 2600415997737113470
    Clusters: [cx5j6rrzm5vnkfqqr4d]
    Namespaces: [bs98pxsfhs9z v95848 hq76xbj4kq7vrdp49vm ghzj6vcrth 262wb8w2b8v2wd2p2p9]
    Inserted 100000 rows
    
    real    17m56.554s
    user    1m53.833s
    sys     0m43.581s
    

    This is obviously not as efficient as it could be, but it's only intended to be run on an ad-hoc basis when manually testing query performance. To make it fast, we'd probably have to switch to direct SQL inserts, which would couple this script to the DB schema logic in persist/sqldb/workflow_archive.go.

Modifications

All logic is centralized in hack/db/main.go. I could've easily split that up, but that doesn't seem worth it, since this is only ever going to be used by developers.

Verification

Manually ran go run ./hack/db. See above for the output when using PostgreSQL. Here's the corresponding output for MySQL:

$ go run ./hack/db migrate --dsn mysql:password@tcp/argo
INFO[0000] Migrating database schema                     clusterName=default dbType=mysql

$ time go run ./hack/db/main.go fake-archived-workflows --rows 100 --dsn mysql:password@tcp/argo
Using seed 5504222375807483325
Clusters: [jckpg]
Namespaces: [7qr6rmgz5d8hz8zskt4 zst56t4wv9pkh8 jxlrvhtx4 g7bjn crk8qhxdtnddr2f]
Inserted 100 rows

real    0m3.261s
user    0m1.605s
sys     0m0.500s

This adds a small CLI tool for developers to use when working with the
DB locally. It provides two functions:
1. Explicitly migrate the DB (which is normally only done when the
   workflow controller starts) to help test migrations:

     ```
     $ go run ./hack/db migrate
     INFO[0000] Migrating database schema                     clusterName=default dbType=postgres
     ```
2. Insert randomly-generated archived workflows into the DB, which is
   intended to help test query performance (e.g.
   argoproj#13601)

    ```
    $ time go run ./hack/db/main.go fake-archived-workflows --rows 100000
    Using seed 2600415997737113470
    Clusters: [cx5j6rrzm5vnkfqqr4d]
    Namespaces: [bs98pxsfhs9z v95848 hq76xbj4kq7vrdp49vm ghzj6vcrth 262wb8w2b8v2wd2p2p9]
    Inserted 100000 rows

    real    17m56.554s
    user    1m53.833s
    sys     0m43.581s
    ```

  This is obviously not as efficient as it could be, but it's only
  intended to be run on an adhoc basis when manually testing query
  performance. To make it fast, we'd probably have to switch to direct
  SQL inserts, which would couple this script to the logic in
  `persist/sqldb/workflow_archive.go`.

Signed-off-by: Mason Malone <[email protected]>
Signed-off-by: Mason Malone <[email protected]>
Signed-off-by: Mason Malone <[email protected]>
Signed-off-by: Mason Malone <[email protected]>
@MasonM MasonM changed the title feat(cli): add DB CLI for local development feat(cli): simple DB CLI for local development Oct 7, 2024
@MasonM MasonM marked this pull request as ready for review October 7, 2024 02:22
@agilgur5 agilgur5 changed the title feat(cli): simple DB CLI for local development test: simple DB CLI for local development Oct 7, 2024
@agilgur5 agilgur5 added area/build Build or GithubAction/CI issues area/contributing Contributing docs, ownership, etc. Also devtools like devcontainer and Nix labels Oct 7, 2024
MasonM added a commit to MasonM/argo-workflows that referenced this pull request Oct 17, 2024
With PostgreSQL, the `argo_archived_workflows.workflow` column has been
of type `json` ever since 8a1e611,
which was released as v2.5.0. Therefore, the `::json` casts do nothing,
and prevent users from improving performance by migrating to JSONB using the following query:
```sql
alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb
```

Without the changes in this PR, running the above will massively slow
down the queries, because casting JSONB to JSON is expensive. With the
changes in this PR, it improves performance by ~80%, which I determined
by running the benchmarks added in
argoproj#13767 against a DB with
100,000 randomly generated workflows generated by
argoproj#13715:
```
$ benchstat postgres_before_10000_workflows.txt postgres_after_10000_workflows.txt
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                     │ postgres_before_10000_workflows.txt │  postgres_after_10000_workflows.txt   │
                                                     │               sec/op                │    sec/op     vs base                 │
WorkflowArchive/ListWorkflows-12                                             185.81m ± ∞ ¹   25.06m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12                         186.35m ± ∞ ¹   25.99m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                                             42.39m ± ∞ ¹   11.78m ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                                                       113.6m         19.72m        -82.64%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
```

The only downside to migrating to JSONB is it can take a long time
if you've got a ton of workflows (~72s on my local DB with 100,000
workflows). I'll enter a separate PR for the migration, but I'm entering
this change separately so it can hopefully go out in 3.6.0.

Signed-off-by: Mason Malone <[email protected]>
MasonM added a commit to MasonM/argo-workflows that referenced this pull request Oct 17, 2024
With PostgreSQL, the `argo_archived_workflows.workflow` column has been
of type `json` ever since 8a1e611,
which was released as v2.5.0. Therefore, the `::json` casts do nothing,
and prevent users from improving performance by migrating to JSONB using the following query:
```sql
alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb
```

Without the changes in this PR, running the above will massively slow
down the queries, because casting JSONB to JSON is expensive. With the
changes in this PR, it improves performance by ~80%, which I determined
by running the benchmarks added in
argoproj#13767 against a DB with
100,000 randomly generated workflows generated by
argoproj#13715:
```
$ benchstat postgres_before_10000_workflows.txt postgres_after_10000_workflows.txt
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                     │ postgres_before_10000_workflows.txt │  postgres_after_10000_workflows.txt   │
                                                     │               sec/op                │    sec/op     vs base                 │
WorkflowArchive/ListWorkflows-12                                             185.81m ± ∞ ¹   25.06m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12                         186.35m ± ∞ ¹   25.99m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                                             42.39m ± ∞ ¹   11.78m ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                                                       113.6m         19.72m        -82.64%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
```

The only downside to migrating to JSONB is it can take a long time
if you've got a ton of workflows (~72s on my local DB with 100,000
workflows). I'll enter a separate PR for the migration, but I'm entering
this change separately so it can hopefully go out in 3.6.0.

Signed-off-by: Mason Malone <[email protected]>
MasonM added a commit to MasonM/argo-workflows that referenced this pull request Oct 17, 2024
…j#13601

As explained in
argoproj#13601 (comment),
I believe argoproj#12912
introduced a performance regression when listing workflows for
PostgreSQL users. Reverting that PR could re-introduce the memory issues
mentioned in the PR description, so instead this mitigates the impact by
converting the `workflow` column to be of type `jsonb`.

Initially `workflow` was of type `text`, and was migrated to `json` in
argoproj#2152. I'm not sure why
`jsonb` wasn't chosen, but [based on this comment in the linked
issue](argoproj#2133 (comment)),
I think it was simply an oversight. Here's the relevant docs
(https://www.postgresql.org/docs/current/datatype-json.html):

> The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.
>
> Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
>
> In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

I'm pretty sure we don't care about key order or whitespace. We do care
somewhat about insertion speed, but archived workflows are read much
more frequently than written, so a slight reduction in write speed that
gives a large improvement in read speed is a good tradeoff.

Here's steps to test this:
1. Use argoproj#13715 to generate 100,000 randomized workflows, with https://gist.github.com/MasonM/52932ff6644c3c0ccea9e847780bfd90 as a template:

     ```
     $ time go run ./hack/db fake-archived-workflows  --template "@very-large-workflow.yaml" --rows 100000
     Using seed 1935828722624432788
     Clusters: [default]
     Namespaces: [argo]
     Inserted 100000 rows

     real    18m35.316s
     user    3m2.447s
     sys     0m44.972s
     ```
2. Run the benchmarks using argoproj#13767:
    ```
    make BenchmarkWorkflowArchive > postgres_before_10000_workflows.txt
    ```
3. Run the migration the DB CLI:
    ```
    $ time go run ./hack/db migrate
    INFO[0000] Migrating database schema                     clusterName=default dbType=postgres
    INFO[0000] applying database change                      change="alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb" changeSchemaVersion=60
    2024/10/17 18:07:42     Session ID:     00001
            Query:          alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb
            Stack:
                    fmt.(*pp).handleMethods@/usr/local/go/src/fmt/print.go:673
                    fmt.(*pp).printArg@/usr/local/go/src/fmt/print.go:756
                    fmt.(*pp).doPrint@/usr/local/go/src/fmt/print.go:1208
                    fmt.Append@/usr/local/go/src/fmt/print.go:289
                    log.(*Logger).Print.func1@/usr/local/go/src/log/log.go:261
                    log.(*Logger).output@/usr/local/go/src/log/log.go:238
                    log.(*Logger).Print@/usr/local/go/src/log/log.go:260
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.ansiSQLChange.apply@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/ansi_sql_change.go:11
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:295
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.applyChange@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:284
                    github.com/argoproj/argo-workflows/v3/persist/sqldb.migrate.Exec@/home/vscode/go/src/github.com/argoproj/argo-workflows/persist/sqldb/migrate.go:273
                    main.NewMigrateCommand.func1@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:50
                    github.com/spf13/cobra.(*Command).execute@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:985
                    github.com/spf13/cobra.(*Command).ExecuteC@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117
                    github.com/spf13/cobra.(*Command).Execute@/home/vscode/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041
                    main.main@/home/vscode/go/src/github.com/argoproj/argo-workflows/hack/db/main.go:39
                    runtime.main@/usr/local/go/src/runtime/proc.go:272
                    runtime.goexit@/usr/local/go/src/runtime/asm_amd64.s:1700
            Rows affected:  0
            Error:          upper: slow query
            Time taken:     69.12755s
            Context:        context.Background

    real    1m10.087s
    user    0m1.541s
    sys     0m0.410s
    ```
2. Re-run the benchmarks:
    ```
    make BenchmarkWorkflowArchive > postgres_after_10000_workflows.txt
    ```
4. Compare results using [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat):
    ```
    $ benchstat postgres_before_10000_workflows3.txt postgres_after_10000_workflows2.txt
    goos: linux
    goarch: amd64
    pkg: github.com/argoproj/argo-workflows/v3/test/e2e
    cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                         │ postgres_before_10000_workflows3.txt │  postgres_after_10000_workflows2.txt  │
                                                         │                sec/op                │    sec/op     vs base                 │
    WorkflowArchive/ListWorkflows-12                                              183.83m ± ∞ ¹   24.69m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                          192.71m ± ∞ ¹   25.87m ± ∞ ¹        ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                              13.04m ± ∞ ¹   11.75m ± ∞ ¹        ~ (p=1.000 n=1) ²
    geomean                                                                        77.31m         19.58m        -74.68%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05

                                                         │ postgres_before_10000_workflows3.txt │  postgres_after_10000_workflows2.txt  │
                                                         │                 B/op                 │     B/op       vs base                │
    WorkflowArchive/ListWorkflows-12                                              497.2Ki ± ∞ ¹   497.5Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                          503.1Ki ± ∞ ¹   503.9Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                             8.972Ki ± ∞ ¹   8.899Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
    geomean                                                                       130.9Ki         130.7Ki        -0.20%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05

                                                         │ postgres_before_10000_workflows3.txt │ postgres_after_10000_workflows2.txt  │
                                                         │              allocs/op               │  allocs/op    vs base                │
    WorkflowArchive/ListWorkflows-12                                               8.373k ± ∞ ¹   8.370k ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/ListWorkflows_with_label_selector-12                           8.410k ± ∞ ¹   8.406k ± ∞ ¹       ~ (p=1.000 n=1) ²
    WorkflowArchive/CountWorkflows-12                                               212.0 ± ∞ ¹    212.0 ± ∞ ¹       ~ (p=1.000 n=1) ³
    geomean                                                                        2.462k         2.462k        -0.03%
    ¹ need >= 6 samples for confidence interval at level 0.95
    ² need >= 4 samples to detect a difference at alpha level 0.05
    ³ all samples are equal
    ```

Signed-off-by: Mason Malone <[email protected]>
MasonM added a commit to MasonM/argo-workflows that referenced this pull request Oct 25, 2024
This adds some additional details about
argoproj#13779, primarily to assist
users with a very large number of archived workflows who may not be comfortable
with extended downtime.

I tested the manual migration strategy locally after populating my local
database with 100,000 rows using
argoproj#13715:

```
$ time make postgres-cli < migrate_before.sql
GIT_COMMIT=f25960ac6f63bc3cf1a3b8a9813cf43c61921262 GIT_BRANCH=fix-migrate-jsonb GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=true VERSION=latest
KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/amd64
RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false  STATIC_FILES=false ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true
kubectl exec -ti svc/postgres -- psql -U postgres
Unable to use a TTY - input is not a terminal or the right kind of file
ALTER TABLE
CREATE FUNCTION
CREATE TRIGGER
UPDATE 100000

real    1m16.118s
user    0m2.639s
sys     0m0.548s

$ time make postgres-cli < migrate_after.sql
GIT_COMMIT=49ff7a44ba307416282a1f5cd3b844d19bce7f88 GIT_BRANCH=main GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=false VERSION=latest
KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/amd64
RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false  STATIC_FILES=true ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true
kubectl exec -ti svc/postgres -- psql -U postgres
Unable to use a TTY - input is not a terminal or the right kind of file
BEGIN
LOCK TABLE
DROP TRIGGER
ALTER TABLE
ALTER TABLE
ALTER TABLE
COMMIT

real    0m1.503s
user    0m2.786s
sys     0m0.540s
```

Signed-off-by: Mason Malone <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build Build or GithubAction/CI issues area/contributing Contributing docs, ownership, etc. Also devtools like devcontainer and Nix area/workflow-archive
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants