Visualization of daily perf loadgen data #63

Open
mhofman opened this issue Feb 9, 2022 · 0 comments
Assignees
Labels
help wanted Extra attention is needed telemetry

mhofman commented Feb 9, 2022

Summary

This issue is about building a visualization dashboard which automatically displays the stats from the latest daily loadgen runs.

Context

The loadgen's primary use is to track regressions in the behavior of the SDK over time (see Agoric/agoric-sdk#3107). Here, time has two dimensions:

  • Behavior of the chain over the lifetime of the chain (performance should stay stable and not degrade)
  • Behavior of the chain when changes are introduced across revisions (performance should not become notably worse, and should hopefully get better)

The first is mostly captured by running loadgen cycles split into stages (currently 4 stages of 6 hours each for the daily perf run) and comparing the stages to each other. The second is captured by comparing summarized metrics between revisions (that is, between different daily perf runs).
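
To make the two comparisons concrete, here is a minimal sketch in Python. It assumes a "lower is better" metric (for example a latency) and a 10% tolerance; the function names, the threshold, and the sample values are all hypothetical and are not taken from the loadgen.

```python
from typing import List, Sequence


def relative_change(baseline: float, current: float) -> float:
    """Fractional change of `current` relative to `baseline` (0.10 means +10%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def degraded_indexes(values: Sequence[float], threshold: float = 0.10) -> List[int]:
    """Indexes of entries that exceed the first entry by more than `threshold`.

    Assumption: the metric is "lower is better", so growth means degradation.
    The first entry is used as the baseline.
    """
    if not values:
        return []
    baseline = values[0]
    return [
        index
        for index, value in enumerate(values[1:], start=1)
        if relative_change(baseline, value) > threshold
    ]


# Hypothetical values of one metric for the 4 stages of a single daily run.
stage_values = [2.0, 2.1, 2.3, 2.6]
print(degraded_indexes(stage_values))
```

The same helper can be applied across revisions by passing one summarized value per daily run instead of one value per stage.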

Current tooling

Currently the stats are saved in a perf.jsonl file, which contains a stream of CPU and memory usage stats followed by a final summary of all other stats. #43 deals with unifying these so that individual stats data points are output in the stream and only summaries are generated at the end, possibly including summaries of the CPU and memory usage.
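
As an illustration of how a consumer might read such a file, the sketch below separates streamed data points from summary records. The record layout is an assumption: the `type` field and its `"summary"` value are hypothetical and may not match what the loadgen actually writes to perf.jsonl.

```python
import io
import json
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_records(
    lines: Iterable[str], summary_type: str = "summary"
) -> Tuple[List[Record], List[Record]]:
    """Split JSON Lines input into (data points, summaries).

    Assumption: each record is a JSON object, and summaries carry a
    hypothetical "type" field equal to `summary_type`.
    """
    samples: List[Record] = []
    summaries: List[Record] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        if record.get("type") == summary_type:
            summaries.append(record)
        else:
            samples.append(record)
    return samples, summaries


# Hypothetical stream contents, used here instead of a real perf.jsonl file.
example = io.StringIO(
    '{"type": "sample", "cpu": 0.5}\n'
    '{"type": "sample", "cpu": 0.7}\n'
    '{"type": "summary", "stage": 0}\n'
)
samples, summaries = split_records(example)
print(len(samples), len(summaries))
```

A real file can be passed the same way, since an open text file is also an iterable of lines.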

The visualization is done by extracting the stats summaries into a CSV file (see https://github.com/Agoric/testnet-load-generator/blob/main/scripts/perf_to_stats_csv.jq), and importing it into a Google Spreadsheet with some graphs.
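
For readers unfamiliar with this kind of extraction step, the following is a rough Python equivalent of turning summary records into CSV. It is not a port of perf_to_stats_csv.jq; the column names and the sample record are hypothetical.

```python
import csv
import io
from typing import Any, Dict, Iterable, Sequence


def summaries_to_csv(
    summaries: Iterable[Dict[str, Any]], columns: Sequence[str]
) -> str:
    """Render summary records as CSV text with a header row.

    Keys not listed in `columns` are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(columns),
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for summary in summaries:
        writer.writerow(summary)
    return buffer.getvalue()


# Hypothetical summary record and column selection.
rows = [{"revision": "abc", "stage": 0, "value": 1.5, "unused": True}]
print(summaries_to_csv(rows, ["revision", "stage", "value"]))
```

Fixing the column list up front keeps the CSV layout stable across runs, which matters when a spreadsheet or dashboard imports the file by column position.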

Detailed requirements

We would like to have a dashboard that shows the data detailed in Agoric/agoric-sdk#3107, which is automatically updated to include the results from the latest daily run.
If a run fails, the dashboard should make it obvious or possibly send alerts. It should also alert if no data has been received recently (to highlight a stuck loadgen, for example).
The dashboard does not need to show data for an in-progress loadgen; that is a separate issue (TBD).
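
The failure and staleness requirements above could be reduced to a small status check like the sketch below. The 36-hour window, the function name, and the status strings are assumptions chosen for illustration; the issue does not specify them.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def run_status(
    last_run_end: Optional[datetime],
    last_run_ok: bool,
    now: datetime,
    max_age: timedelta = timedelta(hours=36),  # assumed window for a daily run
) -> str:
    """Classify the most recent daily run as "stale", "failed", or "ok".

    "stale" means no result was received within `max_age`, which would
    point at a stuck or missing loadgen rather than a failed run.
    """
    if last_run_end is None or now - last_run_end > max_age:
        return "stale"
    if not last_run_ok:
        return "failed"
    return "ok"


# Hypothetical timestamps: the last run ended two days before "now".
now = datetime(2022, 2, 11, 12, 0, tzinfo=timezone.utc)
last_end = datetime(2022, 2, 9, 12, 0, tzinfo=timezone.utc)
print(run_status(last_end, True, now))
```

Checking staleness before failure means a stuck loadgen is reported even when the last received run was successful.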

It would be great if the dashboard made it easy to generate new graphs from the existing data or to run queries against it.

@mhofman mhofman self-assigned this Feb 9, 2022
@mhofman mhofman added help wanted Extra attention is needed telemetry labels Feb 9, 2022
@mhofman mhofman changed the title Visualization of loadgen data Visualization of daily perf loadgen data Feb 9, 2022
@Tartuffo Tartuffo assigned Tartuffo and unassigned mhofman Feb 17, 2022
@Tartuffo Tartuffo added this to the Mainnet: Phase 1 - Treasury Launch milestone Mar 16, 2022
@ivanlei ivanlei removed this from the Mainnet: Phase 1 - Treasury Launch milestone May 5, 2023