Investigate nightly benchmarks 0 events/s issue #13738

Open
carsonip opened this issue Jul 22, 2024 · 3 comments

@carsonip
Member

Nightly benchmarks occasionally report 0 events/s. Investigate the root cause.

@carsonip carsonip added the bug label Jul 22, 2024
@lahsivjar lahsivjar self-assigned this Jul 22, 2024
@lahsivjar
Contributor

lahsivjar commented Jul 29, 2024

Status update

The first thing I looked at was what was getting reported by the benchmark failures. Here are 2 links to the benchmark run:

  1. Run with events/sec metric populated - Link to APM-Server logs - Link to deployment
  2. Run without events/sec metric populated - Link to APM-Server logs - Link to deployment

Both runs show 500 Internal Server Error responses. However, the logs for the run with 0 events/sec additionally show data validation errors caused by an unexpected EOF. These errors seem to be logged from here. This could be an issue with our sender, but the most intriguing part is that only a subset of the delta metrics is reported as 0. For example, in the link above, txn/sec and metrics/sec are reported correctly, whereas the other delta metrics are reported as zero.

I have tried reproducing the errors locally but haven't succeeded. (Note that the expvar metrics collection is designed for benchtimes in minutes, so when testing locally make sure the benchtime is long enough for the expvar metrics to work correctly.) I did see some special handling in the expvar metric collection, but nothing that explains this bug.

I have also created a PR to log errors in the expvar endpoint, which was not done before. I am not sure how helpful it will be, though.

@lahsivjar lahsivjar removed their assignment Jul 29, 2024
@1pkg 1pkg removed their assignment Aug 13, 2024
@simitt
Contributor

simitt commented Oct 16, 2024

Is this still happening?

@rubvs
Contributor

rubvs commented Oct 23, 2024

@simitt I had this happen to me in a run on GH Actions last week, see Slack Thread: https://elastic.slack.com/archives/C95SB62AG/p1729263104854879
