"Output Events Rate" in stack monitoring is always zero #8383

Open
axw opened this issue Jun 15, 2022 · 12 comments · Fixed by #8900

@axw
Member

axw commented Jun 15, 2022

APM Server version (apm-server version): 8.3.0-BC4

Description of the problem including expected versus actual behavior:

"Output Events Rate" in stack monitoring is always zero.

Steps to reproduce:

  1. Start 8.3.0-BC4 with stack monitoring enabled.
  2. Send some events, check that they show up in the APM UI.
  3. Navigate to stack monitoring, observe the "Output Events Rate" chart is always reporting zero.

image

@axw axw added the bug label Jun 15, 2022
@axw
Member Author

axw commented Jun 15, 2022

Hmm, I just reconfigured the integration with expvar enabled, and now it's working. Maybe there's a race condition?

@axw
Member Author

axw commented Jun 15, 2022

Happened again after upgrading from 8.2.3 to 8.3.0-BC4. Initially the output was zero; after reconfiguring the integration (this time changing the event rate limit), the output went non-zero.

@axw
Member Author

axw commented Aug 31, 2022

This is apparently still an issue, at least in system tests, as seen here:

https://apm-ci.elastic.co/blue/organizations/jenkins/apm-server%2Fapm-server-mbp%2FPR-9014/detail/PR-9014/1/pipeline/

@axw axw reopened this Aug 31, 2022
@lahsivjar
Contributor

I haven't been able to reproduce this exact error. However, because of the way our instrumentation works, it is possible that after a reload event the old modelindexer is still receiving data while the instrumentation has already moved to the new modelindexer. This happens because we wait for the old modelindexer to shut down gracefully, but we switch the monitoring to the new modelindexer before the old one exits.

As a result, the instrumentation will report 0 until the old indexer shuts down.
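
To make the ordering concrete, here is a minimal sketch of that window. It does not use the real apm-server types: `indexer`, `server`, `reload`, and `reportedEvents` are hypothetical stand-ins, and the only thing taken from the description above is the order of operations (monitoring switches first, event delivery switches once the old indexer has shut down).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// indexer is a hypothetical stand-in for the modelindexer: it only counts
// the events it has been handed.
type indexer struct {
	eventsTotal atomic.Int64
}

func (i *indexer) add(n int64) { i.eventsTotal.Add(n) }

// server is a hypothetical stand-in for the reloadable server state.
type server struct {
	receiving *indexer // indexer that events are currently delivered to
	monitored *indexer // indexer that the monitoring callback reads from
}

// reload mimics the ordering described above: monitoring is pointed at the
// new indexer immediately, while event delivery stays on the old indexer
// until the returned function is called (the old indexer has shut down).
func (s *server) reload() (finishShutdown func()) {
	next := &indexer{}
	s.monitored = next
	return func() { s.receiving = next }
}

// reportedEvents is what the monitoring side would observe.
func (s *server) reportedEvents() int64 { return s.monitored.eventsTotal.Load() }

func main() {
	old := &indexer{}
	s := &server{receiving: old, monitored: old}

	finish := s.reload()
	s.receiving.add(100) // events still land on the old indexer
	fmt.Println("during shutdown:", s.reportedEvents()) // during shutdown: 0

	finish()
	s.receiving.add(5) // events now land on the new indexer
	fmt.Println("after shutdown:", s.reportedEvents()) // after shutdown: 5
}
```

In this sketch the 100 events handled during the shutdown window never show up in the reported total, which matches the "reports 0 until the old indexer shuts down" behaviour, but it would not by itself explain a rate that stays at zero indefinitely.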

@axw axw mentioned this issue Sep 23, 2022
9 tasks
@simitt simitt modified the milestones: 8.5, 8.6 Sep 23, 2022
@simitt simitt added the v8.6.0 label Oct 4, 2022
@simitt
Contributor

simitt commented Nov 22, 2022

Moving this to the backlog since we haven't spent more time recently tracking this down.

@tegenterter

It appears that this bug led to an incident (https://github.com/elastic/cloud/issues/110723) and should be prioritized.

@simitt simitt modified the milestones: 8.6, 8.7 Dec 23, 2022
@simitt
Contributor

simitt commented Dec 23, 2022

Moved it back into the 8.7 milestone so it can be picked up and we can verify whether this is still a bug in current versions.

@simitt simitt removed this from the 8.7 milestone Feb 24, 2023
@simitt simitt added this to the 8.8 milestone Feb 24, 2023
@simitt simitt removed the v8.6.0 label Feb 24, 2023
@simitt simitt removed the blocked label Mar 9, 2023
@axw
Member Author

axw commented Apr 4, 2023

I don't recall if this has already been ruled out, but I realise now that I never wrote down on this issue a possible contributing factor: every time we reconfigure the server, we create a new libbeat monitoring registry:

libbeatMonitoringRegistry := monitoring.Default.NewRegistry("libbeat")

@lahsivjar
Contributor

Hmm, nice catch. I don't remember any conversation around this so I think this hasn't been ruled out.

@simitt simitt modified the milestones: 8.8, 8.9 Apr 27, 2023
@endorama
Member

endorama commented May 2, 2023

I was looking at this today and I have 2 questions:

  1. how can I send some test data?
  2. my first hint at this would be to try reusing the libbeatMonitoringRegistry instead of creating it anew like it is done for the output registry:

         outputRegistry := stateRegistry.GetRegistry("output")
         if outputRegistry != nil {
             outputRegistry.Clear()
         } else {
             outputRegistry = stateRegistry.NewRegistry("output")
         }

     What do you think?

@axw
Member Author

axw commented May 8, 2023

> how can I send some test data?

You could use https://github.com/elastic/apm-server/tree/main/systemtest/cmd/sendotlp to send test data to APM Server

> my first hint at this would be to try reusing the libbeatMonitoringRegistry instead of creating it anew like it is done for the output registry

You could try, but I don't think that will work. There are assumptions about there being a 1:1 relationship between metrics and outputs, e.g. here:

// Install our own libbeat-compatible metrics callback which uses the docappender stats.
// All the metrics below are required to be reported to be able to display all relevant
// fields in the Stack Monitoring UI.
monitoring.NewFunc(libbeatMonitoringRegistry, "output.write", func(_ monitoring.Mode, v monitoring.Visitor) {
    v.OnRegistryStart()
    defer v.OnRegistryFinished()
    v.OnKey("bytes")
    v.OnInt(appender.Stats().BytesTotal)
})
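
The 1:1 assumption can be illustrated with a small sketch. It does not use the libbeat monitoring package: `appender` and `newWriteBytesFunc` are hypothetical stand-ins, and the only point carried over from the snippet above is that the callback closes over one specific appender.

```go
package main

import "fmt"

// appender is a hypothetical stand-in for the docappender; only the byte
// counter matters for this illustration.
type appender struct{ bytesTotal int64 }

// newWriteBytesFunc returns a callback bound to exactly one appender, the
// same shape as the closure passed to monitoring.NewFunc above.
func newWriteBytesFunc(a *appender) func() int64 {
	return func() int64 { return a.bytesTotal }
}

func main() {
	first := &appender{bytesTotal: 1024}
	report := newWriteBytesFunc(first)

	// After a reload a second appender takes over, but a callback kept from
	// before the reload still reads the first appender.
	second := &appender{bytesTotal: 4096}
	fmt.Println(report()) // 1024

	// Only a callback created for the second appender sees its stats.
	report = newWriteBytesFunc(second)
	fmt.Println(report()) // 4096
}
```

Under these assumptions, keeping the registry across reloads would only help if the callbacks registered in it were also replaced for each new appender.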

@carsonip
Member

#14337 may be related, in the sense that reloads (which may take a long time) need to be handled carefully.
