AbstractIntakeApiHandler - Change log level for connection error #3593

Closed
NicklasWallgren opened this issue Apr 20, 2024 · 2 comments
Labels
agent-java community Issues and PRs created by the community triage

Comments

@NicklasWallgren
Contributor

NicklasWallgren commented Apr 20, 2024

The AbstractIntakeApiHandler has support for retryability and backoff - hence I think it would be a good idea to change the log level at

logger.error("Error trying to connect to APM Server at {}. Although not necessarily related to SSL, some related SSL " +
to WARN for the first couple of connection issues, and then fall back to using ERROR. What do you think?

Same goes for IntakeV2ReportingEventHandler.

2024-04-20 18:54:21,338 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.AbstractIntakeApiHandler - Error trying to connect to APM Server at http://127.0.0.1:8200/intake/v2/events. Although not necessarily related to SSL, some related SSL configurations corresponding the current connection are logged at INFO level.

2024-04-20 18:54:21,338 [elastic-apm-server-reporter] INFO  co.elastic.apm.agent.report.AbstractIntakeApiHandler - Backing off for 36 seconds (+/-10%)
@github-actions github-actions bot added agent-java community Issues and PRs created by the community triage labels Apr 20, 2024
@SylvainJuge
Member

Hi @NicklasWallgren ,

If I understand it correctly, you'd like to have the following:

  • for the first few occurrences of a connection issue: just issue a warning.
  • if the connection issues persist, then issue an error.
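
For reference, the escalation described in the two bullets above could be sketched roughly as follows. This is only an illustration of the proposal, not code from the agent: the class name `EscalatingLogLevel`, the `warnLimit` threshold, and the `onFailure`/`onSuccess` methods are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of the agent): picks a log level
// from the number of consecutive connection failures.
final class EscalatingLogLevel {

    enum Level { WARN, ERROR }

    // Assumed threshold: how many consecutive failures are still only a WARN.
    private final int warnLimit;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    EscalatingLogLevel(int warnLimit) {
        this.warnLimit = warnLimit;
    }

    // Call on each failed connection attempt; returns the level to log at.
    Level onFailure() {
        int failures = consecutiveFailures.incrementAndGet();
        return failures <= warnLimit ? Level.WARN : Level.ERROR;
    }

    // Call after a successful connection so the next failure starts at WARN again.
    void onSuccess() {
        consecutiveFailures.set(0);
    }
}
```

With `warnLimit` set to 2, the first two consecutive failures would be logged at WARN and every later one at ERROR until a successful connection resets the counter.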

The problem I see here is that whether this should be considered an ERROR or just a WARNING is very context-sensitive: it depends on the application and on user expectations, so it is very hard to come up with a common rule. For example, when the APM Server can't be reached, some applications have a very light load and can buffer for a while, while others have lots of traffic and will drop data very quickly.
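
To make the load difference concrete, here is a rough back-of-the-envelope sketch. The buffer capacity and event rates are made-up numbers chosen for illustration, not the agent's actual settings, and `BufferHeadroom` is a hypothetical name.

```java
// Hypothetical illustration: how long a fixed-size buffer lasts
// while the server is unreachable, at a steady event rate.
final class BufferHeadroom {

    // Seconds until a buffer of the given capacity is full.
    static double secondsUntilFull(int capacity, double eventsPerSecond) {
        if (eventsPerSecond <= 0) {
            return Double.POSITIVE_INFINITY; // no traffic, the buffer never fills
        }
        return capacity / eventsPerSecond;
    }

    public static void main(String[] args) {
        int capacity = 512; // assumed buffer size, in events
        System.out.println(secondsUntilFull(capacity, 1.0));    // light load
        System.out.println(secondsUntilFull(capacity, 1000.0)); // heavy load
    }
}
```

Under these assumed numbers, the lightly loaded application could ride out a 36-second backoff like the one in the log above without losing anything, while the busy one would start dropping events in well under a second.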

In addition, when querying log messages, a filter on the log level is often applied first. Unless the query is based on the log message itself, it could be confusing to have the same message reported at two different log levels, and a user focused on one level (WARN or ERROR) could miss the occurrences logged at the other.

So I think it is not worth modifying the current behavior, and we should keep this as an ERROR. If, however, you are in an environment where such error messages are too frequent, that is very likely a symptom of an underlying issue, such as high load on apm-server or a network problem, which should not be ignored.

@NicklasWallgren
Contributor Author

I agree, let's keep this as an ERROR. Thanks for the thorough reply.
