Obs > APM > Settings > Create Agent Configuration: only allows setting two of many central config vars if no service name info #196958

Open
trentm opened this issue Oct 18, 2024 · 3 comments
Labels
apm:settings apm bug Fixes for quality problems that affect the customer experience sdh-linked Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team

trentm commented Oct 18, 2024

Kibana version: v8.15.1

Elasticsearch version: v8.15.1

Server OS version: Elastic cloud deployment in GCP us-west-1

Browser version: Firefox 131.0.3 (aarch64)

Browser OS version: macOS

Original install method (e.g. download page, yum, from source, etc.): cloud deployment

Describe the bug:

If I attempt to create an APM Agent Configuration with Service name: All (and Environment: All), then the Kibana UI only offers a way to set two of the settings, transaction_max_spans and transaction_sample_rate:

  // Transaction max spans
  {
    key: 'transaction_max_spans',
    type: 'integer',
    min: 0,
    defaultValue: '500',
    label: i18n.translate('xpack.apm.agentConfig.transactionMaxSpans.label', {
      defaultMessage: 'Transaction max spans',
    }),
    description: i18n.translate('xpack.apm.agentConfig.transactionMaxSpans.description', {
      defaultMessage: 'Limits the amount of spans that are recorded per transaction.',
    }),
    excludeAgents: ['js-base', 'rum-js', 'android/java', 'iOS/swift'],
  },
  // Transaction sample rate
  {
    key: 'transaction_sample_rate',
    type: 'float',
    defaultValue: '1.0',
    label: i18n.translate('xpack.apm.agentConfig.transactionSampleRate.label', {
      defaultMessage: 'Transaction sample rate',
    }),
    description: i18n.translate('xpack.apm.agentConfig.transactionSampleRate.description', {
      defaultMessage:
        'By default, the agent will sample every transaction (e.g. request to your service). To reduce overhead and storage requirements, you can set the sample rate to a value between 0.0 and 1.0. We still record overall time and the result for unsampled transactions, but not context information, labels, or spans.',
    }),
    excludeAgents: ['android/java', 'iOS/swift'],
  },

These are just two of the many possible agent configuration settings that may or may not apply to a given service, depending on the language of the requesting APM agent.

Image

The same thing happens if I manually enter a Service name: ... value that isn't in the populated menu list of known service names with recent data.

Image

I did have a couple services with recent data, so this wasn't a completely empty deployment.

Expected behavior:

I would expect to be able to see some (all?) of the other configuration vars. I understand that these agent_configuration settings have excludeAgents and includeAgents fields used to limit the presented config settings to those relevant for the language agent for the given service. However, if the target is "All" service names, or an unknown one, then allowing only a subset of the config settings is too limiting.
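To make the expectation concrete, here is a minimal sketch of the filtering I have in mind. It is not Kibana's actual implementation: the `SettingSketch` type, `isSettingOffered`, and `offeredKeys` are hypothetical names, and the `includeAgents` list for trace_continuation_strategy is made up for illustration. The only point is that an unknown agent (service "All", or a name with no recent data) results in every setting being offered.

```typescript
// Hypothetical, simplified shape; not Kibana's real SettingDefinition type.
interface SettingSketch {
  key: string;
  includeAgents?: string[]; // if set, only these agents support the setting
  excludeAgents?: string[]; // if set, these agents do not support the setting
}

// Decide whether a setting is offered in the UI.
// agentName is undefined when the service is "All" or is not a known service.
function isSettingOffered(setting: SettingSketch, agentName?: string): boolean {
  if (agentName === undefined) {
    return true; // unknown agent: offer everything rather than a subset
  }
  if (setting.includeAgents) {
    return setting.includeAgents.some((agent) => agent === agentName);
  }
  if (setting.excludeAgents) {
    return !setting.excludeAgents.some((agent) => agent === agentName);
  }
  return true; // no restriction declared
}

const sketchDefinitions: SettingSketch[] = [
  { key: 'transaction_max_spans', excludeAgents: ['js-base', 'rum-js', 'android/java', 'iOS/swift'] },
  {
    key: 'span_frames_min_duration',
    excludeAgents: ['js-base', 'rum-js', 'nodejs', 'php', 'android/java', 'iOS/swift'],
  },
  // Assumption: this includeAgents list is invented for the example.
  { key: 'trace_continuation_strategy', includeAgents: ['java', 'nodejs'] },
];

// Keys offered for a given agent, or for an unknown agent when omitted.
function offeredKeys(agentName?: string): string[] {
  return sketchDefinitions
    .filter((definition) => isSettingOffered(definition, agentName))
    .map((definition) => definition.key);
}
```

With this rule, `offeredKeys()` returns all three keys, while `offeredKeys('php')` is still narrowed to the settings that agent supports.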

The motivating case for my reporting this issue is a user that was not getting APM data for a service at all (or at least not for a long while). One theory for not receiving transaction data was that the sampling flag in incoming HTTP request traceparent headers was resulting in all transactions being discarded. A possible solution to this would be to use the trace_continuation_strategy config var. However, because of this issue, one cannot create an Agent Configuration that has a value for trace_continuation_strategy.

Errors in browser console (if relevant): I did not see any, and I don't think it is relevant.

Any additional context:

One guess as to why those particular two settings are always shown is that they use excludeAgents rather than includeAgents. However, there is a third setting that also uses excludeAgents and it is not shown:

  {
    key: 'span_frames_min_duration',
    type: 'duration',
    min: '-1ms',
    defaultValue: '5ms',
    label: i18n.translate('xpack.apm.agentConfig.spanFramesMinDuration.label', {
      defaultMessage: 'Span frames minimum duration',
    }),
    description: i18n.translate('xpack.apm.agentConfig.spanFramesMinDuration.description', {
      defaultMessage:
        '(Deprecated, use `span_stack_trace_min_duration` instead!) In its default settings, the APM agent will collect a stack trace with every recorded span.\nWhile this is very helpful to find the exact place in your code that causes the span, collecting this stack trace does have some overhead. \nWhen setting this option to a negative value, like `-1ms`, stack traces will be collected for all spans. Setting it to a positive value, e.g. `5ms`, will limit stack trace collection to spans with durations equal to or longer than the given value, e.g. 5 milliseconds.\n\nTo disable stack trace collection for spans completely, set the value to `0ms`.',
    }),
    excludeAgents: ['js-base', 'rum-js', 'nodejs', 'php', 'android/java', 'iOS/swift'],
  },

If Kibana behaviour were to change here to show all possible config vars if the target APM agent language was unknown, then possibly it would be nice if the UX changed to show the excludeAgents and includeAgents values in some form to give the user at least a start at knowing which config settings would be applicable. Yes, this might open a bit of a can of worms for users expecting a certain config setting to work for a language that doesn't support it.
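As a rough illustration of that UX idea, a helper could turn the two fields into a short hint shown next to each setting. This is only a sketch: `AgentScopedSetting` and `describeApplicability` are hypothetical names, and the wording of the hints is an assumption, not existing Kibana copy.

```typescript
// Hypothetical shape holding only the fields needed for the hint.
interface AgentScopedSetting {
  key: string;
  includeAgents?: string[];
  excludeAgents?: string[];
}

// Build a human-readable applicability hint for one setting.
function describeApplicability(setting: AgentScopedSetting): string {
  if (setting.includeAgents && setting.includeAgents.length > 0) {
    return `Only supported by: ${setting.includeAgents.join(', ')}`;
  }
  if (setting.excludeAgents && setting.excludeAgents.length > 0) {
    return `Not supported by: ${setting.excludeAgents.join(', ')}`;
  }
  return 'Supported by all agents';
}

// Example using the excludeAgents list quoted above for transaction_sample_rate.
const sampleRateHint = describeApplicability({
  key: 'transaction_sample_rate',
  excludeAgents: ['android/java', 'iOS/swift'],
});
```

A hint like this would not stop a user from setting a value an agent ignores, but it would at least show which agents are expected to honour it.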

If there is a concern that passing a config setting to a language agent that doesn't support it could cause harm: the APM agents spec requires that APM agents ignore central config settings they don't know. https://github.com/elastic/apm/blob/main/specs/agents/configuration.md#dealing-with-errors

If the agent receives a known but invalid config attribute, it should log a warning such as:
Central config failure. Invalid value for transactionSampleRate: 1.2 (out of range [0,1.0])
Failure to process one config attribute should not affect processing of others.
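For reference, the agent-side behaviour the spec describes could look roughly like the sketch below. It is not taken from any real APM agent: `knownSettings`, `applyCentralConfig`, and the validators are hypothetical, and the warning text only mimics the format of the spec's example. Unknown keys are skipped silently, a known but invalid value produces a warning, and the remaining keys are still applied.

```typescript
// Hypothetical validator: returns a problem description, or null when the value is valid.
type Validator = (raw: string) => string | null;

// Assumption: a tiny table of settings this imaginary agent understands.
const knownSettings: Record<string, Validator> = {
  transaction_sample_rate: (raw) => {
    const value = Number(raw);
    return raw.trim() !== '' && value >= 0 && value <= 1 ? null : 'out of range [0,1.0]';
  },
  transaction_max_spans: (raw) => {
    const value = Number(raw);
    const isNonNegativeInteger = raw.trim() !== '' && Math.floor(value) === value && value >= 0;
    return isNonNegativeInteger ? null : 'must be an integer >= 0';
  },
};

interface ApplyResult {
  applied: Record<string, string>;
  warnings: string[];
}

// Process each central config attribute independently of the others.
function applyCentralConfig(remote: Record<string, string>): ApplyResult {
  const result: ApplyResult = { applied: {}, warnings: [] };
  for (const key of Object.keys(remote)) {
    if (!Object.prototype.hasOwnProperty.call(knownSettings, key)) {
      continue; // unknown setting: ignore it
    }
    const problem = knownSettings[key](remote[key]);
    if (problem !== null) {
      // known but invalid: warn and keep processing the other attributes
      result.warnings.push(
        `Central config failure. Invalid value for ${key}: ${remote[key]} (${problem})`
      );
      continue;
    }
    result.applied[key] = remote[key];
  }
  return result;
}

// This imaginary agent does not know trace_continuation_strategy, so it ignores it.
const outcome = applyCentralConfig({
  transaction_sample_rate: '1.2',
  transaction_max_spans: '100',
  trace_continuation_strategy: 'restart',
});
```

Here `outcome.applied` contains only transaction_max_spans, and `outcome.warnings` holds a single entry for the out-of-range sample rate.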

@trentm trentm added bug Fixes for quality problems that affect the customer experience sdh-linked labels Oct 18, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Oct 18, 2024
@smith smith added the Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team label Oct 18, 2024
@elasticmachine

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Oct 18, 2024
@roshan-elastic

Thanks for raising @trentm.

Hey @smith - what do you think of this?

Is this an issue we've seen before?

smith commented Oct 21, 2024

@roshan-elastic I haven't seen this problem before but I'm somewhat familiar with the code, and this looks like a bug we should investigate and fix.
