
[receiver/prometheusremotewrite] Parse labels #35656

Open · wants to merge 1 commit into base: main
Conversation

@ArthurSens (Member) commented Oct 7, 2024

Description

This PR builds on top of #35535, #35565 and #35624.

Here we're parsing labels into resource/metric attributes. It's still not ideal, because resource attributes (with the exception of service.namespace, service.name, and service.instance.id) are encoded into a special metric called target_info. Metrics related to a specific target_info may arrive in separate write requests, so it may be impossible to build the full OTLP metric in a stateless way.

In this PR I'm ignoring this problem 😛, and transforming job and instance labels into resource attributes, while all other labels become scope attributes.
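
For reference, a minimal sketch of the partitioning described above, using the collector's pdata API and Prometheus' labels package. This is an illustration, not the PR's exact code; function and variable names are made up here.

```go
package sketch

import (
	"github.com/prometheus/prometheus/model/labels"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// partitionLabels mirrors the approach described above: job and instance go to
// resource attributes (the PR maps them onto service.* names; stored verbatim
// here to keep the sketch short), __name__ is skipped, and everything else
// becomes a scope attribute.
func partitionLabels(ls labels.Labels, rm pmetric.ResourceMetrics) {
	resourceAttrs := rm.Resource().Attributes()
	scopeAttrs := rm.ScopeMetrics().AppendEmpty().Scope().Attributes()
	for _, l := range ls {
		switch l.Name {
		case "job", "instance":
			resourceAttrs.PutStr(l.Name, l.Value)
		case labels.MetricName:
			// __name__ is used for the metric name, not stored as an attribute
		default:
			scopeAttrs.PutStr(l.Name, l.Value)
		}
	}
}
```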

Please focus on the latest commit when reviewing this PR :)
1c9ff80

@dashpole (Contributor) commented Oct 9, 2024

I don't think we should be putting metric labels into the resource attributes.

@ArthurSens (Member, Author) replied:

> I don't think we should be putting metric labels into the resource attributes.

I was in doubt about this one. Since there are three layers of attributes, how do we know where labels should go when transformed into OpenTelemetry metrics?

@dashpole (Contributor) replied:

> I was in doubt about this one. Since there are three layers of attributes, how do we know where labels should go when transformed into OpenTelemetry metrics?

Job + instance become service.* resource attributes. Other labels on Prometheus metrics become attributes on the OTel metric (its data points). Labels on target_info become attributes on the resource. The tricky part is that target_info might end up in a different request? We can deal with that in a follow-up.
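
To make the data-point case concrete, a small sketch (illustrative only; it assumes a gauge sample and that __name__, job, and instance are handled elsewhere):

```go
package sketch

import (
	"github.com/prometheus/prometheus/model/labels"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// addDataPoint shows where "other labels on prom metrics" end up: as attributes
// on the metric's data point, not on the resource or the scope.
func addDataPoint(metric pmetric.Metric, ls labels.Labels, value float64) {
	dp := metric.SetEmptyGauge().DataPoints().AppendEmpty()
	dp.SetDoubleValue(value)
	for _, l := range ls {
		switch l.Name {
		case labels.MetricName, "job", "instance":
			// metric name and service.* resource attributes are handled elsewhere
		default:
			dp.Attributes().PutStr(l.Name, l.Value)
		}
	}
}
```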

@ArthurSens (Member, Author) replied:

> The tricky part is that target_info might end up in a different request? We can deal with that in a follow-up.

Oof, I honestly haven't thought about that. This makes this receiver a lot more difficult to write 😬

@ArthurSens ArthurSens marked this pull request as draft October 30, 2024 13:55
@ArthurSens ArthurSens changed the title [receiver/prometheusremotewrite] Parse labels into resource attributes [receiver/prometheusremotewrite] Parse labels Oct 30, 2024
@ArthurSens ArthurSens marked this pull request as ready for review October 30, 2024 15:37
@ArthurSens (Member, Author) commented Oct 30, 2024

Alright, PR updated to only handle job and instance as resource attributes. All the rest becomes scope attributes (I couldn't find an attributes field on the metric object 🤔).


scopeAttributes := rm.ScopeMetrics().AppendEmpty().Scope().Attributes()
for _, l := range ls {
	if l.Name == "instance" || l.Name == "job" || l.Name == labels.MetricName {
Review comment (Contributor):

We should also omit otel_scope_name and otel_scope_version, which should be used to set the name and version of the scope (if present).
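
A hedged sketch of what that could look like (names are illustrative, not the PR's code): the two labels are consumed to populate the scope rather than being copied into its attributes.

```go
package sketch

import (
	"github.com/prometheus/prometheus/model/labels"
	"go.opentelemetry.io/collector/pdata/pcommon"
)

// setScopeFromLabels uses otel_scope_name/otel_scope_version to set the scope's
// name and version; all other labels are left for the normal attribute handling.
func setScopeFromLabels(scope pcommon.InstrumentationScope, ls labels.Labels) {
	for _, l := range ls {
		switch l.Name {
		case "otel_scope_name":
			scope.SetName(l.Value)
		case "otel_scope_version":
			scope.SetVersion(l.Value)
		}
	}
}
```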

@ArthurSens (Member, Author) commented:

Sorry for the massive review request folks, accidentally messed up my commits 😭

@ArthurSens ArthurSens force-pushed the prwreceiver-parselabels branch 3 times, most recently from f902047 to b1e96fa Compare November 7, 2024 00:38
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [api, user]
Review comment (Contributor):

I don't see any changes to the public API of the package.

Suggested change:
- change_logs: [api, user]
+ change_logs: []

return pmetric.NewMetrics(), promremote.WriteResponseStats{}, nil
func (prw *prometheusRemoteWriteReceiver) translateV2(_ context.Context, req *writev2.Request) (pmetric.Metrics, promremote.WriteResponseStats, error) {
	var (
		badRequestErrors []error
Review comment (Contributor):

can we just use a single error, and use errors.Join to append new errors? e.g.

var err error
// then, when we want to append an error:
err = errors.Join(err, fmt.Errorf("missing metric name in labels"))

	var (
		badRequestErrors []error
		otelMetrics      = pmetric.NewMetrics()
		b                = labels.NewScratchBuilder(0)
Review comment (Contributor):

nit: can we use a more descriptive name? Perhaps labelsBuilder?

@@ -39,8 +42,9 @@ type prometheusRemoteWriteReceiver struct {
	settings     receiver.Settings
	nextConsumer consumer.Metrics

	config           *Config
	server           *http.Server
	jobInstanceCache map[string]pmetric.ResourceMetrics
Review comment (Contributor):

Consider using an int64 as the key for this cache, and constructing an identifier using xxhash, similar to what the Prometheus translation code does: https://github.com/prometheus/prometheus/blob/0f0deb77a2f15383914ad60a3309281adb7e8b27/storage/remote/otlptranslator/prometheusremotewrite/helper.go#L94

You can just append the job, a separator, and then the instance.
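
One possible shape for that, sketched with cespare/xxhash (the library used by the linked Prometheus helper); the 0xff separator follows the convention in that helper, and Sum64 returns a uint64 that can be used directly as the map key:

```go
package sketch

import "github.com/cespare/xxhash/v2"

// cacheKey hashes job and instance into a single integer key, avoiding a
// string concatenation allocation for every series.
func cacheKey(job, instance string) uint64 {
	h := xxhash.New()
	_, _ = h.WriteString(job)
	_, _ = h.Write([]byte{0xff}) // separator, so ("ab","c") != ("a","bc")
	_, _ = h.WriteString(instance)
	return h.Sum64()
}
```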

@@ -39,8 +42,9 @@ type prometheusRemoteWriteReceiver struct {
	settings     receiver.Settings
	nextConsumer consumer.Metrics

	config           *Config
	server           *http.Server
	jobInstanceCache map[string]pmetric.ResourceMetrics
Review comment (Contributor):

Do we have any mechanism to expire items in this cache? We don't want this to leak memory.

Actually, I'm not sure we should be storing the job + instance cache on the receiver itself. We probably only want this cache around for a single PRW request, right? Otherwise we will keep appending new metrics to the same ResourceMetrics forever.
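
One way to address that second point, sketched under the assumption that the cache only needs to deduplicate resources within a single request; the `series` type and function names below are hypothetical:

```go
package sketch

import "go.opentelemetry.io/collector/pdata/pmetric"

// series is a hypothetical stand-in for the per-time-series data the receiver
// iterates over; only the fields needed for this sketch are shown.
type series struct{ job, instance string }

// translatePerRequest keeps the job/instance -> ResourceMetrics cache local to
// one translate call, so nothing accumulates on the receiver between requests
// and no expiry mechanism is needed.
func translatePerRequest(all []series) pmetric.Metrics {
	otelMetrics := pmetric.NewMetrics()
	cache := map[string]pmetric.ResourceMetrics{} // lives only for this request
	for _, s := range all {
		key := s.job + "\xff" + s.instance
		rm, ok := cache[key]
		if !ok {
			rm = otelMetrics.ResourceMetrics().AppendEmpty()
			cache[key] = rm
		}
		_ = rm // data points for this series would be appended under rm here
	}
	return otelMetrics
}
```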

// Following the specification at https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/
func parseJobAndInstance(dest pcommon.Map, instance, job string) {
	if job != "" {
		dest.PutStr("service.namespace", job)
Review comment (Contributor):

Job should be service.namespace + "/" + service.name. Instance is only service.instance.id. You need to move the strings.Split(instance, "/") to the job case.
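
A sketch of the mapping that comment asks for, following the specification linked in the code comment above (strings.Cut is used here for the split; this is an illustration, not the PR's final code):

```go
package sketch

import (
	"strings"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// parseJobAndInstance maps job to service.namespace "/" service.name (falling
// back to service.name alone when job contains no "/") and maps instance
// directly to service.instance.id.
func parseJobAndInstance(dest pcommon.Map, instance, job string) {
	if job != "" {
		if namespace, name, found := strings.Cut(job, "/"); found {
			dest.PutStr("service.namespace", namespace)
			dest.PutStr("service.name", name)
		} else {
			dest.PutStr("service.name", job)
		}
	}
	if instance != "" {
		dest.PutStr("service.instance.id", instance)
	}
}
```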
