Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2] Implement sanitizers to operate on OTLP data #5551

Closed
wants to merge 22 commits into from
Closed
Show file tree
Hide file tree
Changes from 19 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions cmd/jaeger/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,9 @@

This is experimental Jaeger V2 based on OpenTelemetry collector.
See https://github.com/jaegertracing/jaeger/issues/4843.

## Compatibility

### Service Name Sanitizer

In v1, there was a `serviceNameSanitizer` that sanitized the service names in span annotations using a source of truth alias to service cache. This functionality has been removed in v2. If your implementation relies on this sanitizer, you will need to find a different way to integrate this functionality, such as implementing a custom processor.
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer

import (
"go.opentelemetry.io/collector/pdata/ptrace"
)

// Constants for the replacement names
const (
emptyServiceName = "empty-service-name"
missingServiceName = "missing-service-name"
)

// NewEmptyServiceNameSanitizer returns a function that replaces empty service names
// with a predefined string.
func NewEmptyServiceNameSanitizer() SanitizeTraces {
return sanitizeEmptyServiceName

Check warning on line 19 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go#L18-L19

Added lines #L18 - L19 were not covered by tests
}

// sanitizeEmptyServiceName sanitizes the service names in the resource attributes.
func sanitizeEmptyServiceName(traces ptrace.Traces) ptrace.Traces {
resourceSpans := traces.ResourceSpans()
for i := 0; i < resourceSpans.Len(); i++ {
resourceSpan := resourceSpans.At(i)
attributes := resourceSpan.Resource().Attributes()
serviceNameAttr, ok := attributes.Get("service.name")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"service.name"

better to import semantic conventions and use a constant from there

if !ok {

Check warning on line 29 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go#L23-L29

Added lines #L23 - L29 were not covered by tests
// If service.name is missing, set it to nullProcessServiceName
attributes.PutStr("service.name", missingServiceName)
} else if serviceNameAttr.Str() == "" {

Check warning on line 32 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go#L31-L32

Added lines #L31 - L32 were not covered by tests
// If service.name is empty, replace it with serviceNameReplacement
attributes.PutStr("service.name", emptyServiceName)

Check warning on line 34 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go#L34

Added line #L34 was not covered by tests
}
}
return traces

Check warning on line 37 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go#L37

Added line #L37 was not covered by tests
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer

import (
"testing"

"github.com/jaegertracing/jaeger/pkg/testutils"
)

func TestMain(m *testing.M) {
testutils.VerifyGoLeaks(m)
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer

import (
"go.opentelemetry.io/collector/pdata/ptrace"
)

// SanitizeTraces is a function that performs enrichment, clean-up, or normalization of trace data.
type SanitizeTraces func(traces ptrace.Traces) ptrace.Traces

// NewStandardSanitizers are automatically applied by SpanProcessor.
func NewStandardSanitizers() []SanitizeTraces {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to call this from the exporter

return []SanitizeTraces{
NewEmptyServiceNameSanitizer(),
NewUTF8Sanitizer(),

Check warning on line 17 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go#L14-L17

Added lines #L14 - L17 were not covered by tests
}
}

// NewChainedSanitizer creates a Sanitizer from the variadic list of passed Sanitizers.
// If the list only has one element, it is returned directly to minimize indirection.
func NewChainedSanitizer(sanitizers ...SanitizeTraces) SanitizeTraces {
if len(sanitizers) == 1 {
return sanitizers[0]

Check warning on line 25 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go#L23-L25

Added lines #L23 - L25 were not covered by tests
}
return func(traces ptrace.Traces) ptrace.Traces {
for _, s := range sanitizers {
traces = s(traces)

Check warning on line 29 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go#L27-L29

Added lines #L27 - L29 were not covered by tests
}
return traces

Check warning on line 31 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go#L31

Added line #L31 was not covered by tests
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer

import (
"unicode/utf8"

"go.opentelemetry.io/collector/pdata/pcommon"
"go.opentelemetry.io/collector/pdata/ptrace"
)

const (
invalidOperation = "InvalidOperationName"
invalidTagKey = "InvalidTagKey"
badUTF8Prefix = "bad_utf8_"
)

// NewUTF8Sanitizer creates a UTF8 sanitizer.
func NewUTF8Sanitizer() SanitizeTraces {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a test

return sanitizeUTF8

Check warning on line 21 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L20-L21

Added lines #L20 - L21 were not covered by tests
}

// sanitizeUTF8 sanitizes the UTF8 in the traces.
func sanitizeUTF8(traces ptrace.Traces) ptrace.Traces {
resourceSpans := traces.ResourceSpans()
for i := 0; i < resourceSpans.Len(); i++ {
resourceSpan := resourceSpans.At(i)

Check warning on line 28 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L25-L28

Added lines #L25 - L28 were not covered by tests

// Sanitize resource attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments like this provide no value

sanitizeAttributes(resourceSpan.Resource().Attributes())

Check warning on line 31 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L31

Added line #L31 was not covered by tests

scopeSpans := resourceSpan.ScopeSpans()
for j := 0; j < scopeSpans.Len(); j++ {
scopeSpan := scopeSpans.At(j)

Check warning on line 35 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L33-L35

Added lines #L33 - L35 were not covered by tests

// Sanitize scope attributes
sanitizeAttributes(scopeSpan.Scope().Attributes())

Check warning on line 38 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L38

Added line #L38 was not covered by tests

spans := scopeSpan.Spans()
for k := 0; k < spans.Len(); k++ {
span := spans.At(k)

Check warning on line 42 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L40-L42

Added lines #L40 - L42 were not covered by tests

// Sanitize operation name
if !utf8.ValidString(span.Name()) {
originalName := span.Name()
span.SetName(invalidOperation)
binaryAttr := span.Attributes().PutEmptyBytes(badUTF8Prefix + "operation_name")
binaryAttr.FromRaw([]byte(originalName))

Check warning on line 49 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L45-L49

Added lines #L45 - L49 were not covered by tests
}

// Sanitize span attributes
sanitizeAttributes(span.Attributes())

Check warning on line 53 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L53

Added line #L53 was not covered by tests
}
}
}
return traces

Check warning on line 57 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L57

Added line #L57 was not covered by tests
}

// sanitizeAttributes sanitizes attributes to ensure UTF8 validity.
func sanitizeAttributes(attributes pcommon.Map) {

Check warning on line 61 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L61

Added line #L61 was not covered by tests
// Collect invalid keys and values during iteration
var invalidKeys []string
invalidValues := make(map[string]string)

Check warning on line 64 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L63-L64

Added lines #L63 - L64 were not covered by tests

attributes.Range(func(k string, v pcommon.Value) bool {

Check warning on line 66 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L66

Added line #L66 was not covered by tests
// Handle invalid UTF-8 in attribute keys
if !utf8.ValidString(k) {
invalidKeys = append(invalidKeys, k)

Check warning on line 69 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L68-L69

Added lines #L68 - L69 were not covered by tests
}
// Handle invalid UTF-8 in attribute values
if v.Type() == pcommon.ValueTypeStr && !utf8.ValidString(v.Str()) {
invalidValues[k] = v.Str()

Check warning on line 73 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L72-L73

Added lines #L72 - L73 were not covered by tests
}
return true

Check warning on line 75 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L75

Added line #L75 was not covered by tests
})

// Apply collected changes after iteration
for _, k := range invalidKeys {
originalKey := k
attributes.PutStr(invalidTagKey, k)
binaryAttr := attributes.PutEmptyBytes(badUTF8Prefix + originalKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the originalKey is a bad UTF8 string, you are still creating a new key with the same bad string

binaryAttr.FromRaw([]byte(originalKey))

Check warning on line 83 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L79-L83

Added lines #L79 - L83 were not covered by tests
}

for k, v := range invalidValues {
originalValue := v
attributes.PutStr(k, invalidTagKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this achieve? I suggest you write a specification in English in the comments to the sanitizer what exactly it's trying to do with attributes. And I think starting with the unit tests in this case would be much more useful, because you can see exactly how input and output data are related. For example, I expect the following to be true:

traces1 := inputData // assume it has bad UTF8 strings
traces2 := sanitizeUTF8(traces1) 
assert traces1 != traces2 // definitely changed something
traces3 := sanitizeUTF8(traces2)
assert traces2 == traces3 // the 2nd pass is no-op, since no more invalid data is present

binaryAttr := attributes.PutEmptyBytes(badUTF8Prefix + k)
binaryAttr.FromRaw([]byte(originalValue))

Check warning on line 90 in cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go

View check run for this annotation

Codecov / codecov/patch

cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go#L86-L90

Added lines #L86 - L90 were not covered by tests
}
}
Loading