[metrics-generator] filter out spans based on policy (grafana#2274)
* First pass at span filtering

Signed-off-by: Zach Leslie <[email protected]>

* Validate the spanmetrics filtering config on startup

Signed-off-by: Zach Leslie <[email protected]>

* Give some hope that we return a true match

Signed-off-by: Zach Leslie <[email protected]>

* Drop unused argument service name and rely on attributes

Signed-off-by: Zach Leslie <[email protected]>

* Handling a few intrinsics

Signed-off-by: Zach Leslie <[email protected]>

* Include documentation for spanmetrics filtering policies

Signed-off-by: Zach Leslie <[email protected]>

* Update docs/sources/tempo/metrics-generator/span_metrics.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics-generator/span_metrics.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics-generator/span_metrics.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics-generator/span_metrics.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics-generator/span_metrics.md

Co-authored-by: Kim Nylander <[email protected]>

* Update docs/sources/tempo/metrics-generator/span_metrics.md

Co-authored-by: Kim Nylander <[email protected]>

* Adjust filter policy to split policies during New()

Signed-off-by: Zach Leslie <[email protected]>

* Update test for intrinsic

Signed-off-by: Zach Leslie <[email protected]>

* Include benchmark and supporting span generator

Signed-off-by: Zach Leslie <[email protected]>

* Include metric for counting spans that have been filtered out

Signed-off-by: Zach Leslie <[email protected]>

* Include config warning when unsupported intrinsic is used

Signed-off-by: Zach Leslie <[email protected]>

* Relocate spanmetrics.FilterPolicy to sharedconfig package and implement overrides

Signed-off-by: Zach Leslie <[email protected]>

* Include sharedconfig package

Signed-off-by: Zach Leslie <[email protected]>

* Update modules/generator/processor/spanmetrics/spanmetrics.go

Co-authored-by: Joe Elliott <[email protected]>

* Refactor spanfilter into its own package

Signed-off-by: Zach Leslie <[email protected]>

* Include tests for spanfilter.New()

Signed-off-by: Zach Leslie <[email protected]>

* Update spanmetrics processor to return an error for spanfilter error

Signed-off-by: Zach Leslie <[email protected]>

* Relocate config validation to spanfilter during New

Signed-off-by: Zach Leslie <[email protected]>

* Update tests for spanmetrics error return

Signed-off-by: Zach Leslie <[email protected]>

* Drop unused

Signed-off-by: Zach Leslie <[email protected]>

* Update docs to include nesting of filtering config

Signed-off-by: Zach Leslie <[email protected]>

* Exit early when attributes are unmatched

Signed-off-by: Zach Leslie <[email protected]>

* Exit early when intrinsics are not matched

Signed-off-by: Zach Leslie <[email protected]>

* Preallocate a couple variables

Signed-off-by: Zach Leslie <[email protected]>

* Add note about use of RandomBatcher

Signed-off-by: Zach Leslie <[email protected]>

* Update changelog

* Drop TODO comment

Signed-off-by: Zach Leslie <[email protected]>

* Add back the lost metric during rebase

Signed-off-by: Zach Leslie <[email protected]>

* Fix policy override configuration

Signed-off-by: Zach Leslie <[email protected]>

* Include generator config test

Signed-off-by: Zach Leslie <[email protected]>

* Migrate the metric and expand reasons

Signed-off-by: Zach Leslie <[email protected]>

* Update tests for discardCounter

Signed-off-by: Zach Leslie <[email protected]>

* Include doc about which kinds are available for filtering

Signed-off-by: Zach Leslie <[email protected]>

* Spellcheck

* Perform number matching for kind and status

Signed-off-by: Zach Leslie <[email protected]>

* Rename discardCounter to filteredSpansCounter

Signed-off-by: Zach Leslie <[email protected]>

* Improve error quality

Signed-off-by: Zach Leslie <[email protected]>

* Update error message in test

Signed-off-by: Zach Leslie <[email protected]>

---------

Signed-off-by: Zach Leslie <[email protected]>
Co-authored-by: Kim Nylander <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
3 people authored May 2, 2023
1 parent 6e7cd10 commit 14848fd
Showing 16 changed files with 2,314 additions and 28 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@
* [ENHANCEMENT] Add synchronous read mode to vParquet and vParquet2 optionally enabled by env vars [#2165](https://github.com/grafana/tempo/pull/2165) (@mdisibio)
* [ENHANCEMENT] Add option to override metrics-generator ring port [#2399](https://github.com/grafana/tempo/pull/2399) (@mdisibio)
* [ENHANCEMENT] Add support for IPv6 [#1555](https://github.com/grafana/tempo/pull/1555) (@zalegrala)
* [ENHANCEMENT] Add span filtering to spanmetrics processor [#2274](https://github.com/grafana/tempo/pull/2274) (@zalegrala)
* [BUGFIX] tempodb integer divide by zero error [#2167](https://github.com/grafana/tempo/issues/2167) (@kroksys)
* [CHANGE] **Breaking Change** Rename s3.insecure_skip_verify [#???](https://github.com/grafana/tempo/pull/???) (@zalegrala)
```yaml
93 changes: 86 additions & 7 deletions docs/sources/tempo/metrics-generator/span_metrics.md
@@ -1,7 +1,7 @@
---
aliases:
- /docs/tempo/latest/server_side_metrics/span_metrics/
- /docs/tempo/latest/metrics-generator/span_metrics/
title: Generate metrics from spans
weight: 400
---
@@ -11,8 +11,9 @@ weight: 400
The span metrics processor generates metrics from ingested tracing data, including request, error, and duration (RED) metrics.

Span metrics generate two metrics:

- A counter that computes requests
- A histogram that tracks the distribution of durations of all requests

Span metrics are of particular interest if your system is not monitored with metrics,
but it has distributed tracing implemented.
@@ -43,7 +44,7 @@ This processor is designed with the goal to mirror the implementation from the O
The following metrics are exported:

| Metric | Type | Labels | Description |
| ------------------------------ | --------- | ---------- | ---------------------------- |
| traces_spanmetrics_latency | Histogram | Dimensions | Duration of the span |
| traces_spanmetrics_calls_total | Counter | Dimensions | Total count of the span |
| traces_spanmetrics_size_total | Counter | Dimensions | Total size of spans ingested |
@@ -56,7 +57,6 @@ When a configured dimension collides with one of the default labels (e.g. `statu

If you use a ratio-based sampler, you can use the custom sampler below to avoid losing metric information. You also need to set `metrics_generator.processor.span_metrics.span_multiplier_key` to `"X-SampleRatio"`.


```go
package tracer
import (
@@ -91,6 +91,85 @@ func (ds RatioBasedSampler) Description() string {
}
```

### Filtering

In some cases, you may want to reduce the number of metrics produced by the `spanmetrics` processor. You can configure the processor to use an `include` filter to match criteria that must be present in the span in order to be included. Following the include filter, an `exclude` filter may be used to reject portions of what was previously included by the filter policy.

Currently, only filtering by resource and span attributes with the following value types is supported.

- `bool`
- `double`
- `int`
- `string`

Additionally, these intrinsic span attributes may be filtered upon:

- `name`
- `status` (code)
- `kind`

The following intrinsic kinds are available for filtering.

- `SPAN_KIND_SERVER`
- `SPAN_KIND_INTERNAL`
- `SPAN_KIND_CLIENT`
- `SPAN_KIND_PRODUCER`
- `SPAN_KIND_CONSUMER`
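
The commit log mentions number matching for `kind` and `status`. The following is a minimal sketch of how a configured kind name could be resolved to a number once, so that per-span checks compare integers. The values follow the OTLP `SpanKind` enum; the names `spanKindValue` and `kindNumber` are hypothetical and are not taken from the Tempo code.

```go
package main

import "fmt"

// spanKindValue mirrors the numbering of the OTLP Span.SpanKind enum.
// The map and the helper below are hypothetical illustrations.
var spanKindValue = map[string]int32{
	"SPAN_KIND_UNSPECIFIED": 0,
	"SPAN_KIND_INTERNAL":    1,
	"SPAN_KIND_SERVER":      2,
	"SPAN_KIND_CLIENT":      3,
	"SPAN_KIND_PRODUCER":    4,
	"SPAN_KIND_CONSUMER":    5,
}

// kindNumber resolves a configured kind name to its enum number and
// rejects names that are not part of the enum.
func kindNumber(name string) (int32, error) {
	n, ok := spanKindValue[name]
	if !ok {
		return 0, fmt.Errorf("unsupported span kind %q", name)
	}
	return n, nil
}

func main() {
	n, err := kindNumber("SPAN_KIND_SERVER")
	fmt.Println(n, err)
}
```

Resolving the name when the policy is loaded also gives a natural place to reject an unsupported value at startup rather than during span processing.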

Intrinsic keys can be acted on directly when implementing a filter policy. For example:

```yaml
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: strict
            attributes:
              - key: kind
                value: SPAN_KIND_SERVER
```

In this example, spans which are of `kind` "server" are included for metrics export.

When selecting spans based on non-intrinsic attributes, you must specify the scope of the attribute, similar to how it is specified in TraceQL. For example, if the `resource` contains a `location` attribute that is to be used in a filter policy, then the reference needs to be specified as `resource.location`. This requires users to know and specify the scope in which an attribute is found, and avoids the ambiguity of conflicting values at differing scopes. The following example illustrates this.

```yaml
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: strict
            attributes:
              - key: resource.location
                value: earth
```
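
To make the scoping rule concrete, here is a rough sketch of how a policy key might be split into a scope and an attribute name. It assumes the scope prefixes are `resource.` and `span.`, as in TraceQL, and that any other key is treated as an intrinsic. The function `splitScope` is hypothetical and is not the parser used by Tempo.

```go
package main

import (
	"fmt"
	"strings"
)

// splitScope splits a policy key such as "resource.location" into its scope
// and attribute name. Keys without a known scope prefix are reported as
// intrinsics (for example "name", "status", or "kind").
// This helper is a hypothetical illustration.
func splitScope(key string) (scope, name string) {
	for _, s := range []string{"resource", "span"} {
		prefix := s + "."
		if strings.HasPrefix(key, prefix) {
			return s, strings.TrimPrefix(key, prefix)
		}
	}
	return "intrinsic", key
}

func main() {
	for _, k := range []string{"resource.location", "span.http.method", "kind"} {
		scope, name := splitScope(k)
		fmt.Printf("%s -> scope=%s name=%s\n", k, scope, name)
	}
}
```

Only the first segment is treated as the scope, so attribute names that themselves contain dots, such as `http.method`, stay intact.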

In the above examples, we are using `match_type` of `strict`, which is a direct comparison of values. An additional option for `match_type` is `regex`. This allows users to build a regular expression to match against.

```yaml
---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: regex
            attributes:
              - key: resource.location
                value: eu-.*
        - exclude:
            match_type: regex
            attributes:
              - key: resource.tier
                value: dev-.*
```

In the above, we first include all spans which have a `resource.location` that begins with `eu-` with the `include` statement, and then exclude those whose `resource.tier` begins with `dev-`. In this way, a flexible approach to filtering can be achieved to ensure that only metrics which are important are generated.
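
The include-then-exclude behavior described above can be sketched in a few lines of Go. This is a simplified model, not the `spanfilter` package itself: the types `attr`, `match`, and `policy` are hypothetical, attributes are flattened into a string map, every attribute in a match is assumed to be required, every policy is assumed to have to pass, and regular expressions are assumed to be anchored to the whole value.

```go
package main

import (
	"fmt"
	"regexp"
)

// attr, match, and policy are hypothetical, simplified stand-ins for the
// filter policy configuration.
type attr struct{ Key, Value string }

type match struct {
	Regex bool // false means strict (direct comparison)
	Attrs []attr
}

type policy struct {
	Include *match
	Exclude *match
}

// matches reports whether every attribute in m is present in span with a
// matching value.
func (m *match) matches(span map[string]string) bool {
	for _, a := range m.Attrs {
		got, ok := span[a.Key]
		if !ok {
			return false
		}
		if m.Regex {
			// Assumption: the pattern must match the whole value.
			// Compiled per call for brevity; real code would compile once.
			re, err := regexp.Compile("^(?:" + a.Value + ")$")
			if err != nil || !re.MatchString(got) {
				return false
			}
		} else if got != a.Value {
			return false
		}
	}
	return true
}

// keep applies the policies in order. A span is dropped as soon as an
// include does not match or an exclude does match.
func keep(policies []policy, span map[string]string) bool {
	for _, p := range policies {
		if p.Include != nil && !p.Include.matches(span) {
			return false
		}
		if p.Exclude != nil && p.Exclude.matches(span) {
			return false
		}
	}
	return true
}

func main() {
	policies := []policy{
		{Include: &match{Regex: true, Attrs: []attr{{"resource.location", "eu-.*"}}}},
		{Exclude: &match{Regex: true, Attrs: []attr{{"resource.tier", "dev-.*"}}}},
	}
	fmt.Println(keep(policies, map[string]string{"resource.location": "eu-west", "resource.tier": "prod-1"})) // true
	fmt.Println(keep(policies, map[string]string{"resource.location": "eu-west", "resource.tier": "dev-1"}))  // false
	fmt.Println(keep(policies, map[string]string{"resource.location": "us-east", "resource.tier": "prod-1"})) // false
}
```

With no policies configured, `keep` returns `true` for every span, which corresponds to the unfiltered behavior.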

## Example

<p align="center"><img src="../span-metrics-example.png" alt="Span metrics overview"></p>
3 changes: 3 additions & 0 deletions modules/generator/config.go
@@ -73,6 +73,9 @@ func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userI
			return ProcessorConfig{}, errors.Wrap(err, "fail to apply overrides")
		}
	}
	if filterPolicies := o.MetricsGeneratorProcessorSpanMetricsFilterPolicies(userID); filterPolicies != nil {
		copyCfg.SpanMetrics.FilterPolicies = filterPolicies
	}

	return copyCfg, nil
}
62 changes: 62 additions & 0 deletions modules/generator/config_test.go
@@ -8,6 +8,7 @@ import (

	"github.com/grafana/tempo/modules/generator/processor/servicegraphs"
	"github.com/grafana/tempo/modules/generator/processor/spanmetrics"
	"github.com/grafana/tempo/pkg/spanfilter/config"
)

func TestProcessorConfig_copyWithOverrides(t *testing.T) {
@@ -69,4 +70,65 @@ func TestProcessorConfig_copyWithOverrides(t *testing.T) {
		_, err := original.copyWithOverrides(o, "tenant")
		require.Error(t, err)
	})

	t.Run("nil policy overrides", func(t *testing.T) {
		o := &mockOverrides{
			spanMetricsFilterPolicies: nil,
		}

		copied, err := original.copyWithOverrides(o, "tenant")
		require.NoError(t, err)

		assert.Equal(t, *original, copied)
	})

	t.Run("empty policy overrides", func(t *testing.T) {
		o := &mockOverrides{
			spanMetricsFilterPolicies: []config.FilterPolicy{},
		}

		copied, err := original.copyWithOverrides(o, "tenant")
		require.NoError(t, err)

		assert.NotEqual(t, *original, copied)

		assert.Equal(t, []config.FilterPolicy{}, copied.SpanMetrics.FilterPolicies)
	})

	t.Run("policy overrides", func(t *testing.T) {
		o := &mockOverrides{
			spanMetricsFilterPolicies: []config.FilterPolicy{
				{
					Include: &config.PolicyMatch{
						MatchType: config.Strict,
						Attributes: []config.MatchPolicyAttribute{
							{
								Key:   "key",
								Value: "value",
							},
						},
					},
				},
			},
		}

		copied, err := original.copyWithOverrides(o, "tenant")
		require.NoError(t, err)

		assert.NotEqual(t, *original, copied)

		assert.Equal(t, []config.FilterPolicy{
			{
				Include: &config.PolicyMatch{
					MatchType: config.Strict,
					Attributes: []config.MatchPolicyAttribute{
						{
							Key:   "key",
							Value: "value",
						},
					},
				},
			},
		}, copied.SpanMetrics.FilterPolicies)
	})
}
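
The "nil policy overrides" and "empty policy overrides" cases above depend on Go distinguishing a nil slice from an empty, non-nil one. The following is a minimal sketch of that distinction, assuming the override is applied whenever it is non-nil. The names `filterPolicy` and `applyOverride` are hypothetical stand-ins and do not appear in the commit.

```go
package main

import "fmt"

// filterPolicy is a hypothetical stand-in for the real policy type.
type filterPolicy struct{ Name string }

// applyOverride returns the override when it is non-nil and the base
// value otherwise. An empty but non-nil override therefore replaces the
// base and clears all policies.
func applyOverride(base, override []filterPolicy) []filterPolicy {
	if override != nil {
		return override
	}
	return base
}

func main() {
	base := []filterPolicy{{Name: "default"}}

	fmt.Println(len(applyOverride(base, nil)))              // 1: base is kept
	fmt.Println(len(applyOverride(base, []filterPolicy{}))) // 0: base is cleared
}
```

Under this assumption, a tenant can switch off filtering that is configured globally by supplying an empty list rather than omitting the override.
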
12 changes: 10 additions & 2 deletions modules/generator/instance.go
@@ -52,7 +52,10 @@ var (
	}, []string{"tenant", "reason"})
)

const reasonOutsideTimeRangeSlack = "outside_metrics_ingestion_slack"
const (
	reasonOutsideTimeRangeSlack = "outside_metrics_ingestion_slack"
	reasonSpanMetricsFiltered   = "span_metrics_filtered"
)

type instance struct {
cfg *Config
@@ -256,9 +259,14 @@ func (i *instance) addProcessor(processorName string, cfg ProcessorConfig) error
	level.Debug(i.logger).Log("msg", "adding processor", "processorName", processorName)

	var newProcessor processor.Processor
	var err error
	switch processorName {
	case spanmetrics.Name:
		newProcessor = spanmetrics.New(cfg.SpanMetrics, i.registry)
		filteredSpansCounter := metricSpansDiscarded.WithLabelValues(i.instanceID, reasonSpanMetricsFiltered)
		newProcessor, err = spanmetrics.New(cfg.SpanMetrics, i.registry, filteredSpansCounter)
		if err != nil {
			return err
		}
	case servicegraphs.Name:
		newProcessor = servicegraphs.New(cfg.ServiceGraphs, i.instanceID, i.registry, i.logger)
	default:
2 changes: 2 additions & 0 deletions modules/generator/overrides.go
@@ -3,6 +3,7 @@ package generator
import (
	"github.com/grafana/tempo/modules/generator/registry"
	"github.com/grafana/tempo/modules/overrides"
	filterconfig "github.com/grafana/tempo/pkg/spanfilter/config"
)

type metricsGeneratorOverrides interface {
@@ -14,6 +15,7 @@ type metricsGeneratorOverrides interface {
	MetricsGeneratorProcessorSpanMetricsHistogramBuckets(userID string) []float64
	MetricsGeneratorProcessorSpanMetricsDimensions(userID string) []string
	MetricsGeneratorProcessorSpanMetricsIntrinsicDimensions(userID string) map[string]bool
	MetricsGeneratorProcessorSpanMetricsFilterPolicies(userID string) []filterconfig.FilterPolicy
}

var _ metricsGeneratorOverrides = (*overrides.Overrides)(nil)
11 changes: 10 additions & 1 deletion modules/generator/overrides_test.go
@@ -1,6 +1,10 @@
package generator

import "time"
import (
	"time"

	filterconfig "github.com/grafana/tempo/pkg/spanfilter/config"
)

type mockOverrides struct {
	processors map[string]struct{}
@@ -9,6 +13,7 @@ type mockOverrides struct {
	spanMetricsHistogramBuckets    []float64
	spanMetricsDimensions          []string
	spanMetricsIntrinsicDimensions map[string]bool
	spanMetricsFilterPolicies      []filterconfig.FilterPolicy
}

var _ metricsGeneratorOverrides = (*mockOverrides)(nil)
@@ -48,3 +53,7 @@ func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsDimensions(userID st
func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsIntrinsicDimensions(userID string) map[string]bool {
	return m.spanMetricsIntrinsicDimensions
}

func (m *mockOverrides) MetricsGeneratorProcessorSpanMetricsFilterPolicies(userID string) []filterconfig.FilterPolicy {
	return m.spanMetricsFilterPolicies
}
4 changes: 4 additions & 0 deletions modules/generator/processor/spanmetrics/config.go
@@ -3,6 +3,7 @@ package spanmetrics
import (
	"flag"

	filterconfig "github.com/grafana/tempo/pkg/spanfilter/config"
	"github.com/pkg/errors"
	"github.com/prometheus/client_golang/prometheus"
)
@@ -34,6 +35,9 @@ type Config struct {
	// Subprocessor options for this Processor include Latency, Count, Size
	// These are metrics categories that exist under the umbrella of Span Metrics
	Subprocessors map[Subprocessor]bool

	// FilterPolicies is a list of policies that will be applied to spans for inclusion or exclusion.
	FilterPolicies []filterconfig.FilterPolicy `yaml:"filter_policies"`
}

func (cfg *Config) RegisterFlagsAndApplyDefaults(prefix string, f *flag.FlagSet) {
23 changes: 20 additions & 3 deletions modules/generator/processor/spanmetrics/spanmetrics.go
@@ -10,10 +10,12 @@ import (
	gen "github.com/grafana/tempo/modules/generator/processor"
	processor_util "github.com/grafana/tempo/modules/generator/processor/util"
	"github.com/grafana/tempo/modules/generator/registry"
	"github.com/grafana/tempo/pkg/spanfilter"
	"github.com/grafana/tempo/pkg/tempopb"
	v1 "github.com/grafana/tempo/pkg/tempopb/resource/v1"
	v1_trace "github.com/grafana/tempo/pkg/tempopb/trace/v1"
	tempo_util "github.com/grafana/tempo/pkg/util"
	"github.com/prometheus/client_golang/prometheus"
)

const (
@@ -31,11 +33,14 @@ type Processor struct {
	spanMetricsDurationSeconds registry.Histogram
	spanMetricsSizeTotal       registry.Counter

	filter               *spanfilter.SpanFilter
	filteredSpansCounter prometheus.Counter

	// for testing
	now func() time.Time
}

func New(cfg Config, registry registry.Registry) gen.Processor {
func New(cfg Config, registry registry.Registry, spanDiscardCounter prometheus.Counter) (gen.Processor, error) {
	labels := make([]string, 0, 4+len(cfg.Dimensions))

	if cfg.IntrinsicDimensions.Service {
@@ -68,10 +73,18 @@ func New(cfg Config, registry registry.Registry) gen.Processor {
	if cfg.Subprocessors[Size] {
		p.spanMetricsSizeTotal = registry.NewCounter(metricSizeTotal, labels)
	}

	filter, err := spanfilter.NewSpanFilter(cfg.FilterPolicies)
	if err != nil {
		return nil, err
	}

	p.Cfg = cfg
	p.registry = registry
	p.now = time.Now
	return p
	p.filteredSpansCounter = spanDiscardCounter
	p.filter = filter
	return p, nil
}

func (p *Processor) Name() string {
@@ -95,7 +108,11 @@ func (p *Processor) aggregateMetrics(resourceSpans []*v1_trace.ResourceSpans) {

		for _, ils := range rs.ScopeSpans {
			for _, span := range ils.Spans {
				p.aggregateMetricsForSpan(svcName, rs.Resource, span)
				if p.filter.ApplyFilterPolicy(rs.Resource, span) {
					p.aggregateMetricsForSpan(svcName, rs.Resource, span)
					continue
				}
				p.filteredSpansCounter.Inc()
			}
		}
	}