-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ottl: Efficiently test if a value is in a list or map #30420
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
@ringerc I agree that OTTL needs to support this functionality. My initial thought is a function like |
@TylerHelmuth That sounds sensible, straightforward to implement and would be fairly discoverable. I haven't been able to work out if OTTL has a map literal syntax, but array/list literals would do the job. For maps it's possible to test the looked up key for nil, so array is more important anyway. It would be interesting to be able to pre-parse, sort and cache long array literals for this purpose, but I don't know if that's even remotely feasible or worthwhile with how the OTTL runtime works. |
Literals in OTTL are created during collector startup and then reused. If you end up doing something like
I agree, maybe this function should only allow |
So an optional "presorted" bool flag could be provided, to attest that the input is a sorted literal slice. If false (default) the list will be scanned sequentially or will be sorted before execution. There's no map literal, right? A pre-parsed, pre-hashed map would probably by quicker to use for an exists/contains test, with the minor inconvenience of having to provide dummy values for each key. |
We don't have a map literal. Can you describe more about your use case and how a map solves it over a slice? Is it a lot of values? |
Sure, and thanks for your time looking at this. TL;DR: I need to use an allow-list / keep-list for metric names. A bit like So far it's no more than 30 values per set of metrics matched by common prefix. Maybe 200 metrics in total but likely to grow. Currently I'm trying to use the transform processor to e.g.
... then a Having a
The reason it's required is hosted provider billing. One of the metrics destinations I'm using is Datadog, which has an "interesting" billing and usage model. They bill metrics very steeply by the total cardinality across a 1 hour window, not data volume or unique datapoint count. But a bunch of metrics are special cased as claimed by specific "datadog integrations"; these are not billed at all irrespective of cardinality. Those metrics are matched by metric name. Billing is also done by unique monitored host, so these aren't free, they just don't get double-charged. Datadog's list of metrics in their OTLP integration is incomplete and stale. They use custom names specific to their own agent for most other metrics, but some of them overlap with OTLP hostmetrics and kubeletstats names. To prevent unpleasant billing surprises I need to "allow-list" only a specific set of named metrics to be delivered. These allow-lists are extracted from DD's metadata files, like https://github.com/DataDog/integrations-core/blob/master/otel/metadata.csv and https://github.com/DataDog/integrations-core/blob/master/container/metadata.csv (On a side note I've been meaning to have a look at the metric transform processor to see if it can be patched to support a drop operation for metric datapoints, avoiding the need for the separate filterprocessor) |
I think a processors:
filter/ottl:
error_mode: ignore
metrics:
metric:
# each statement is ORed together - if any statement returns true the metric is dropped.
- name = "otelcol_exporter_queue_capacity"
- name = "otelcol_exporter_queue_size"
- name = "otelcol_exporter_sent_metric_points"
- name = "otelcol_process_cpu_seconds"
- name = "otelcol_process_memory_rss"
- name = "otelcol_processor_batch_batch_send_size"
- name = "otelcol_processor_batch_batch_size_trigger_send"
- name = "otelcol_processor_batch_timeout_trigger_send"
- name = "otelcol_processor_dropped_metric_points"
- name = "otelcol_processor_refused_metric_points"
- name = "otelcol_process_runtime_heap_alloc_bytes"
- name = "otelcol_process_runtime_total_alloc_bytes"
- name = "otelcol_process_runtime_total_sys_memory_bytes"
- name = "otelcol_process_uptime"
- name = "otelcol_receiver_accepted_metric_points"
- name = "otelcol_receiver_refused_metric_points"
- name = "otelcol_scraper_errored_metric_points"
- name = "otelcol_scraper_scraped_metric_points" Or with processors:
filter/ottl:
error_mode: ignore
metrics:
metric:
- IsMatch(name, "^otelcol_") These options would definitely be more efficient that calling a |
@TylerHelmuth The filterprocessor won't help unless I've grossly misunderstood something, because it can only express "drop if ..." conditions. I'm trying to express an allow-list not a drop-list - a "retain this explicit subset of container.* metrics but drop all other container.* metrics" rule. I don't think the filterprocessor has a sensible way to express that? It doesn't AFAICS have a way to say "mark this set of datapoints to be kept, then drop all matching this expression unless previously marked to be kept". The issue is writing "keep a_b and a_c but drop all other a_*" - where the list of keeps may be long. so I could do a filterprocessor like
but it's not a lot better really. For this specific use it's probably better than setting a Good to know that a long list of individual statements may be more efficient than testing for list-containment though. I'd assumed that testing for membership, at least in a pre-sorted list, should be significantly faster but it was just that - an assumption. I've not had a chance to do much performance testing. |
@TylerHelmuth This may be a good use-case for a filter processor configuration option to keep if there's a match instead of dropping on a match (essentially wrapping all statements in a
@ringerc OTTL currently doesn't optimize string lists for fast lookup, I don't believe a pre-sorted list will help in most circumstances since an O(n) search would still be required. Usually the datasets OTTL is concerned with aren't large enough to justify any optimizations, but this may change in the future. |
@evan-bradley Right. Checking in a list would only make sense if it could be marked as pre-sorted in the argument list so a faster search could be used. But then Bad Confusing Things would happen if the user's sort order for the locale they used for sorting didn't match the comparator that ottl uses, so it's probably not worth supporting anyway. Re filter processor what I've been wondering about is the feasibility of marking a metric/datapoint to be dropped instead of eagerly dropping it, then having a way to unmark subsets. So it'd be possible to say "mark all x to be dropped" then "clear the to-drop flag on this subset". I emulate this at the moment by using a transform processor to set and clear a "to-drop" attribute, then a filterprocessor to actually apply the final decisions. This also means I can disable the filter processor to see what would be done by sending the dataset to a debug exporter, file exporter, etc. When dropping directly in the filter processor it's difficult to debug rules because there's no sort of "no-op" mode that (e.g.) sets an attribute instead of actually dropping. The filter processor actually overlaps a lot with the transform processor, so it might make sense to give the transform processor a "drop" operation instead. IDK. Anyway, this has gone off the rails of the original suggestion a bit. It doesn't sound like it's simple to do. Shall I just close it? |
One more thought on the tangent, you can currently do: processors:
filter/ottl:
error_mode: ignore
metrics:
metric:
- |
not (
name = "otelcol_exporter_queue_capacity" or
name = "otelcol_exporter_queue_size" orname = "otelcol_exporter_sent_metric_points" or
... or
name = "otelcol_scraper_scraped_metric_points"
) @evan-bradley I agree that an option to set a top-level @ringerc we got sidetracked, but the original goal is still needed: we should have a |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I can add another use-case into the bunch. We have some high-dimensionality infra-level attributes that we do not want to ship by default (cost optimization), but in few rare cases we want to keep the ability to do so. The problem is, the "few rare cases" are difficult to statically define, sometimes they are a function of things like environment, execution context, specific call place, etc. Therefore, instead of trying to encapsulate that in otel-config, we built this special attribute that the client can send us - "_infra_labels". Client can specify extra labels they want included for this specific data-point in this attribute in a form of an array, for example Then, on the otel-collector side we check each of the labels for presence in that list, like In absence of "contains" operator, we are currently forced to do |
Is this feature planned to be implemented soon? The issue #32691 would be resolved by implementing a Contains operator |
I don't have time right now to implement it myself, but open to PRs |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Component(s)
pkg/ottl
Describe the issue you're reporting
Currently OTTL doesn't offer an efficient way to test if a value is in a list of possible values. It has no
in
orcontains
operator, nor (AFAICS) does it have map literals.Sometimes it's necessary to have filters that use fairly long lists of values to match, e.g. metric names to allow-list. These should be efficient to match, preferably without re-parsing and sorting the whole list every time.
Possible implementations could be:
{"metric1": true, "metric2": true, "metric3": true}[name]
. A means of coalescing null to false would be required, because it looks like OTTL gets upset when passed a null when it expects a boolean.in
orcontains
operator on arrays:name in ["metric1", "metric2"]
Current workarounds
... where name == "metric1"
,... where name == "metric2"
etc statements or a compound conditional withor
where (name == "metric1" or name == "metric2")
chained conditionsIsMatch
with a regex likeIsMatch(name,"^(metric1|metric2)$")
The text was updated successfully, but these errors were encountered: