[Lens] Unique count aggregation should have control for precision threshold and warning about estimates #69832

wylieconlon · 2020-06-24T18:12:04Z

The default precision_threshold of the Cardinality aggregation is 3,000: above 3,000 unique values, the accuracy will drop off. The max value is 40,000 in Elasticsearch. Users should be able to tune this parameter in Lens. I propose that we use a numeric text input with validation instead of a slider, but the second-best option would be grouped buttons at the 1000, 3000, 10000, and 40000 thresholds.
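For context, here is a minimal sketch of the request body such a control would end up producing. The precision_threshold parameter and the 3,000 / 40,000 values come from the description above; the function name, the aggregation name unique_count, and the clamping behavior are assumptions made for illustration, not Lens code.

```typescript
// Part of a search request body containing a single cardinality aggregation.
interface CardinalityRequest {
  size: number;
  aggs: {
    unique_count: {
      cardinality: { field: string; precision_threshold: number };
    };
  };
}

// Values taken from the issue text.
const DEFAULT_PRECISION_THRESHOLD = 3000;
const MAX_PRECISION_THRESHOLD = 40000;

// Hypothetical helper: builds the request for a user-chosen threshold.
function buildUniqueCountRequest(
  field: string,
  precisionThreshold: number = DEFAULT_PRECISION_THRESHOLD
): CardinalityRequest {
  // Fall back to the default for NaN/Infinity, then clamp into 0..40000.
  const safe = Number.isFinite(precisionThreshold)
    ? Math.trunc(precisionThreshold)
    : DEFAULT_PRECISION_THRESHOLD;
  const clamped = Math.min(Math.max(safe, 0), MAX_PRECISION_THRESHOLD);
  return {
    size: 0, // only the aggregation result is needed, not the hits
    aggs: {
      unique_count: {
        cardinality: { field, precision_threshold: clamped },
      },
    },
  };
}
```

Calling buildUniqueCountRequest("some.field", 10000) yields a body with precision_threshold set to 10000, while a value above the maximum is clamped to 40000.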

Lens should also provide some helper text to indicate that this is not a precise aggregation. I propose that we put this helper text in the editor panel for Unique count, and that the text should be:

Unique count is precise only when the count is lower than the precision threshold. The estimate will be more accurate for higher thresholds, which use more server resources.

This text is trying to indicate that the queries won't be slower, but that there are other costs associated with running the high-precision queries. This is based on some of the docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

cc @cchaos do you agree with the proposal to use a numeric input instead of grouped buttons? A slider would be a bad option here since there aren't many possible options. No design needed.

elasticmachine · 2020-06-24T18:12:06Z

Pinging @elastic/kibana-app (Team:KibanaApp)

cchaos · 2020-06-24T19:14:25Z

To understand your sentence here:

A slider would be a bad option here since there aren't many possible options.

How would you be able to limit the input in a numeric input?

wylieconlon · 2020-06-24T19:25:57Z

By using the isInvalid property and not updating the state when it's invalid? Even if we didn't limit it on our end, Elasticsearch would cap the value at request time.
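A rough sketch of that idea, written as plain functions so it does not depend on any particular input component. The isInvalid flag mirrors the property mentioned above; the function names, the error messages, and the lower bound of 1 are assumptions for illustration only.

```typescript
interface ThresholdValidation {
  isInvalid: boolean;
  value?: number; // set only when the input is valid
  error?: string; // set only when the input is invalid
}

const MIN_THRESHOLD = 1; // assumed lower bound, not taken from the issue
const MAX_THRESHOLD = 40000; // maximum stated in the issue

// Validate the raw text typed into the numeric input.
function validatePrecisionThreshold(raw: string): ThresholdValidation {
  const trimmed = raw.trim();
  // Accept only plain whole numbers such as "3000".
  if (!/^\d+$/.test(trimmed)) {
    return { isInvalid: true, error: "Enter a whole number." };
  }
  const value = Number(trimmed);
  if (value < MIN_THRESHOLD || value > MAX_THRESHOLD) {
    return {
      isInvalid: true,
      error: `Enter a value between ${MIN_THRESHOLD} and ${MAX_THRESHOLD}.`,
    };
  }
  return { isInvalid: false, value };
}

// Keep the previous state when the new input is invalid.
function nextThreshold(previous: number, raw: string): number {
  const result = validatePrecisionThreshold(raw);
  return result.value ?? previous;
}
```

With this shape, the input can display isInvalid and the error text while the stored threshold stays at its last valid value.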

cchaos · 2020-06-24T19:30:41Z

But how would they know what values are valid if you truly are restricting them? With the EuiRange you could give them very specific allowed increments and values that they can select by using the ticks.

wylieconlon · 2020-06-29T18:56:26Z

@cchaos I like that proposal. I think we could use a slider with predefined increments at the 1000, 3000, 10000, and 40000 thresholds. I don't think we need the extra color indicator, just a slider with predefined ticks.

cchaos · 2020-06-29T20:12:41Z

Sweet! You'll probably also want to shorten the labels to 1k, 3k... etc so they don't bump into each other with all those zeros.
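A small sketch of how the shortened tick labels could be generated. Because the four thresholds are not evenly spaced, this version assumes the slider moves over indices 0 to 3 and maps each index back to a threshold; the { value, label } tick shape and all function names are assumptions for illustration.

```typescript
interface Tick {
  value: number; // slider position (an index, not the threshold itself)
  label: string; // shortened label shown under the tick
}

// Thresholds proposed in this thread.
const THRESHOLDS: number[] = [1000, 3000, 10000, 40000];

// 1000 -> "1k", 40000 -> "40k"; other values are left unchanged.
function shortLabel(n: number): string {
  return n >= 1000 && n % 1000 === 0 ? `${n / 1000}k` : String(n);
}

// One evenly spaced tick per threshold.
function buildTicks(thresholds: number[]): Tick[] {
  return thresholds.map((threshold, index) => ({
    value: index,
    label: shortLabel(threshold),
  }));
}

// Convert a slider position back into the threshold to send to Elasticsearch.
function thresholdForIndex(thresholds: number[], index: number): number {
  const clamped = Math.min(Math.max(Math.round(index), 0), thresholds.length - 1);
  return thresholds[clamped];
}
```

Mapping by index keeps the ticks evenly spaced on screen and prevents in-between values from being selected.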

dej611 · 2023-08-02T09:49:49Z

+1 https://discuss.elastic.co/t/kibana-calculations-give-wrong-results/339778/11

markov00 · 2024-04-03T16:10:26Z

+1 #179934

bradquarry · 2024-04-04T12:48:06Z

In my opinion we should not surface an imprecise aggregation type in our core visualization engine to handle what many would expect to be an exact deterministic aggregation result. This impacts business reporting for customers and they seek alternatives.

leandrojmp · 2024-10-04T12:05:05Z

Hello, just some feedback: I ran into an issue recently because of this.

I have an index where I'm storing data about my Elastic Agents. Each document corresponds to one Elastic Agent, and one of the fields is agent.id, which is unique per agent.

Using this data I create some dashboards with the Metric visualization. The goal is to have a quick glance at the deployment of agents on my infrastructure, since Elastic does not provide a native dashboard for this.

The issue is that I had 6997 documents in the index, and a Metric visualization over the agent.id field was showing 7010 unique ids, which is more than the number of documents.
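To put those numbers in perspective, here is a small sketch of the arithmetic. It assumes, as described above, that each document holds exactly one agent.id, so the document count is an upper bound for the unique count; the function names are made up for illustration.

```typescript
// Relative error of an estimate against the known exact value.
function relativeError(estimate: number, exact: number): number {
  return Math.abs(estimate - exact) / exact;
}

// A unique count over a single-valued field can never exceed the document
// count, so an estimate above it is visibly an approximation artifact.
function exceedsDocumentCount(estimate: number, documentCount: number): boolean {
  return estimate > documentCount;
}

const documentCount = 6997; // documents in the index
const estimate = 7010; // unique ids shown by the visualization

const errorPercent = (relativeError(estimate, documentCount) * 100).toFixed(2);
const impossible = exceedsDocumentCount(estimate, documentCount);
```

Here the estimate is off by 13, or about 0.19%, which is small in relative terms but still produces a count that cannot be correct.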

After some investigation and a quick chat in the Slack channel, I learned that Unique count in the Metric visualization is only an estimate. I do not write queries manually; I only use the built-in visualizations. I would expect it not to be exact for large datasets, in the range of hundreds of thousands or millions of documents, but for it not to be exact for under 10k documents was a surprise.

Changing from unique count to count solved my issue in this case, but now I will need to review every single dashboard that uses a unique count metric because I cannot trust the information anymore.

In my opinion this needs to be made clearer in the Kibana documentation. Maybe I'm missing something, but I couldn't find any Kibana documentation mentioning that Unique count in the Metric visualization is not exact and just an estimate. It may be covered in the Elasticsearch documentation about cardinality queries, but not in the Kibana documentation.

Also, the precision_threshold should be exposed for the user to configure it.

markov00 · 2024-10-04T13:25:17Z

Hi @leandrojmp, thanks for the feedback. You are perfectly right that the documentation and the editor are missing a description of the inaccuracy of the cardinality aggregation.
As described in the Elasticsearch docs, the cardinality is an approximation due to the way it is implemented. The precision_threshold is set to 3000 by default, which is why you are seeing this error even with such a small number of documents.

This behavior should be documented and clearly described.

leandrojmp · 2024-10-04T13:54:28Z

@markov00 yeah, the main issue is that this information only exists in the Elasticsearch docs, no mention of it in Kibana.

Another issue is that the precision_threshold is not exposed as a configuration option in the Metric visualization. Exposing it would help my specific case.

nerophon · 2024-10-07T11:37:03Z

A user raised an issue with me about this last week. They didn't realise that Unique count might deliver approximate results. They also want a way to deliver precise results in Lens. I understand Vega can do it, but this is outside of what this user is able to accomplish given their experience and familiarity.
So at a minimum we need to update the documentation and provide a precision slider in Lens. Beyond that, if possible, the customer would like an aggregation that produces a fully precise cardinality without crashing the Elasticsearch cluster. They are willing to accept that this query may take a long time to complete. So perhaps this implies an alternative aggregation in Elasticsearch, if such a thing is even possible?
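One possible direction, sketched here purely as an illustration and not as something Lens or this issue implements, is to page through a composite aggregation with a terms source and count the buckets, trading speed for an exact result while keeping each request bounded. The search function is a hypothetical stand-in for whatever client is in use, and the aggregation name uniques, the source name value, and the page size are all assumptions.

```typescript
// Minimal shape of one page of composite aggregation results.
interface CompositePage {
  buckets: Array<{ key: Record<string, unknown>; doc_count: number }>;
  after_key?: Record<string, unknown>;
}

// Hypothetical transport: sends a request body and resolves with the
// composite aggregation part of the response.
type SearchFn = (body: object) => Promise<CompositePage>;

// Count distinct values of `field` exactly by walking every composite page.
async function exactUniqueCount(
  search: SearchFn,
  field: string,
  pageSize: number = 1000
): Promise<number> {
  let total = 0;
  let after: Record<string, unknown> | undefined;
  do {
    const composite: Record<string, unknown> = {
      size: pageSize,
      sources: [{ value: { terms: { field } } }],
    };
    if (after) {
      composite.after = after; // resume from the last bucket of the previous page
    }
    const page = await search({ size: 0, aggs: { uniques: { composite } } });
    total += page.buckets.length; // each bucket is one distinct value
    // Stop when a page comes back empty or without an after_key.
    after = page.buckets.length > 0 ? page.after_key : undefined;
  } while (after);
  return total;
}
```

This needs one round trip per page, so it can take a long time on high-cardinality fields, and in this sketch documents that lack the field are not counted.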

ghudgins · 2024-11-18T13:34:38Z

+1

This data has the vulnerabilities for all hosts on the network [...] The unique count needs to be accurate for these reports. I can set up a data transform as a workaround for now.

Erikg346 · 2025-01-23T18:02:56Z

+1
https://discuss.elastic.co/t/visualizations-unique-count-is-inaccurate/373595

wylieconlon added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Jun 24, 2020

flash1293 added the enhancement New value added to drive a business result label Aug 6, 2020

ghudgins mentioned this issue Jul 13, 2022

Default value for visualize JSON input #6859

Closed

stratoula added the impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. label Jan 30, 2024

timductive added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. labels Apr 3, 2024

markov00 added the vis:data processing label Jun 4, 2024

jb1b84 removed the vis:data processing label Aug 28, 2024

markov00 added the triage_needed label Oct 4, 2024

markov00 removed the triage_needed label Oct 17, 2024
