[Lens] Unique count aggregation should have control for precision threshold and warning about estimates #69832

Open
wylieconlon opened this issue Jun 24, 2020 · 15 comments
Labels
enhancement New value added to drive a business result Feature:Lens impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@wylieconlon
Contributor

The default precision threshold of the Cardinality aggregation is 3,000 unique values: above 3,000, the precision will drop off. The max value is 40,000 in Elasticsearch. Users should be able to tune this parameter in Lens. I propose that we use a numeric text input with validation instead of a slider, but the second-best option would be grouped buttons at the 1000, 3000, 10000, and 40000 thresholds.

Lens should also provide some helper text to indicate that this is not a precise aggregation. I propose that we put this helper text in the editor panel for Unique count, and that the text should be:

Unique count is precise only when the count is lower than the precision threshold. Higher thresholds give more accurate estimates but use more server resources.

This text is trying to indicate that the queries won't be slower, but that there are other costs associated with running the high-precision queries. This is based on some of the docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
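
For reference, a minimal sketch of the request body that a precision control would end up producing. The helper name `buildUniqueCountRequest` and the aggregation name `unique_count` are made up for illustration and are not taken from the Lens code; `cardinality`, `field`, and `precision_threshold` are the documented Elasticsearch parameters.

```typescript
// Limits from the Elasticsearch cardinality aggregation docs.
const DEFAULT_PRECISION_THRESHOLD = 3000;
const MAX_PRECISION_THRESHOLD = 40000;

interface CardinalityRequest {
  size: number;
  aggs: {
    unique_count: {
      cardinality: { field: string; precision_threshold: number };
    };
  };
}

// Hypothetical helper: builds a search body that counts unique values of `field`.
function buildUniqueCountRequest(
  field: string,
  precisionThreshold: number = DEFAULT_PRECISION_THRESHOLD
): CardinalityRequest {
  // Clamp locally so the UI state matches the value Elasticsearch effectively uses.
  const clamped = Math.min(
    Math.max(Math.trunc(precisionThreshold), 0),
    MAX_PRECISION_THRESHOLD
  );
  return {
    size: 0, // only the aggregation result is needed, not the hits
    aggs: {
      unique_count: {
        cardinality: { field, precision_threshold: clamped },
      },
    },
  };
}

// Example: a request for agent.id with the maximum threshold.
const exampleRequest = buildUniqueCountRequest("agent.id", 40000);
```

Clamping on the client is a design choice in this sketch, so the value shown in the editor would be the same one the query runs with.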

cc @cchaos do you agree with the proposal to use a numeric input instead of grouped buttons? A slider would be a bad option here since there aren't many possible options. No design needed.

@wylieconlon wylieconlon added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Jun 24, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@cchaos
Contributor

cchaos commented Jun 24, 2020

To understand your sentence here:

A slider would be a bad option here since there aren't many possible options.

How would you be able to limit the input in a numeric input?

@wylieconlon
Contributor Author

By using the isInvalid property and not updating the state when it's invalid? Even if we didn't limit it on our end, Elasticsearch would cap the value at request time.
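
Roughly what I mean, as a sketch only. The state shape, the function name `onThresholdChange`, and the lower bound of 1 are assumptions for illustration; the `isInvalid` flag would be passed to the input component, and the committed value is the one used in the query.

```typescript
// Assumed bounds for the input; 40000 is the Elasticsearch maximum,
// the lower bound of 1 is a choice made for this sketch.
const MIN_THRESHOLD = 1;
const MAX_THRESHOLD = 40000;

interface ThresholdInputState {
  text: string; // what the user typed
  isInvalid: boolean; // would drive the input's isInvalid prop
  committed: number; // last valid value, the one used in the query
}

// Hypothetical change handler: keeps the typed text, flags invalid input,
// and only updates the committed value when the input is valid.
function onThresholdChange(
  prev: ThresholdInputState,
  text: string
): ThresholdInputState {
  const parsed = Number(text);
  const valid =
    text.trim() !== "" &&
    Number.isInteger(parsed) &&
    parsed >= MIN_THRESHOLD &&
    parsed <= MAX_THRESHOLD;
  return {
    text,
    isInvalid: !valid,
    committed: valid ? parsed : prev.committed,
  };
}
```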

@cchaos
Contributor

cchaos commented Jun 24, 2020

But how would they know what values are valid if you truly are restricting them? With the EuiRange you could give them very specific allowed increments and values that they can select by using the ticks.
[Image: Screen Shot 2020-06-24 at 15 30 12 PM]

@wylieconlon
Contributor Author

@cchaos I like that proposal. I think we could use a slider with predefined increments at the 1000, 3000, 10000, and 40000 thresholds. I don't think we need the extra color indicator, just a slider with predefined ticks.

@cchaos
Contributor

cchaos commented Jun 29, 2020

Sweet! You'll probably also want to shorten the labels to 1k, 3k... etc so they don't bump into each other with all those zeros.
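
A possible sketch of the tick setup, assuming the slider moves over positions 0 to 3 and each position maps to one of the agreed thresholds (the thresholds are not evenly spaced, so the slider value itself cannot be the threshold). The names `THRESHOLDS`, `formatThreshold`, and `thresholdForPosition` are illustrative only.

```typescript
// The four thresholds agreed on in this thread.
const THRESHOLDS = [1000, 3000, 10000, 40000];

// Shortens whole thousands to "1k", "3k", ... so labels do not collide.
function formatThreshold(value: number): string {
  return value >= 1000 && value % 1000 === 0
    ? `${value / 1000}k`
    : String(value);
}

// One tick per slider position; `value` is the position, not the threshold.
const ticks = THRESHOLDS.map((threshold, index) => ({
  value: index,
  label: formatThreshold(threshold),
}));

// Maps a slider position back to the threshold sent to Elasticsearch.
function thresholdForPosition(position: number): number {
  const index = Math.min(
    Math.max(Math.round(position), 0),
    THRESHOLDS.length - 1
  );
  return THRESHOLDS[index];
}
```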

@flash1293 flash1293 added the enhancement New value added to drive a business result label Aug 6, 2020
@dej611
Contributor

dej611 commented Aug 2, 2023

@stratoula stratoula added the impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. label Jan 30, 2024
@timductive timductive added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. labels Apr 3, 2024
@markov00
Member

markov00 commented Apr 3, 2024

+1 #179934

@bradquarry

bradquarry commented Apr 4, 2024

In my opinion we should not surface an imprecise aggregation type in our core visualization engine to handle what many would expect to be an exact deterministic aggregation result. This impacts business reporting for customers and they seek alternatives.

@leandrojmp

Hello, just some feedback: I recently ran into an issue because of this.

I have an index where I'm storing data about my Elastic Agents. Each document corresponds to one Elastic Agent, and one of the fields is agent.id, which is unique per agent.

Using this data I create some dashboards with the Metric visualization. The goal is to have a quick glance at the deployment of agents on my infrastructure, since Elastic does not provide a native dashboard for this.

The issue is that I had 6997 documents in the index, and a Metric visualization over the agent.id field was showing 7010 unique ids, which is more than the number of documents.

After some investigation and a quick chat in the Slack channel, I learned that the unique count in the Metric visualization is only an estimate. I do not write queries manually, I only use the built-in visualizations. I would expect it not to be exact for large datasets, things in the range of high hundreds of thousands and millions, but for it not to be exact for under 10k documents was a surprise.

Changing from unique count to count solved my issue in this case, but now I will need to review every single dashboard that uses a unique count metric because I cannot trust the information anymore.

In my opinion this needs to be made clearer in the Kibana documentation. Maybe I'm missing something, but I couldn't find any Kibana documentation mentioning that Unique Count in the Metric visualization is not exact and is just an estimate. It may be present in the Elasticsearch documentation about cardinality queries, but not in the Kibana documentation.

Also, the precision_threshold should be exposed for the user to configure.

@markov00
Member

markov00 commented Oct 4, 2024

Hi @leandrojmp, thanks for the feedback. You are perfectly right that the documentation and the editor are missing a description of the inaccuracy of the cardinality aggregation.
As described in the Elasticsearch docs, the cardinality is an approximation due to the way it is implemented. The precision_threshold is set to 3000 by default, which is why you are getting this error even with such a small number of documents.

This behavior should be documented and clearly described.

@leandrojmp

@markov00 yeah, the main issue is that this information only exists in the Elasticsearch docs, no mention of it in Kibana.

Another issue is that the precision_threshold is not exposed as a configuration option in the Metric visualization. Exposing it would help in my specific case.

@nerophon
Contributor

nerophon commented Oct 7, 2024

A user raised an issue with me about this last week. They didn't realise that Unique Count might deliver approximate results. They also want a way to get precise results in Lens. I understand Vega can do it, but this is outside of what this user is able to accomplish given their experience & familiarity.
So at a minimum we need to update the documentation and provide a precision slider in Lens. Beyond that, if possible, the customer would like a fully precise cardinality aggregation that does not crash the Elasticsearch cluster. They are willing to accept that this query may take a long time to complete. So perhaps this implies an alternative aggregation in Elasticsearch, if such a thing is even possible?
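
One possible direction outside Lens, sketched under assumptions: paging through every distinct value with the Elasticsearch composite aggregation and counting the buckets. This trades time for exactness and is not something Lens does today. The `SearchFn` transport, the function name `exactUniqueCount`, and the aggregation name `uniques` are hypothetical; `composite`, `sources`, `after`, and `after_key` are the documented composite aggregation fields. Documents missing the field are not counted in this sketch.

```typescript
type AfterKey = Record<string, string | number>;

interface CompositePage {
  buckets: Array<{ key: AfterKey; doc_count: number }>;
  after_key?: AfterKey;
}

// Hypothetical transport: sends one search body and returns the
// composite aggregation part of the response.
type SearchFn = (body: object) => Promise<CompositePage>;

// Counts distinct values of `field` exactly by visiting every bucket page.
async function exactUniqueCount(
  search: SearchFn,
  field: string,
  pageSize = 1000
): Promise<number> {
  let total = 0;
  let after: AfterKey | undefined;
  do {
    const page = await search({
      size: 0,
      aggs: {
        uniques: {
          composite: {
            size: pageSize,
            sources: [{ value: { terms: { field } } }],
            // Resume after the last bucket of the previous page, if any.
            ...(after ? { after } : {}),
          },
        },
      },
    });
    total += page.buckets.length; // each bucket is one distinct value
    // Stop on an empty page or when no after_key is returned.
    after = page.buckets.length > 0 ? page.after_key : undefined;
  } while (after);
  return total;
}
```

The number of requests grows with the number of distinct values, so this fits the "may take a long time" case rather than interactive dashboards.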

@ghudgins
Contributor

+1

This data has the vulnerabilities for all hosts on the network [...] The unique count needs to be accurate for these reports. I can set up a data transform as a workaround for now.

@Erikg346

Erikg346 commented Jan 23, 2025
