[META] QueryGroup level stats structure and api #15120
Comments
Thanks @kaushalmahi12 for initiating this issue. A few considerations:
@jainankitk Thanks for reviewing it.
Yes, I am suggesting a separate API.
What do we mean by
IMO these would be unnecessary and redundant fields in the API. Why do we think they would be useful with respect to a QueryGroup? @backslasht, can you provide your input on this?
@jainankitk If you mean request/task failures, then I have the following reasons not to include them in the stats.
Thanks @kaushalmahi12 for creating this issue. +1 for creating a new API instead of adding it to node stats. I'm interested in knowing which filters you plan to provide as part of this new API. Request/response structures would help in understanding it.
While I agree that failures may not be due to resource consumption, reporting them does help the user understand which query groups are seeing failures, especially if those query groups belong to multiple tenants.
Thanks @backslasht for going through this and providing your useful insights.
Regarding this, we can keep the API aligned with
Here, the metrics would be a comma-separated list of the fields that we want in the response.
Response
Response
Response
I think it does make sense to incorporate failures from a multi-tenancy perspective. If we extend this idea to task scheduling with a thread pool per query group, then in the future it might also make sense to expose the currently running tasks and the queue size for the tenant. Keeping these things in mind, I think it shouldn't hurt to add failures to the stats.
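As a rough illustration of the comma-separated metrics parameter mentioned above, the filtering could behave roughly like the sketch below. This is only an assumption about the behaviour, not the proposed implementation: the function name `filter_metrics` and the field names `completions`, `rejections`, and `failures` are hypothetical placeholders, since the actual request and response structures are not reproduced here.

```python
from typing import Dict, Optional


def filter_metrics(stats: Dict[str, int], metrics_param: Optional[str]) -> Dict[str, int]:
    """Return only the stats fields named in a comma-separated parameter.

    An empty or missing parameter returns every field.
    """
    if not metrics_param:
        return dict(stats)
    # Split on commas and ignore surrounding whitespace and empty entries.
    requested = [name.strip() for name in metrics_param.split(",") if name.strip()]
    unknown = [name for name in requested if name not in stats]
    if unknown:
        raise ValueError("unknown metrics requested: " + ", ".join(unknown))
    return {name: stats[name] for name in requested}


# Hypothetical stats for one query group on one node.
query_group_stats = {"completions": 180, "rejections": 5, "failures": 2}
print(filter_metrics(query_group_stats, "completions,failures"))
```

Rejecting unknown metric names, rather than silently dropping them, is one possible design choice; it surfaces typos in the request early.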
@kaushalmahi12 - Thanks for providing the request and response structures. A few comments:
I also agree with including failures as one of the metrics, and with adding running tasks, queue size, and other relevant metrics once we introduce QueryScheduling using QueryGroups.
Thanks @kaushalmahi12 for the request and response structures. It would be good to add filters using query group id/name as well.
Thanks @jainankitk for follow up suggestions.
I think filtering based on
IMO this would be redundant information, as the structure is simple enough and does not have too many nested subfields.
I like this idea and we should support it.
Please describe the end goal of this project
We want to reach a conclusion on all the fields that should be present in query group stats. The _nodes/stats output is already quite large, and we will have [0-100] query groups available in the cluster. Although the number of query groups is limited, each query group accounts node-level metrics, so the actual number of such stat objects in the cluster will be #dataNodes * #queryGroups. If the cluster is large, e.g. dataNodes = 200 and queryGroups = 50, there will be 10000 such objects, which is a lot and can make the stats output hard to fathom.
The schema I am proposing for a single query group stat is something like the following
Apart from resource usage, all the metric values in this schema are cumulative counters since the process start time.
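Because the counters are cumulative since process start, a consumer that wants per-interval numbers has to subtract two samples. A minimal sketch of that, assuming hypothetical counter names (`completions`, `rejections`) that are not taken from the proposed schema:

```python
from typing import Dict


def counter_deltas(earlier: Dict[str, int], later: Dict[str, int]) -> Dict[str, int]:
    """Increase of each cumulative counter between two samples."""
    deltas = {}
    for name, new_value in later.items():
        old_value = earlier.get(name, 0)
        if new_value >= old_value:
            deltas[name] = new_value - old_value
        else:
            # A lower value is treated here as a process restart:
            # the counter started again from zero.
            deltas[name] = new_value
    return deltas


sample_t0 = {"completions": 120, "rejections": 3}
sample_t1 = {"completions": 180, "rejections": 5}
print(counter_deltas(sample_t0, sample_t1))  # {'completions': 60, 'rejections': 2}
```

Resource usage is excluded from this treatment, since it is a point-in-time value rather than a counter.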
Keeping these things in mind, I am more inclined towards keeping the stats API for this separate.
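The object-count estimate above is simple multiplication; the short sketch below just makes it easy to try other cluster sizes. The function name `stat_object_count` is illustrative only.

```python
def stat_object_count(data_nodes: int, query_groups: int) -> int:
    """Number of per-node query group stat objects across the cluster."""
    if data_nodes < 0 or query_groups < 0:
        raise ValueError("counts must be non-negative")
    return data_nodes * query_groups


# The example from this issue: 200 data nodes and 50 query groups.
print(stat_object_count(200, 50))  # 10000

# Upper end of the [0-100] query group range on the same cluster.
print(stat_object_count(200, 100))  # 20000
```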
I think the feature related stats should only be provided when explicitly asked either using
Currently, however, if a feature is enabled and has stats, they are returned by default. If a client consumes _nodes/stats and then upgrades to an OpenSearch version that has an additional stats object in the response, it can break the client code.
Supporting References
#12342
Issues
#12342
Related component
Search:Resiliency