Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Clause Level Stats in OpenSearch #5983

Closed
navneet1v opened this issue Jan 23, 2023 · 19 comments · Fixed by #12601
Closed

Query Clause Level Stats in OpenSearch #5983

navneet1v opened this issue Jan 23, 2023 · 19 comments · Fixed by #12601
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers help wanted Extra attention is needed Plugins Search Search query, autocomplete ...etc

Comments

@navneet1v
Copy link
Contributor

navneet1v commented Jan 23, 2023

Is your feature request related to a problem? Please describe.
With the release of Neural Search plugin we release added a new query clause called as "neural". The query clause takes up a query text and model_id and re-writes the query to a k-NN based query. We want to add the stats for this new query clause in OpenSearch which will provide information like: how many times this query clause was invoked, latency, errors etc.

This is a problem for all the query clauses and not the neural query clause. Hence the solution should be OpenSearch specific and not plugin specific.

Describe the solution you'd like
Given that currently we don't have any mechanism to track the metrics described above for all the different query clauses, solution is to implement a stats api that can provide this information.

Describe alternatives you've considered
The only alternative that I can think of is plugins create a new profile/stats api that can provide stats for the query clauses. But this will make every plugin to reinvent the wheel and won't be consistent across different query clauses.

The Stats api present in OpenSearch provide results for the search api combined but it doesn't give metrics around specific query clauses.

GET _stats?pretty&filter_path=_all.primaries.search

Response

{
  "_all" : {
    "primaries" : {
      "search" : {
        "open_contexts" : 0,
        "query_total" : 1779,
        "query_time_in_millis" : 407,
        "query_current" : 0,
        "fetch_total" : 1779,
        "fetch_time_in_millis" : 763,
        "fetch_current" : 0,
        "scroll_total" : 1,
        "scroll_time_in_millis" : 58,
        "scroll_current" : 0,
        "point_in_time_total" : 0,
        "point_in_time_time_in_millis" : 0,
        "point_in_time_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      }
    }
  }
}

Additional context

  1. Enabling this metrics will allow customer to understand which are the most used query clauses and will also help them to understand the query patterns. This will also provide a consistent experience for OpenSearch Customers for understanding the query.
  2. For plugin owners who are adding new query clauses it will reduce the work for adding new metrics/rest handlers for the stats.
@navneet1v navneet1v added enhancement Enhancement or improvement to existing feature or request untriaged labels Jan 23, 2023
@mch2 mch2 added Plugins Search Search query, autocomplete ...etc labels Jan 24, 2023
@navneet1v
Copy link
Contributor Author

@nknize @dblock can we build this feature in OpenSearch core. This can avoid a lot of duplicate work which plugins has to do.

@dblock
Copy link
Member

dblock commented Jan 27, 2023

At a high level this seems like a good idea to me, provided it's very generic.

@navneet1v
Copy link
Contributor Author

At a high level this seems like a good idea to me, provided it's very generic.

@dblock Can we core team prioritize this in their roadmap?

@dblock
Copy link
Member

dblock commented Jan 30, 2023

@navneet1v I doubt it AFAIK they are focused on storage and extensibility, but feel free to contribute!

@dagneyb
Copy link

dagneyb commented Jan 31, 2023

@dblock @navneet1v I'd like to keep the Core focus on storage and extensibility unless we have volumes of community members asking for this, or as @dblock feel free to contribute!

@vamshin vamshin added the help wanted Extra attention is needed label Oct 9, 2023
@vamshin vamshin added the good first issue Good for newcomers label Oct 9, 2023
@dagneyb
Copy link

dagneyb commented Oct 10, 2023

@vamshin can you clarify that you are thinking this would not send metrics to anywhere for customers of open source opensearch, but would allow customers (managed service providers, customers, etc.) to send their own cluster data to their self-designated endpoint?

@msfroh
Copy link
Collaborator

msfroh commented Nov 6, 2023

This can probably be covered with an addition to #10724

One thing to note is that we're trying to move away from stats APIs for everything (at least some of us are pushing in that direction). Instead, we're trying to embrace more of a tracing/telemetry-based approach, which should give us much better numbers, rather than silly point-in-time "how is the cluster right now?" data.

@vibrantvarun
Copy link
Member

vibrantvarun commented Jan 9, 2024

#10724
@deshsidd Is there a way I can use this work to get plugins query clause count?

My use case is
I need to cater neural search query clause count. For that we have neural query builder in neural search plugin. Now the problem is I cannot access the neuralquerybuilder in opensearch and therefore add a counter in SearchQueryCounter and SearchQueryCategorizingVisitor.

cc: @msfroh

@msfroh
Copy link
Collaborator

msfroh commented Feb 2, 2024

#10724 @deshsidd Is there a way I can use this work to get plugins query clause count?

My use case is I need to cater neural search query clause count. For that we have neural query builder in neural search plugin. Now the problem is I cannot access the neuralquerybuilder in opensearch and therefore add a counter in SearchQueryCounter and SearchQueryCategorizingVisitor.

cc: @msfroh

The best bet would probably be to emit the metrics based on the QueryBuilder::getName() method. Unfortunately, right now, we have a fixed list of counters.

Can we make the counters dynamic instead? Like instead of having a bunch of counter fields, we could have a ConcurrentHashMap<String, Counter>, where we allocate the counters based on the query types that we see? @deshsidd, what do you think?

@deshsidd
Copy link
Contributor

deshsidd commented Feb 2, 2024

@msfroh Yes that would make sense and will cover all the query types. Let me look into adding something like this in 2.13. Will add a task for it. Thanks @vibrantvarun

@vamshin
Copy link
Member

vamshin commented Feb 21, 2024

@deshsidd is this still planned for 2.13?

@navneet1v
Copy link
Contributor Author

@msfroh Yes that would make sense and will cover all the query types. Let me look into adding something like this in 2.13. Will add a task for it. Thanks @vibrantvarun

Hi,
Can we make sure that plugins are able to use this feature to provide metrics for their query clauses. If the metrics cannot be provided by plugins then the feature won't be complete.

cc: @msfroh

@deshsidd
Copy link
Contributor

deshsidd commented Mar 5, 2024

This is not prioritized for 2.13 but is in the roadmap. As mentioned earlier we would have to make the counters dynamic where we allocate the counters based on the query types we see.

@navneet1v
Copy link
Contributor Author

This is not prioritized for 2.13 but is in the roadmap. As mentioned earlier we would have to make the counters dynamic where we allocate the counters based on the query types we see.

This will make the feature half baked. :(

@deshsidd
Copy link
Contributor

deshsidd commented Mar 5, 2024

@navneet1v If you or @vibrantvarun have the bandwidth it should not be a large change and you can contribute! :) Unfortunately, we are focused on items related to our performance roadmap for now : https://github.com/orgs/opensearch-project/projects/153

@navneet1v
Copy link
Contributor Author

navneet1v commented Mar 13, 2024

@msfroh can you point me to the interface that needs to be extended by plugins to use this feature?
@deshsidd

@navneet1v navneet1v reopened this Mar 13, 2024
@github-project-automation github-project-automation bot moved this from ✅ Done to 🏗 In progress in Search Project Board Mar 13, 2024
@vibrantvarun
Copy link
Member

@navneet1v I have already tried to extend it in plugins but the infrastructure is created like that it does not have feature to extend in plugin. Therefore, there is a need to do work at infrastructure level.

cc: @navneet1v @deshsidd

@msfroh
Copy link
Collaborator

msfroh commented Mar 14, 2024

In theory, you shouldn't need to extend anything in a plugin. The change in #12601 just emits counters based on the query type.

@deshsidd
Copy link
Contributor

@navneet1v @vibrantvarun As mentioned in the PR, the system now supports dynamic query types, eliminating the need to track specific query types exclusively. Instead, counters for all query types encountered including custom query types introduced by plugins will be created and incremented as needed. This ensures that accounting for any custom query types issued by plugins is also handled seamlessly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers help wanted Extra attention is needed Plugins Search Search query, autocomplete ...etc
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

9 participants