[FEATURE]Materialize external datasource queries results #2597

Open
YANG-DB opened this issue Mar 26, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@YANG-DB
Member

YANG-DB commented Mar 26, 2024

Is your feature request related to a problem?

Today OpenSearch Dashboards can query an external datasource (Prometheus) and show the results in a dedicated UX.

The results are not stored locally; they live only in the short-lived browser session. While this capability is useful for federating multiple (remote) datasources, keeping the results only in the browser has the following problems:

  • data is only shown in the browser and doesn’t persist
  • only the observability UX can view the results; no standard dashboard can use these queries
  • data can't be mixed, overlaid, or joined with data from any other source
  • data can't be persisted so that it can be compared with older results
  • data can't be watermarked so that only the delta is fetched when the query runs again later
  • existing dashboard metrics functionality can't be consolidated across all use cases

This capability is very significant for enabling metrics + traces correlation when the metrics are external to OpenSearch.
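
The watermarking point in the list above can be illustrated with a minimal sketch. It assumes the persisted results carry the timestamp of the newest stored sample, so a later run asks the remote datasource only for newer samples. `WatermarkedResults`, `refresh`, and `fake_remote` are hypothetical names used for illustration; they are not part of OpenSearch or Prometheus.

```python
from typing import Callable, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


class WatermarkedResults:
    """In-memory stand-in for a persisted results index with a watermark."""

    def __init__(self) -> None:
        self.samples: List[Sample] = []
        self.watermark: int = 0  # timestamp of the newest stored sample

    def refresh(self, fetch_range: Callable[[int, int], List[Sample]], now: int) -> int:
        """Fetch only samples newer than the watermark; return how many were added."""
        # Ask the remote datasource only for the window (watermark, now].
        fetched = fetch_range(self.watermark + 1, now)
        # Guard against a remote that returns samples outside the window.
        delta = [s for s in fetched if s[0] > self.watermark]
        self.samples.extend(delta)
        if delta:
            self.watermark = max(ts for ts, _ in delta)
        return len(delta)


def fake_remote(start: int, end: int) -> List[Sample]:
    """Hypothetical remote datasource: one sample every 10 seconds, from 10 to 100."""
    return [(ts, float(ts)) for ts in range(10, 101, 10) if start <= ts <= end]
```

Calling `refresh` a second time with the same `now` adds nothing, because the watermark already covers that window.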

What solution would you like?
Flint is able to query an S3 datasource and store the results locally in an OpenSearch index; the same approach should be taken here.

  1. An external datasource is defined
  2. Materialized indices are created for this datasource's query metadata and results
  3. A PPL query is submitted from the browser (dashboard)
  4. The PPL query metadata is saved in the datasource's metadata index
  5. The Federated Query Coordinator translates the PPL into the remote datasource's query language and submits the query
  6. The query results are returned, translated to the expected format, and saved in the datasource's results index
  7. Future PPL queries will be able to use the results index for query acceleration and as a local cache
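
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two indices are plain dictionaries, and `MaterializedDatasource`, `FederatedQueryCoordinator`, `stub_translate`, and `stub_execute` are hypothetical names, not existing OpenSearch APIs. The PPL-to-remote translation and the remote call are injected stubs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


@dataclass
class MaterializedDatasource:
    """Steps 1-2: a datasource with its own metadata and results indices (dicts here)."""
    name: str
    metadata_index: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    results_index: Dict[str, Rows] = field(default_factory=dict)


class FederatedQueryCoordinator:
    """Hypothetical coordinator; translation and remote execution are injected."""

    def __init__(self, translate: Callable[[str], str],
                 execute_remote: Callable[[str], Rows]) -> None:
        self.translate = translate
        self.execute_remote = execute_remote
        self.remote_calls = 0

    def submit(self, ds: MaterializedDatasource, ppl: str) -> Rows:
        # Step 7: a repeated query is served from the results index.
        if ppl in ds.results_index:
            ds.metadata_index[ppl]["cache_hits"] += 1
            return ds.results_index[ppl]
        # Step 4: record the query in the datasource's metadata index.
        ds.metadata_index[ppl] = {"ppl": ppl, "cache_hits": 0}
        # Step 5: translate the PPL and submit it to the remote datasource.
        remote_query = self.translate(ppl)
        ds.metadata_index[ppl]["remote_query"] = remote_query
        rows = self.execute_remote(remote_query)
        self.remote_calls += 1
        # Step 6: save the results in the datasource's results index.
        ds.results_index[ppl] = rows
        return rows


def stub_translate(ppl: str) -> str:
    """Placeholder only; a real translator would emit e.g. PromQL."""
    return f"REMOTE({ppl})"


def stub_execute(remote_query: str) -> Rows:
    """Placeholder remote call returning a fixed row."""
    return [{"query": remote_query, "value": 1.0}]
```

Submitting the same PPL string twice through `submit` reaches the remote stub only once; the second call is answered from `results_index`.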

High-level architecture illustration
[Image: Screenshot 2024-03-26 at 11 38 58 AM]

@YANG-DB YANG-DB added enhancement New feature or request untriaged labels Mar 26, 2024
@nitincd

nitincd commented Mar 26, 2024

Looks like a meaningful expansion of the ability to query data external to OpenSearch.
Couple of questions:

  • What is the mechanism for moving data into OpenSearch?
  • What would be the cost profile of moving such data into materialized indexes?
  • What is the mechanism for doing data processing on the data (aggregation/filtering) before it is persisted in OpenSearch?
  • What is the security architecture of accessing such data sources?
  • In addition to data of different signals, can the user also access partitioned data of the same signal (for example, log data for the same app that is partitioned across multiple clusters)?
  • Would PPL be the only mechanism for querying this data, or could the user use the language constructs that are native to the source, for example PromQL for Prometheus data?
  • Would we have an index per PPL query, or can these be grouped as a collection of indices based on the signal type?
  • How do visualizations get built on top of the materialized indices? Do they work the same as current visualizations and dashboards?
  • Can these indices be set up to refresh continuously in the background, or are they static one-time query results?

@Swiddis Swiddis removed the untriaged label Mar 26, 2024
@vmmusings
Member

  • How do we make use of the stored results? How do we link them to future queries? An example would help.
  • For example: source = prometheus.http_total_metrics | stats avg() . Am I going to store these stats results in the index, and if so, how do I use them?
  • The metadata and result indices today are only associated with the S3Glue datasource. We need to think carefully about having similar indices for other datasources.
  • We were also thinking of leveraging materialized views from Flint, right? Is that part of this feature request?

@YANG-DB
Member Author

YANG-DB commented Apr 2, 2024

@vamsi-amazon

  • How do we make use of the stored results?
    • In the same way that Flint has a metadata index and a result index, each datasource would have similar indices, and every dashboard could query them.
  • ... and how do I use it?
    • Each datasource-based query will use the associated index as its cache; if the data is found there, it is returned from the index instead of from the remote datasource.
  • Metadata and result Index today are only associated with S3Glue datasource.
    • Each datasource will be associated with its own remote connection/engine, which generalizes this approach to fit the Prometheus use case as well.
  • We were also thinking to leverage materialized views from flint right?
    • This is another use case, in which we can use Flint to query Prometheus and store the results inside an MV index. This will allow many additional features, but it doesn’t have to be mandatory: users can use Prometheus independently of Flint and still visualize the query results and store them locally for caching and performance.
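
One way the "index as the cache of the query" idea could work is to key each stored result by datasource and query text. The sketch below assumes that two queries differing only in whitespace should share a cache entry; that normalization is an illustrative assumption (it would also collapse whitespace inside string literals), and `cache_key`, `store`, and `lookup` are hypothetical helpers with a dictionary standing in for the results index.

```python
import hashlib
from typing import Any, Dict, List, Optional

Rows = List[Dict[str, Any]]


def cache_key(datasource: str, query: str) -> str:
    """Derive a stable key from the datasource name and whitespace-normalized query."""
    normalized = " ".join(query.split())
    return hashlib.sha256(f"{datasource}\n{normalized}".encode("utf-8")).hexdigest()


def store(results_index: Dict[str, Rows], datasource: str, query: str, rows: Rows) -> None:
    """Save query results under the derived key."""
    results_index[cache_key(datasource, query)] = rows


def lookup(results_index: Dict[str, Rows], datasource: str, query: str) -> Optional[Rows]:
    """Return cached rows for this datasource and query, or None on a miss."""
    return results_index.get(cache_key(datasource, query))
```

Including the datasource name in the key keeps the same query text against two different datasources from sharing an entry.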

@YANG-DB YANG-DB changed the title [FEATURE]Materialize external datasource PPL queries [FEATURE]Materialize external datasource queries results Jun 24, 2024
4 participants