[Meta] RFC: Enhancements to Trace Analytics Plugin #2141

Open
ps48 opened this issue Sep 6, 2024 · 1 comment
Labels: enhancement, roadmap, traces

ps48 commented Sep 6, 2024

Overview

The Trace Analytics plugin in OpenSearch Dashboards provides users with an intuitive interface to analyze and visualize trace data. The plugin initially supported Data Prepper and was later extended to support the Jaeger schema. Since its introduction in ODFE 1.12 and OpenSearch 1.0, the plugin has functioned as a read-only solution. It currently stores no metadata and relies on browser session storage to keep track of the selected mode. This RFC outlines new features and enhancements aimed at improving storage capabilities, UI design, query performance, and integration with other OpenSearch Dashboards plugins.

Plugin as it exists today

  • The plugin enables users to explore and analyze trace data from span and service indices through a user-friendly interface.
  • Initially built with support for Data Prepper, it has evolved to also support Jaeger schema for broader trace collection compatibility.
  • First introduced in Open Distro for Elasticsearch (ODFE) 1.12 and OpenSearch 1.0.
  • The plugin operates in a read-only mode, without the ability to persist configurations across sessions.
  • No trace metadata is stored, and configurations are temporarily saved in browser session storage only.
  • Lacks support for advanced features like custom indices, cross-cluster queries, or configuration persistence.

Proposed Enhancements

The proposed changes address the following requirements:

  1. Storage Layer

    • Introduce a connection to the storage layer by saving configurations in the .kibana index.
    • Store trace and service index configurations.
    • Support custom indices with wildcards and cross-cluster queries.
  2. UI Enhancements

    • A dedicated Traces and Services page in the new navigation system.
    • A refreshed look and feel for a more efficient, sleeker, and consistent UI.
    • An updated service map graph UI that can display in full-screen mode with easy navigation.
    • Componentization of existing UI widgets, such as the traces table and service map.
  3. Query Optimization

    • Optimize DSL queries for better aggregation performance, especially at scale.
    • Leverage pre-aggregated indices from Data Prepper to reduce the load on OpenSearch during aggregation operations.
  4. Seamless Correlation

    • Implement schema-based correlation between logs, traces, and metrics using OTEL standards.
    • Improve navigation and correlation between Trace Analytics and other OpenSearch Dashboards plugins like Discover, Anomaly Detection, and Alerting.
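
To make the query optimization item more concrete, here is a minimal sketch of a per-service latency aggregation. It is an illustration, not the plugin's actual query: the field names `serviceName`, `durationInNanos`, and `startTime` are assumed to follow the Data Prepper span schema, and the helper name `build_service_latency_query` is hypothetical. It sets `size` to 0 and puts the time range in filter context, so no hits are returned and no relevance scoring is done.

```python
from typing import Any, Dict


def build_service_latency_query(
    start: str,
    end: str,
    service_field: str = "serviceName",       # assumed span field name
    duration_field: str = "durationInNanos",  # assumed span field name
    time_field: str = "startTime",            # assumed span field name
    max_services: int = 100,
) -> Dict[str, Any]:
    """Return a DSL body that aggregates span latency per service."""
    return {
        # Aggregation only: do not return individual span documents.
        "size": 0,
        "query": {
            "bool": {
                # Filter context skips scoring and can be cached.
                "filter": [
                    {"range": {time_field: {"gte": start, "lte": end}}}
                ]
            }
        },
        "aggs": {
            "services": {
                "terms": {"field": service_field, "size": max_services},
                "aggs": {
                    "avg_latency": {"avg": {"field": duration_field}},
                    "latency_percentiles": {
                        "percentiles": {
                            "field": duration_field,
                            "percents": [50, 95, 99],
                        }
                    },
                },
            }
        },
    }
```

The same body could be sent against a pre-aggregated index instead of the raw span index when one exists, provided its mapping exposes compatible fields.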

Architecture

The enhancements will involve the following architectural changes:

  1. Storage Layer
    A new storage layer will be integrated by utilizing the .kibana index for saving user-configured trace and service index configurations. This will allow for persistence across sessions and the use of custom index patterns.

  2. Optimized Query Execution
    Refactor the current query execution strategy to better handle large datasets by querying pre-aggregated indices where available, particularly from Data Prepper. This will minimize the load on OpenSearch when performing aggregation-heavy queries, especially for large clusters.

  3. Cross-Cluster Search
    Enable cross-cluster querying, allowing users to pull trace data from multiple clusters, enhancing the scalability of the Trace Analytics plugin.
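
As a rough sketch of how the stored configuration and cross-cluster search could fit together, the example below models a user-configured index setting and expands it into an index expression. The only thing taken from OpenSearch itself is the `<cluster-alias>:<index>` syntax for cross-cluster search. The class name, attribute names, default index patterns, and the saved-object type `trace-analytics-config` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TraceIndexConfig:
    """User-configured trace and service index settings (illustrative)."""

    # Defaults assume Data Prepper's usual index names; adjust as needed.
    span_pattern: str = "otel-v1-apm-span-*"
    service_pattern: str = "otel-v1-apm-service-map*"
    # Aliases of remote clusters set up for cross-cluster search.
    remote_clusters: List[str] = field(default_factory=list)
    include_local: bool = True

    def _expand(self, pattern: str) -> str:
        """Prefix the pattern with each remote alias and join the targets."""
        targets = [pattern] if self.include_local else []
        targets.extend(f"{alias}:{pattern}" for alias in self.remote_clusters)
        if not targets:
            raise ValueError("no local or remote cluster selected")
        return ",".join(targets)

    def span_indices(self) -> str:
        return self._expand(self.span_pattern)

    def service_indices(self) -> str:
        return self._expand(self.service_pattern)

    def to_saved_object(self) -> Dict[str, Any]:
        """Shape of the document that could be persisted (hypothetical)."""
        return {
            "type": "trace-analytics-config",  # hypothetical type name
            "attributes": {
                "spanPattern": self.span_pattern,
                "servicePattern": self.service_pattern,
                "remoteClusters": list(self.remote_clusters),
                "includeLocal": self.include_local,
            },
        }
```

For example, `TraceIndexConfig(remote_clusters=["west"]).span_indices()` yields `otel-v1-apm-span-*,west:otel-v1-apm-span-*`, which can be used directly as the target of a search request.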

UI Design

The new UI will focus on consistency and usability:

  1. Navigation Changes
    Add dedicated pages for traces and services in the main navigation, making it easier for users to access each area.

  2. Service Map Enhancements
    The service map graph will be upgraded to support full-screen view and improved navigation, making it easier for users to analyze services visually.

  3. Componentization
    UI components like the trace table and service map will be refactored into reusable widgets, facilitating their use in different contexts across OpenSearch Dashboards.

Appendix

ps48 added the enhancement and untriaged labels, then removed the untriaged label, on Sep 6, 2024
ps48 self-assigned this on Sep 6, 2024

YANG-DB commented Sep 9, 2024

Hi @ps48
Thanks for the great review and suggestions. Here are my comments:

Storage Layer: We should leverage the existing spark-flint-metadata indices, which store the queries and indices used for each request, and reuse these concepts so that in the future we could use such federated queries to fetch traces from outside our internal indices.

UI Enhancements: We should allow both built-in widgets and user-built (custom) dashboards to serve as the default trace/services analytics viewport, so that a dashboard a user has already created to visualize some of these data points can be used within the same UI dialogs.

In addition, we should leverage our existing Vega UX components to replace the existing services network graph and traces burn charts.

Query Optimization: The location of the aggregated data points can be generalized using the .ql-job-metadata index, as shown here. We can leverage these indices to represent the aggregated queries in the metadata index and let the UI discover the location of the aggregated data. Given the known schema, the mapping should be compatible with the expected query fields.
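
A minimal sketch of that discovery step might look like the following. The metadata document shape used here (`name`, `state`, `resultIndex`, `resultFields`) is entirely hypothetical and is not the actual .ql-job-metadata schema. The point is only the flow: look up the aggregated query, check that its result fields cover what the UI needs, and otherwise fall back to the raw index.

```python
from typing import Any, Dict, Iterable, Set


def resolve_aggregated_index(
    metadata_docs: Iterable[Dict[str, Any]],
    query_name: str,
    fallback_index: str,
    required_fields: Set[str],
) -> str:
    """Pick the index holding pre-aggregated results, or fall back.

    All document keys below are hypothetical placeholders.
    """
    for doc in metadata_docs:
        if doc.get("name") != query_name:
            continue
        if doc.get("state") != "active":
            continue  # skip jobs that are not currently producing results
        result_index = doc.get("resultIndex")
        available = set(doc.get("resultFields", []))
        # Use the aggregated index only if its fields cover the query.
        if result_index and required_fields <= available:
            return result_index
    return fallback_index
```

With this shape, the UI would not need to hard-code where aggregated data lives. It would only need the logical query name and the fields it expects.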

Seamless Correlation: Correlation is basically a query that pre-joins or live-joins different indices using a known common key. We can again use the same general-purpose mechanism (used in flint-spark) to define the queries, including the metadata, and point to the index where the results are located.
Making this a general-purpose framework will simplify the UI components' ability to discover the location of the resulting data, the data schema, and the dependent indices used in the actual query composition.
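
To illustrate the live-join idea in its simplest form, the sketch below joins span and log records in memory on a shared key. It assumes both record types carry the trace identifier under the same field name (`traceId` here, which is an assumption), and the function name `live_join` is hypothetical. A real implementation would push the join into the query layer rather than do it client-side.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def live_join(
    spans: Iterable[Dict[str, Any]],
    logs: Iterable[Dict[str, Any]],
    key: str = "traceId",  # assumed common key shared by both indices
) -> List[Dict[str, Any]]:
    """Attach to each span the logs that share its key value."""
    logs_by_key: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for log in logs:
        value = log.get(key)
        if value is not None:  # logs without the key cannot be correlated
            logs_by_key[value].append(log)

    joined = []
    for span in spans:
        # .get() avoids creating empty entries for unmatched spans.
        matches = logs_by_key.get(span.get(key), [])
        joined.append({**span, "logs": list(matches)})
    return joined
```

A pre-join would run the same logic ahead of time and write the result to an index, which is where a metadata entry pointing at the results location becomes useful.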

YANG-DB added the traces and roadmap labels on Sep 9, 2024