CSV reporting service for exporting ML notifications #145082

Open
darnautov opened this issue Nov 14, 2022 · 5 comments
Labels
Feature:Reporting:CSV Reporting issues pertaining to CSV file export :ml old Used to help sort old issues on GH Projects which don't support the Created search term. Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience)

Comments

@darnautov
Contributor

We'd like to utilize the CSV reporting service for exporting ML notifications. The existing solution should be updated to account for the following requirements.

  1. The source index is a system index, .ml-notifications*, i.e. ML notifications are not Kibana saved objects.
  2. Different ML entity types can share the same ID, e.g. an anomaly detection job and a data frame analytics job could both have the ID my_ml_job. For space-aware requirements, a regular ES filter in the query might therefore not suffice, and the service should support custom filters.
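
To illustrate the second requirement, here is a minimal sketch of a search request that matches notifications by (type, ID) pair rather than by ID alone. The field names `job_id`, `job_type`, and `timestamp`, the entity type values, and the helper `buildNotificationsRequest` are assumptions made for illustration; they are not confirmed anywhere in this issue.

```typescript
// Assumed field names (`job_id`, `job_type`, `timestamp`) and type values:
// hypothetical, chosen only to illustrate the shape of the filter.
type MlEntityType = "anomaly_detector" | "data_frame_analytics";

interface EntityRef {
  id: string;
  type: MlEntityType;
}

interface NotificationsSearchRequest {
  index: string;
  query: { bool: { filter: object[] } };
}

// Build a request that only matches notifications for the given (type, id)
// pairs, so two entities that share an ID are never conflated.
function buildNotificationsRequest(
  allowed: EntityRef[],
  from: string,
  to: string
): NotificationsSearchRequest {
  const perEntity = allowed.map((e) => ({
    bool: {
      filter: [{ term: { job_id: e.id } }, { term: { job_type: e.type } }],
    },
  }));

  return {
    index: ".ml-notifications*",
    query: {
      bool: {
        filter: [
          { range: { timestamp: { gte: from, lte: to } } },
          { bool: { should: perEntity, minimum_should_match: 1 } },
        ],
      },
    },
  };
}
```

With this shape, a caller that has already resolved which entities are visible in the current space can pass only those pairs, and a data frame analytics job named my_ml_job would not leak into an export scoped to the anomaly detection job of the same name.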
@darnautov darnautov added (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead :ml Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) labels Nov 14, 2022
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@tsullivan
Member

Hi @darnautov! Thanks for creating this issue.

A few questions:

  • Do you have a rough idea of how many rows the exported CSV files will contain?
  • Same question for how many shards will need to be accessed for these exports?

ping @elastic/kibana-global-experience

@darnautov
Contributor Author

Hi @tsullivan! The number of rows depends on:

  • the time range
  • the query string provided by the user
  • how many ML entities exist in the cluster. Roughly, one job produces around 100 notifications per week. We presume that in a big cluster, an export of every notification ever created could reach ~100,000 rows.

Not sure about the shards, perhaps @dimitris-athanasiou or @droberts195 could answer that.
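
As a rough illustration of the row estimate above, the sketch below multiplies the quoted rate of about 100 notifications per job per week by a job count and a history length. The cluster size (20 jobs, one year of history) is made up for the example, and the result is an upper bound before the time range and query string narrow it down.

```typescript
// Rate quoted in this thread: roughly 100 notifications per job per week.
const NOTIFICATIONS_PER_JOB_PER_WEEK = 100;

// Upper-bound row count before any time range or query string is applied.
function estimateRows(jobCount: number, weeks: number): number {
  return jobCount * weeks * NOTIFICATIONS_PER_JOB_PER_WEEK;
}

// Hypothetical cluster: 20 jobs with 52 weeks of history.
console.log(estimateRows(20, 52)); // 104000
```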

@droberts195
Contributor

> Not sure about the shards, perhaps @dimitris-athanasiou or @droberts195 could answer that.

1 or 2. Might be 3 in the future. We create a new notifications index each time we change the mappings, and so far we've done that once. Each index is set up to have 1 shard.

@tsullivan
Member

tsullivan commented Nov 21, 2022

Hi @darnautov and @droberts195

I'd like to refer you to a writeup on CSV export thoughts here. In the writeup, I mention that we chose the best option available to us for CSV export, but it is known to be imperfect. Since it is not async search, there could be timeout problems that need to be carefully handled. When we export the CSV from Discover, it creates a report artifact and runs the search in a background task, where timeouts are carefully handled and bubbled up to the user in the form of a warning in the report job.
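
The timeout handling described above can be sketched roughly as follows. This is not the Reporting plugin's code: `FetchPage`, `withTimeout`, and `exportAllPages` are hypothetical names, and the paging contract (a cursor per page) is an assumption. The idea is only that a page timeout ends the export with a warning attached to the result instead of failing silently.

```typescript
// Hypothetical paging contract: each call returns one page of rows and,
// if more data remains, a cursor for the next call.
interface Page {
  rows: string[][];
  nextCursor?: string;
}

type FetchPage = (cursor: string | undefined) => Promise<Page>;

interface ExportResult {
  rows: string[][];
  warnings: string[];
}

// Reject if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Collect pages until the cursor runs out. A timeout or error stops the
// loop and is recorded as a warning, so partial output is still returned.
async function exportAllPages(
  fetchPage: FetchPage,
  pageTimeoutMs: number
): Promise<ExportResult> {
  const result: ExportResult = { rows: [], warnings: [] };
  let cursor: string | undefined;
  do {
    try {
      const page = await withTimeout(fetchPage(cursor), pageTimeoutMs);
      result.rows.push(...page.rows);
      cursor = page.nextCursor;
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      result.warnings.push(`Export is incomplete: ${reason}`);
      break;
    }
  } while (cursor !== undefined);
  return result;
}
```

A report job built this way could surface `warnings` next to the download link, which is the kind of visibility that helps support and developers when an export comes back short.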

In the Reporting domain, there is another feature that lets the user download CSV from an API, without creating a report job artifact. My hope is that we can deprecate that feature, simplify the code paths in Reporting, and just use report jobs for everything. I think that will benefit you as well: the better handling of timeouts and error tracking will make it easier for support / developers when issues happen. I believe the use case of having some kind of post-filter can still be supported.

My sense of how it would work: we could create a new "export type" for this use case and register it with the Reporting plugin. When the user wants to download the ML notifications, they would see a popup saying that a search job is running in the background, and then another popup containing a download link would show when it is ready. They could later go to Stack Management > Alerts and Insights > Reporting to re-download the file, or delete it from storage if they no longer need it. The ML notification report types would have a unique icon in the table of reports.
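
As a sketch of what an export type with a post-filter might look like, the example below defines a hypothetical `ExportType` shape and a `toCsv` helper. None of these names come from the Reporting plugin, and the row fields are assumptions; the point is only that a per-row predicate can run after the search and before the CSV is written.

```typescript
// Hypothetical shape, not the Reporting plugin's actual export type contract.
interface ExportType<Row> {
  id: string;
  columns: (keyof Row & string)[];
  // Optional post-filter applied to each row after the search returns.
  postFilter?: (row: Row) => boolean;
}

// Assumed row fields, for illustration only.
interface NotificationRow {
  job_id: string;
  job_type: string;
  message: string;
}

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function escapeCsv(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Apply the post-filter (if any), then render a header line plus one line per row.
function toCsv<Row extends object>(exportType: ExportType<Row>, rows: Row[]): string {
  const kept = exportType.postFilter ? rows.filter(exportType.postFilter) : rows;
  const header = exportType.columns.map(escapeCsv).join(",");
  const lines = kept.map((row) =>
    exportType.columns.map((col) => escapeCsv(String(row[col]))).join(",")
  );
  return [header, ...lines].join("\n");
}

// Hypothetical export type that keeps only anomaly detection notifications.
const mlNotifications: ExportType<NotificationRow> = {
  id: "ml_notifications_csv",
  columns: ["job_id", "job_type", "message"],
  postFilter: (row) => row.job_type === "anomaly_detector",
};
```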

Does this make sense, and does it sound OK?

cc @sixstringcode

@peteharverson peteharverson assigned darnautov and unassigned darnautov Apr 25, 2023
@tsullivan tsullivan added Feature:Reporting:CSV Reporting issues pertaining to CSV file export and removed (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead labels Aug 1, 2024
@petrklapka petrklapka added the old Used to help sort old issues on GH Projects which don't support the Created search term. label Sep 25, 2024