
[Reporting] Improve deep pagination method for CSV export #143144

Closed
tsullivan opened this issue Oct 11, 2022 · 3 comments · Fixed by #144201
Labels
bug Fixes for quality problems that affect the customer experience (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead impact:critical This issue should be addressed immediately due to a critical level of impact on the product. loe:large Large Level of Effort

Comments

@tsullivan
Member

tsullivan commented Oct 11, 2022

CSV export uses the scroll API to paginate through all the data for a user's search. This is internally expensive for Elasticsearch, especially when the search spans a large number of shards.

Alternatives:

  1. Async search can be used to search a large amount of data across a large number of shards.
  2. Point-in-time (PIT) can be used to page through search hits when there are more than 10,000, with no limit on the number of shards backing the data. Elasticsearch adds an automatic tiebreaker to the sort results when PIT is used, so search_after will not skip documents.
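The search_after mechanics behind option 2 can be sketched in plain Python, with no cluster needed. Every name here is illustrative, not the actual Kibana or Elasticsearch client code; the point is how a sort tuple with a tiebreaker lets pagination resume without skipping documents:

```python
# Cluster-free sketch of search_after-style pagination.
# Each "document" sorts on (timestamp, tiebreaker); the tiebreaker keeps
# pages from skipping or repeating docs that share a timestamp.

def fetch_page(docs, search_after=None, size=2):
    """Return the next page of docs whose sort tuple is > search_after."""
    ordered = sorted(docs, key=lambda d: (d["ts"], d["id"]))
    if search_after is not None:
        ordered = [d for d in ordered if (d["ts"], d["id"]) > search_after]
    return ordered[:size]

def export_all(docs, size=2):
    """Page through every hit, the way a CSV exporter would."""
    results, after = [], None
    while True:
        page = fetch_page(docs, after, size)
        if not page:
            return results
        results.extend(page)
        last = page[-1]
        after = (last["ts"], last["id"])  # resume point for the next page

docs = [{"ts": 1, "id": "a"}, {"ts": 1, "id": "b"},  # duplicate timestamps
        {"ts": 2, "id": "c"}, {"ts": 3, "id": "d"}]
all_ids = [d["id"] for d in export_all(docs)]
# every document is visited exactly once, in sort order
```

Without the `id` component in the sort tuple, the two documents sharing `ts == 1` would be indistinguishable at a page boundary and one of them could be skipped or duplicated.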

Requirements:

@tsullivan tsullivan added (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead impact:critical This issue should be addressed immediately due to a critical level of impact on the product. Team:AppServicesUx labels Oct 11, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesUx)

@tsullivan tsullivan changed the title [Reporting] Use Async Search for CSV export [Reporting] Improve deep pagination method for CSV export Oct 17, 2022
@tsullivan
Member Author

The goal is to use the Point-in-time API.

  1. When the data is time-based, the user's timestamp field will be used as the sort key for search_after. In case there are duplicate timestamps in the data, we must add another sort field as a tiebreaker. I'll try to use the document _id as the tiebreaker.
  2. When the data is not time-based, the data will be sorted by _id.
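The two cases above could translate into a sort clause along these lines. This is a hypothetical helper for illustration, not the actual Kibana implementation:

```python
# Hypothetical helper sketching the sort clause for the two cases above;
# not the real Kibana code.

def build_sort(timestamp_field=None):
    """Build an Elasticsearch-style sort array for search_after paging."""
    if timestamp_field:
        # Time-based data: sort on the timestamp field, with _id as the
        # tiebreaker so duplicate timestamps cannot cause skipped documents.
        return [{timestamp_field: "asc"}, {"_id": "asc"}]
    # Non-time-based data: _id alone gives a total order.
    return [{"_id": "asc"}]

print(build_sort("@timestamp"))
# [{'@timestamp': 'asc'}, {'_id': 'asc'}]
```

The returned array would be passed as the `sort` option of the search request that pages with `search_after` against the open point-in-time.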

@tsullivan tsullivan added the bug Fixes for quality problems that affect the customer experience label Oct 24, 2022
@exalate-issue-sync exalate-issue-sync bot added the loe:large Large Level of Effort label Oct 26, 2022
@tsullivan
Member Author

related #88303
