
[BUG] Flint index cannot handle large dataset due to OpenSearch doc limit #339

Open
dai-chen opened this issue May 13, 2024 · 0 comments
Labels
0.5 bug Something isn't working

Comments

@dai-chen
Collaborator

What is the bug?

In OpenSearch, each Flint index is stored as a single OpenSearch index, which is subject to the maximum document count of Integer.MAX_VALUE (Lucene's per-shard limit). This is particularly problematic for Flint covering indexes, which aggregate far less than Flint skipping indexes or materialized views and therefore produce roughly one document per source row, making the limit easy to exceed. In a test with the http_logs dataset, a single OpenSearch index was able to hold only about 133 GB of data, potentially leading to data loss or index failures.
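The scale of the problem follows from simple arithmetic. A back-of-envelope sketch (assuming a single primary shard, so the whole index is capped by Lucene's per-shard ceiling; the 133 GB figure is the observation above):

```python
# Lucene caps each shard at Integer.MAX_VALUE documents; with one primary
# shard this bounds the whole index.
MAX_DOCS = 2**31 - 1  # Java Integer.MAX_VALUE = 2,147,483,647

observed_bytes = 133 * 10**9  # ~133 GB observed with the http_logs dataset
avg_doc_size = observed_bytes / MAX_DOCS  # implied average indexed-doc size

print(f"max docs per shard: {MAX_DOCS}")
print(f"implied average document size: {avg_doc_size:.1f} bytes")  # ~62 bytes
```

With documents this small, the per-shard document count, not disk capacity, becomes the binding constraint for a covering index.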

How can one reproduce the bug?

Create a covering index on a sufficiently large dataset; as the refresh ingests source rows, the backing OpenSearch index eventually approaches the document limit.
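For concreteness, a reproduction sketch that builds the Flint covering-index DDL (the catalog, table, and column names are illustrative assumptions, not from the report):

```python
# Build a Flint covering-index DDL statement for a hypothetical source table.
table = "glue.default.http_logs"  # assumed catalog/table name
columns = ["`@timestamp`", "clientip", "request", "status", "size"]  # assumed columns

ddl = (
    f"CREATE INDEX http_logs_cover ON {table} "
    f"({', '.join(columns)}) "
    "WITH (auto_refresh = true)"
)
# In a Spark session with the Flint extension enabled, this statement would be
# submitted via spark.sql(ddl); each source row then becomes one OpenSearch document.
print(ddl)
```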

What is the expected behavior?

Flint indices should be able to handle datasets larger than the maximum document count of a single OpenSearch index, or a mechanism should exist to split the data across multiple indices so the limit is never hit.
