
[BUG] Flint index cannot handle large dataset due to OpenSearch doc limit #339

Open
dai-chen opened this issue May 13, 2024 · 0 comments
Labels
0.5 bug Something isn't working

Comments

@dai-chen
Collaborator

What is the bug?

In OpenSearch, each Flint index is stored as a single OpenSearch index, which is subject to the maximum document count of Integer.MAX_VALUE (Lucene's per-shard limit). This is particularly problematic for Flint covering indexes, which aggregate far less than Flint skipping indexes or materialized views and therefore produce roughly one document per source row, making the limit easy to exceed. In a test with the http_logs dataset, a single OpenSearch index was able to hold only about 133 GB of data, potentially leading to data loss or index failures.
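The scale of the problem follows from simple arithmetic. A back-of-envelope sketch (assuming a single primary shard, so the whole index is capped by Lucene's per-shard ceiling; the 133 GB figure is the observation above):

```python
# Lucene caps each shard at Integer.MAX_VALUE documents; with one primary
# shard this bounds the whole index.
MAX_DOCS = 2**31 - 1  # Java Integer.MAX_VALUE = 2,147,483,647

observed_bytes = 133 * 10**9  # ~133 GB observed with the http_logs dataset
avg_doc_size = observed_bytes / MAX_DOCS  # implied average indexed-doc size

print(f"max docs per shard: {MAX_DOCS}")
print(f"implied average document size: {avg_doc_size:.1f} bytes")  # ~62 bytes
```

With documents this small, the per-shard document count, not disk capacity, becomes the binding constraint for a covering index.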

How can one reproduce the bug?

Create a covering index on a sufficiently large dataset; as the refresh ingests source rows, the backing OpenSearch index eventually approaches the document limit.
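For concreteness, a reproduction sketch that builds the Flint covering-index DDL (the catalog, table, and column names are illustrative assumptions, not from the report):

```python
# Build a Flint covering-index DDL statement for a hypothetical source table.
table = "glue.default.http_logs"  # assumed catalog/table name
columns = ["`@timestamp`", "clientip", "request", "status", "size"]  # assumed columns

ddl = (
    f"CREATE INDEX http_logs_cover ON {table} "
    f"({', '.join(columns)}) "
    "WITH (auto_refresh = true)"
)
# In a Spark session with the Flint extension enabled, this statement would be
# submitted via spark.sql(ddl); each source row then becomes one OpenSearch document.
print(ddl)
```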

What is the expected behavior?

Flint indices should be able to handle datasets larger than the maximum document count of a single OpenSearch index, or a mechanism should exist to split the data across multiple indices so the limit is never hit.
