[FEATURE] Support refreshing mutable source data in Flint index #700

dai-chen · 2024-09-25T21:10:16Z

Is your feature request related to a problem?

Currently, a Flint index supports only append mode, which assumes the source is append-only log data. This works for many use cases, but when the source data can be updated or deleted, the index cannot reflect those changes, so queries against it return outdated or incorrect results.

What solution would you like?

TBD: Support the update output mode in the Flint sink operator. A proof of concept first requires looking into the Spark and Flint code.
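To make the difference concrete, here is a minimal sketch in plain Python (no Spark or Flint dependency) that contrasts an append-only sink with one that overwrites by key. In Spark's update output mode only the rows that changed since the last trigger are written, so the sink has to overwrite existing documents rather than add new ones. Everything in the sketch is hypothetical and for illustration only: the `AppendSink` and `UpsertSink` classes, the `(key, value)` change tuples, and the convention that a `None` value marks a delete are assumptions, not Flint or Spark APIs. How deletes would actually reach the sink is still an open question.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change record: (key, value). A value of None marks a delete.
Change = Tuple[str, Optional[dict]]


class AppendSink:
    """Append-only sink: every incoming row becomes a new index document."""

    def __init__(self) -> None:
        self.docs: List[dict] = []

    def write_batch(self, changes: List[Change]) -> None:
        for key, value in changes:
            if value is not None:  # a delete cannot be expressed by appending
                self.docs.append({"key": key, **value})


class UpsertSink:
    """Update-style sink: documents are keyed, so a change overwrites in place."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def write_batch(self, changes: List[Change]) -> None:
        for key, value in changes:
            if value is None:
                self.docs.pop(key, None)  # drop the document if it exists
            else:
                self.docs[key] = {"key": key, **value}


# Two micro-batches: the second one updates "a" and deletes "b".
batch1: List[Change] = [("a", {"status": "new"}), ("b", {"status": "new"})]
batch2: List[Change] = [("a", {"status": "shipped"}), ("b", None)]

append_sink, upsert_sink = AppendSink(), UpsertSink()
for batch in (batch1, batch2):
    append_sink.write_batch(batch)
    upsert_sink.write_batch(batch)

print(len(append_sink.docs))     # 3: stale "a" and deleted "b" are still present
print(sorted(upsert_sink.docs))  # ['a']: only the current state remains
```

The key design point is that an update-style sink needs a stable document ID derived from the source row. Without one, a changed row cannot be matched to the document it should replace.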

What alternatives have you considered?

Focus on append-only log data and leave as-is: One option is to maintain the current approach by supporting only append-only log data, and not addressing mutable data. However, this limits the index's applicability to more dynamic datasets.
Truncate Flint index and perform a full refresh: Another alternative is to periodically truncate the Flint index and fully refresh it. While this ensures updated data is reflected, it's time-consuming and resource-intensive, particularly for large datasets.
Append new data with version control: A third option is to append new data while implementing version control. This would track changes by adding new versions of updated records, but it may lead to higher storage costs and complexity in managing multiple versions.
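As a rough illustration of the third alternative, the sketch below appends a new version for every change and resolves the latest version per key at read time. It is plain Python with hypothetical names (`append_version`, `latest_view`, and the `key` and `version` fields are assumptions for illustration), not a proposal for the actual index schema.

```python
from typing import Dict, List


def append_version(index: List[dict], key: str, value: dict) -> None:
    """Append a new document for `key` with the next version number."""
    current = max((d["version"] for d in index if d["key"] == key), default=0)
    index.append({"key": key, "version": current + 1, **value})


def latest_view(index: List[dict]) -> Dict[str, dict]:
    """Resolve the highest-version document for each key at read time."""
    latest: Dict[str, dict] = {}
    for doc in index:
        seen = latest.get(doc["key"])
        if seen is None or doc["version"] > seen["version"]:
            latest[doc["key"]] = doc
    return latest


index: List[dict] = []
append_version(index, "a", {"status": "new"})
append_version(index, "a", {"status": "shipped"})  # update stored as version 2
append_version(index, "b", {"status": "new"})

view = latest_view(index)
print(len(index))           # 3 documents stored for 2 logical records
print(view["a"]["status"])  # shipped
```

The index stays append-only, but every update adds a document and every read has to deduplicate by key, which is the storage and complexity cost mentioned above.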

Do you have any additional context?

Ref: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes

dai-chen added the enhancement and untriaged labels and removed the untriaged label on Sep 25, 2024

dai-chen mentioned this issue Sep 27, 2024

[FEATURE] Handle Iceberg overwrite and delete snapshots to prevent index refresh failure #708

Open
