[FEATURE] Support refreshing mutable source data in Flint index #700

Open
dai-chen opened this issue Sep 25, 2024 · 0 comments
Labels
enhancement New feature or request

Is your feature request related to a problem?

Currently, a Flint index supports only append mode, which assumes the source is append-only log data. This works for many use cases, but when source rows can be updated or deleted, the index does not reflect those changes, so queries return outdated or incorrect results.

What solution would you like?

TBD: Support update output mode in the Flint sink operator. A proof of concept first requires looking into the Spark and Flint code.
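
To make the gap concrete, here is a minimal pure-Python sketch of the difference between the two write behaviors. It is not Flint or Spark code: the index is modeled as a plain list or dict, and the `id` key, the `status` field, and both function names are hypothetical. It assumes each source row carries a stable key, which an update-style sink would need in order to overwrite earlier entries.

```python
from typing import Dict, List

Row = Dict[str, object]


def append_batch(index: List[Row], batch: List[Row]) -> None:
    """Append-style write: every row in the batch becomes a new index entry."""
    index.extend(dict(row) for row in batch)


def upsert_batch(index: Dict[object, Row], batch: List[Row], key: str = "id") -> None:
    """Update-style write: a row replaces any existing entry with the same key.

    The `key` column is an assumption; the issue does not say how rows
    would be identified.
    """
    for row in batch:
        index[row[key]] = dict(row)


# Two batches from the source; row 1 is modified between them.
batch1 = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
batch2 = [{"id": 1, "status": "closed"}]

appended: List[Row] = []
append_batch(appended, batch1)
append_batch(appended, batch2)

upserted: Dict[object, Row] = {}
upsert_batch(upserted, batch1)
upsert_batch(upserted, batch2)

# The append-only index still holds the outdated entry for row 1.
stale = [r for r in appended if r["id"] == 1]
print(len(appended), len(stale))

# The keyed index holds exactly one entry per row, with the latest value.
print(len(upserted), upserted[1]["status"])
```

Note that this sketch covers updates only. How deletes at the source would reach the index is a separate question that the sketch does not address.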

What alternatives have you considered?

  1. Focus on append-only log data and leave as-is: Keep the current approach, support only append-only log data, and do not address mutable data. However, this limits the index's applicability to datasets whose rows change.

  2. Truncate Flint index and perform a full refresh: Periodically truncate the Flint index and rebuild it from the source. This ensures updated data is reflected, but it is time-consuming and resource-intensive, particularly for large datasets.

  3. Append new data with version control: Append each changed record as a new version instead of overwriting it. This tracks changes, but it may increase storage costs and adds complexity in managing multiple versions.
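
The third alternative can be sketched as follows, again in plain Python rather than Flint code. The document layout (`key`, `version`, `value`, `deleted`), the tombstone convention for deletes, and the function names are all assumptions made for illustration. The sketch shows both sides of the trade-off: the index stays append-only, but readers must resolve the latest version per key, and old versions keep occupying storage.

```python
from typing import Dict, List

Doc = Dict[str, object]


def append_version(index: List[Doc], key: str, value: object, deleted: bool = False) -> None:
    """Append a new version of `key`; nothing already in the index is modified."""
    last = max((d["version"] for d in index if d["key"] == key), default=0)
    index.append({"key": key, "version": last + 1, "value": value, "deleted": deleted})


def latest_view(index: List[Doc]) -> Dict[str, object]:
    """Resolve the newest version per key, hiding keys whose newest version is a tombstone."""
    newest: Dict[str, Doc] = {}
    for doc in index:
        current = newest.get(doc["key"])
        if current is None or doc["version"] > current["version"]:
            newest[doc["key"]] = doc
    return {k: d["value"] for k, d in newest.items() if not d["deleted"]}


index: List[Doc] = []
append_version(index, "a", "open")
append_version(index, "b", "open")
append_version(index, "a", "closed")            # update of "a" at the source
append_version(index, "b", None, deleted=True)  # delete of "b", recorded as a tombstone

view = latest_view(index)
print(view)        # only live keys, each with its latest value
print(len(index))  # every version is still stored
```

Here four documents are stored to represent a single live record, which is the storage and management overhead mentioned above. Compacting old versions would be an additional job that this sketch leaves out.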

Do you have any additional context?

Ref: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes
