Is your feature request related to a problem?
Currently, the Flint index supports only append mode, which assumes that the source data is an append-only log. This works for many use cases. However, when source data can be updated or deleted, the index cannot reflect those changes, which leads to outdated or incorrect query results.
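To illustrate the problem, here is a minimal Scala sketch that models the index as an in-memory collection rather than using any Flint or Spark API. The `Row` type, the order IDs, and the `appendOnly` function are hypothetical. It shows that when a later micro-batch carries an update to an existing source row, an append-only sink stores it as a second document, so queries can still match the stale one.

```scala
// Hypothetical model: a source row identified by a key, with a value that may change.
final case class Row(id: String, status: String)

// Append-only sink: every micro-batch's rows are added as new documents,
// so an updated source row becomes a second document instead of replacing the first.
def appendOnly(index: Vector[Row], batch: Seq[Row]): Vector[Row] =
  index ++ batch

// Micro-batch 1 inserts order "o1"; micro-batch 2 carries an update to the same order.
val batch1 = Seq(Row("o1", "PENDING"))
val batch2 = Seq(Row("o1", "SHIPPED"))

val index = Seq(batch1, batch2).foldLeft(Vector.empty[Row])(appendOnly)

// A query for pending orders still matches the outdated document.
val stalePending = index.filter(_.status == "PENDING")

println(index)        // both versions of "o1" are stored
println(stalePending) // the stale "PENDING" document is still returned
```

A deleted source row is even harder to represent: the append-only sink never receives anything that would remove the existing document.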
What solution would you like?
TBD: Support the update output mode in the Flint sink operator. This requires investigating the Spark and Flint code to build a proof of concept.
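A minimal sketch of what update semantics could mean for the index, assuming each source row has a stable key that can serve as the document ID (an assumption; how Flint would derive document IDs is not decided in this issue). It uses plain Scala collections, not Spark or Flint APIs, and `Row` and `upsert` are hypothetical names. In Spark's update output mode, only rows that changed since the last trigger are emitted, so a sink along these lines would overwrite just those documents.

```scala
// Hypothetical model of a source row; `id` plays the role of the index document ID.
final case class Row(id: String, status: String)

// Upsert sink (sketch): each changed row emitted by a micro-batch overwrites the
// document with the same ID instead of being appended as a new document.
// If one batch contains the same ID twice, the later row wins.
def upsert(index: Map[String, Row], changedRows: Seq[Row]): Map[String, Row] =
  changedRows.foldLeft(index)((acc, row) => acc.updated(row.id, row))

val batch1 = Seq(Row("o1", "PENDING"), Row("o2", "PENDING"))
// In update mode, the next trigger emits only the row that changed.
val batch2 = Seq(Row("o1", "SHIPPED"))

val index = Seq(batch1, batch2).foldLeft(Map.empty[String, Row])(upsert)

println(index.values.toSeq.sortBy(_.id))
```

Deletes are not covered by this sketch. Update mode emits changed rows, so removing documents for deleted source rows would likely need separate handling.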
What alternatives have you considered?
Focus on append-only log data and leave as-is: One option is to keep the current approach, which supports only append-only log data, and not address mutable data. However, this limits the index's applicability to more dynamic datasets.
Truncate Flint index and perform a full refresh: Another alternative is to periodically truncate the Flint index and fully refresh it. While this ensures that updates are reflected, each refresh is time-consuming and resource-intensive, particularly for large datasets.
Append new data with version control: A third option is to append new data while tracking versions, so that each change to a record is stored as a new version. This reflects updates, but it may increase storage costs and add complexity in managing multiple versions.
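The version-control alternative can be sketched the same way, again with plain Scala collections and hypothetical names (`VersionedRow`, `appendVersion`, `latest`) rather than any Flint API. Writes remain append-only, but every read has to resolve the latest version for each ID, and every update adds another stored document.

```scala
// Hypothetical versioned record: every change is appended with an increasing version.
final case class VersionedRow(id: String, version: Long, status: String)

// Writes stay append-only: an update never modifies an existing document.
def appendVersion(index: Vector[VersionedRow], row: VersionedRow): Vector[VersionedRow] =
  index :+ row

// Reads must pick the highest version per ID, a cost every query now pays.
def latest(index: Vector[VersionedRow]): Map[String, VersionedRow] =
  index.groupBy(_.id).map { case (id, versions) => id -> versions.maxBy(_.version) }

val changes = Vector(
  VersionedRow("o1", 1L, "PENDING"),
  VersionedRow("o1", 2L, "PACKED"),
  VersionedRow("o1", 3L, "SHIPPED")
)

val index = changes.foldLeft(Vector.empty[VersionedRow])(appendVersion)

println(latest(index)) // only the newest version of "o1" is visible to readers
```

Here three documents are stored for one logical record, which is the storage overhead described above; older versions would also need compaction or cleanup to keep the index from growing without bound.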
Do you have any additional context?
Ref: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes