Is your feature request related to a problem?
When a Flint index is created on an Iceberg branch, the refresh job processes the first micro-batch and then stops making progress. This happens because Spark Structured Streaming jobs do not support streaming reads from Iceberg branches; they primarily read from the snapshot lineage of the main table. As a result, Flint auto-refresh indexes, which rely on Spark streaming, get stuck after the first batch.
What solution would you like?
To avoid this, Flint should validate up front whether a user is creating an auto-refresh or incremental-refresh index (both backed by a Spark streaming job) on an Iceberg branch. If an Iceberg branch is detected, index creation should fail with a clear message stating that this is not supported. Users are then notified at creation time instead of discovering later that the streaming job starts but never progresses.
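A minimal sketch of the proposed check, written in Python for illustration only (it is not Flint code). It assumes the branch is addressed through Iceberg's Spark identifier form, where the last part of the table name is `branch_<name>`. `IndexRequest`, `iceberg_branch_of`, and `validate_index_request` are hypothetical names, and the refresh mode strings are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a branch is referenced as <catalog>.<db>.<table>.branch_<name>.
BRANCH_PREFIX = "branch_"
# Placeholder labels for the refresh modes backed by a Spark streaming job.
STREAMING_REFRESH_MODES = {"auto", "incremental"}


@dataclass(frozen=True)
class IndexRequest:
    """Hypothetical stand-in for an index creation request."""
    source_table: str  # fully qualified source table name
    refresh_mode: str  # "auto", "incremental", or "manual"


def iceberg_branch_of(table_name: str) -> Optional[str]:
    """Return the branch name if the last identifier part is branch_<name>, else None."""
    last_part = table_name.split(".")[-1]
    if last_part.startswith(BRANCH_PREFIX) and len(last_part) > len(BRANCH_PREFIX):
        return last_part[len(BRANCH_PREFIX):]
    return None


def validate_index_request(request: IndexRequest) -> None:
    """Fail fast when a streaming-backed refresh mode targets an Iceberg branch."""
    branch = iceberg_branch_of(request.source_table)
    if branch is not None and request.refresh_mode in STREAMING_REFRESH_MODES:
        raise ValueError(
            f"{request.refresh_mode} refresh is not supported on Iceberg branch "
            f"'{branch}' ({request.source_table}); use manual refresh instead."
        )


# Manual refresh on a branch passes validation.
validate_index_request(IndexRequest("glue.db.events.branch_audit", "manual"))
```

A name-based check like this would also flag an ordinary table whose name happens to start with `branch_`, so a real implementation would likely need to confirm that the source is an Iceberg table and resolve the reference through the catalog.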
What alternatives have you considered?
Clear documentation could instead guide users to configure manual refresh for indexes on Iceberg branches. However, by pre-validating and blocking the creation of auto-refresh indexes, Flint informs users early and avoids stalled streaming jobs.
Users can still perform a full manual refresh on an Iceberg branch, but they must truncate the index before refreshing to avoid duplicates.
Do you have any additional context?