[FEATURE] Support partial indexing for skipping and covering index #89
Comments
One limitation of #124 is that the filter does not work if the table has no partition column. Two use cases I can think of for adding a filter when creating a skipping index are
Proposal 1
Spark Structured Streaming does not support modifiedAfter / modifiedBefore. See https://issues.apache.org/jira/browse/SPARK-31962
Proposal 2 - leverage file metadata (Preferred)
More reading: https://docs.databricks.com/en/ingestion/file-metadata-column.html
CREATE SKIPPING INDEX on table_name
REFRESH SKIPPING INDEX on table_name [ WHERE [metadata predicate | partition predicate] ]
    Refresh the skipping index on demand.
ALTER SKIPPING INDEX on table_name SET auto_refresh = true/false
    Enable or disable auto refresh of the skipping index on the table. Note: auto_refresh = false does not stop a refresh job that is already running.
Limitation
The file metadata column has a bug in Spark 3.3.1/3.3.2. It is fixed in Spark 3.4 (apache/spark#39870).
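To make the idea of a metadata predicate concrete, here is a minimal sketch in plain Python, not Spark. It assumes the predicate is a cutoff on file modification time and shows which files a refresh restricted by such a predicate would pick up. The function names `files_modified_after` and `demo`, the file names, and the dates are all hypothetical and only for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone
from typing import List


def files_modified_after(directory: str, cutoff: datetime) -> List[str]:
    """Return names of files in `directory` modified strictly after `cutoff`."""
    cutoff_ts = cutoff.timestamp()
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and entry.stat().st_mtime > cutoff_ts
    )


def demo() -> List[str]:
    """Create two files with fixed modification times and filter them."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, day in [("old.json", 1), ("new.json", 20)]:
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write("{}")
            # Force a known modification time so the result is deterministic.
            mtime = datetime(2023, 10, day, tzinfo=timezone.utc).timestamp()
            os.utime(path, (mtime, mtime))
        cutoff = datetime(2023, 10, 10, tzinfo=timezone.utc)
        return files_modified_after(tmp, cutoff)


if __name__ == "__main__":
    print(demo())
```

Only the file modified after the cutoff is selected, so a refresh limited this way would not need to scan older files, even when the table has no partition column.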
Is your feature request related to a problem?
Currently there is no way to provide a start timestamp or a WHERE clause in the create index statement. That means skipping and covering indexes have to refresh data from the beginning, which may cause unnecessary computation and wasted storage.
What solution would you like?
Support partial indexing by either:
CREATE INDEX ... WHERE status != 200 WITH (...)
Note that one challenge here is the correctness of query rewrite. The skipping index query rewriter has to compare the index's filtering condition with the one in the query and decide whether the query can be accelerated.
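The comparison described above amounts to an implication check: a partial index is only safe to use when every row the query can return also satisfies the index filter. Below is a minimal Python sketch of such a check, assuming both conditions are conjunctions of simple `column op literal` comparisons. The `Atom` type and the functions `atom_implies` and `can_use_partial_index` are hypothetical names for illustration, not part of the project, and the check is deliberately conservative: it answers `False` whenever it cannot prove the implication.

```python
import operator
from dataclasses import dataclass
from typing import List

_OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


@dataclass(frozen=True)
class Atom:
    column: str
    op: str        # one of the keys of _OPS
    value: float


def evaluate(atom: Atom, value: float) -> bool:
    """Check whether a concrete value satisfies the atom."""
    return _OPS[atom.op](value, atom.value)


def atom_implies(q: Atom, i: Atom) -> bool:
    """True only if every value satisfying q provably satisfies i."""
    if q.column != i.column:
        return False
    if q == i:
        return True
    if q.op == "=":
        # q pins the column to one value, so just test that value.
        return evaluate(i, q.value)
    if i.op == "!=":
        # q is a range; it implies i if the range excludes i.value.
        if q.op == ">":
            return i.value <= q.value
        if q.op == ">=":
            return i.value < q.value
        if q.op == "<":
            return i.value >= q.value
        if q.op == "<=":
            return i.value > q.value
        return False
    if q.op in (">", ">=") and i.op in (">", ">="):
        if q.op == ">=" and i.op == ">":
            return q.value > i.value
        return q.value >= i.value
    if q.op in ("<", "<=") and i.op in ("<", "<="):
        if q.op == "<=" and i.op == "<":
            return q.value < i.value
        return q.value <= i.value
    return False


def can_use_partial_index(query: List[Atom], index_filter: List[Atom]) -> bool:
    """Each index atom must be implied by at least one query atom."""
    return all(any(atom_implies(q, i) for q in query) for i in index_filter)


if __name__ == "__main__":
    index_filter = [Atom("status", "!=", 200)]
    print(can_use_partial_index([Atom("status", "=", 500)], index_filter))
    print(can_use_partial_index([Atom("status", "=", 200)], index_filter))
```

With an index created `WHERE status != 200`, a query filtering on `status = 500` can be accelerated, while a query on `status = 200` or a query with no filter cannot, because the index does not cover those rows.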