Add skipping index recommendations for specific columns #300
Conversation
Signed-off-by: Rupal Mahajan <[email protected]>
```diff
   * @return
   *   skipping index recommendation dataframe
   */
- def analyzeSkippingIndex(tableName: String): Seq[Row] = {
-   new DataTypeSkippingStrategy().analyzeSkippingIndexColumns(tableName, spark)
+ def analyzeSkippingIndex(inputs: Map[String, List[String]]): Seq[Row] = {
```
Could you use some abstraction here instead of a generic Map? Is there a DataFrame or any existing Table abstraction we can use here? You may ignore query/function as input for now; the reason is Limitation no. 1 in #298 (comment). I'm thinking we could add a generic query analyze API for all Flint indexes, e.g. ANALYZE FLINT INDEX FOR query.
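For reference, a minimal sketch of the two input shapes under discussion; the signatures are illustrative only, not the final API:

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Current revision: a generic Map of table name -> candidate column names.
def analyzeSkippingIndex(inputs: Map[String, List[String]]): Seq[Row] = ???

// Reviewer's suggestion: a DataFrame (or another existing Table abstraction)
// of (table_name, column_name) pairs instead of a raw Map.
def analyzeSkippingIndex(input: DataFrame): Seq[Row] = ???
```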
Raised revision with DataFrame. Btw, should the grammar be ANALYZE FLINT INDEX or ANALYZE SKIPPING INDEX? Can the same static rules apply to any type of index?
Just some thoughts. I was thinking of something like ANALYZE FLINT INDEX FOR query. For example (a sketch of this mapping follows the list):
- ANALYZE FLINT INDEX FOR SELECT * FROM test: recommend covering index
- ANALYZE FLINT INDEX FOR SELECT ... WHERE clientip = ...: recommend skipping and covering
- ANALYZE FLINT INDEX FOR SELECT ... GROUP BY ...: recommend MV
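A minimal sketch of how such a query-shape-to-recommendation mapping could look; all type and function names here are hypothetical, not part of this PR:

```scala
// Hypothetical index kinds a generic query analyzer could recommend.
sealed trait FlintIndexKind
case object CoveringIndex extends FlintIndexKind
case object SkippingIndex extends FlintIndexKind
case object MaterializedView extends FlintIndexKind

// Map the shape of the analyzed query to recommended index kinds,
// following the examples above.
def recommend(hasWhereClause: Boolean, hasGroupBy: Boolean): Seq[FlintIndexKind] =
  if (hasGroupBy) Seq(MaterializedView)
  else if (hasWhereClause) Seq(SkippingIndex, CoveringIndex)
  else Seq(CoveringIndex)
```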
Signed-off-by: Rupal Mahajan <[email protected]>
```diff
   * @return
   *   skipping index recommendation dataframe
   */
- def analyzeSkippingIndex(tableName: String): Seq[Row] = {
-   new DataTypeSkippingStrategy().analyzeSkippingIndexColumns(tableName, spark)
+ def analyzeSkippingIndex(schema: StructType, data: Seq[Row]): Seq[Row] = {
```
SparkSession is available in this class. I think the input of this API can simply be tableName and columnNames? That would be convenient for users who rely on the Flint API instead of the SQL layer.
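A minimal sketch of that suggestion, assuming an empty column list means "recommend for all columns" and that the strategy accepts an optional column filter (both assumptions, not the merged API):

```scala
import org.apache.spark.sql.Row

// Fragment inside the class that already holds a SparkSession as `spark`.
def analyzeSkippingIndex(
    tableName: String,
    columnNames: List[String] = List.empty): Seq[Row] = {
  // Assumes analyzeSkippingIndexColumns can take the optional column filter.
  new DataTypeSkippingStrategy().analyzeSkippingIndexColumns(tableName, columnNames, spark)
}
```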
```scala
if (ctx.indexColumns != null) {
  ctx.indexColumns.multipartIdentifierProperty().forEach { indexColCtx =>
    data = data :+ Row(ctx.tableName().getText, indexColCtx.multipartIdentifier().getText)
  }
} else {
  data = data :+ Row(ctx.tableName().getText, null.asInstanceOf[String])
}
```
Use Scala's map() instead of forEach()?
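A sketch of the map()-based rewrite the reviewer suggests, reusing the ANTLR context types from the snippet above (a fragment, not a standalone program); asScala converts the Java list so Scala's map can be used:

```scala
import scala.collection.JavaConverters._ // scala.jdk.CollectionConverters._ on Scala 2.13+

val data: Seq[Row] =
  if (ctx.indexColumns != null) {
    ctx.indexColumns.multipartIdentifierProperty().asScala.toSeq
      .map(col => Row(ctx.tableName().getText, col.multipartIdentifier().getText))
  } else {
    Seq(Row(ctx.tableName().getText, null.asInstanceOf[String]))
  }
```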
```scala
  columns = table.schema().fields.map(field => field.name).toList
}
columns.foreach(column => {
  val field = findField(table.schema(), column).get
```
Could you refactor this method and make it more readable? I think only lines 50-62 are the core logic.
Refactored this method. Can you please take another look?
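For illustration, a rough sketch of what the extracted core logic could look like, assuming findField returns an Option[StructField] and a hypothetical RecommendationRules.getRule lookup:

```scala
// Fall back to all columns of the table when none were requested explicitly.
val candidateColumns: Seq[String] =
  if (columns.nonEmpty) columns
  else table.schema().fields.map(_.name).toSeq

// One recommendation Row per column whose data type has a matching rule.
val recommendations: Seq[Row] =
  candidateColumns.flatMap { column =>
    findField(table.schema(), column).flatMap { field =>
      RecommendationRules.getRule(field.dataType.typeName) // hypothetical lookup
        .map(rule => Row(field.name, field.dataType.typeName, rule))
    }
  }
```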
```scala
/**
 * Recommendation rules for skipping index column and algorithm selection.
 */
object RecommendationRules {
```
Just a thought: would making this a Rule abstraction be more useful than static util methods? Probably we can also think about how to extend this for recommendations on the WHERE clause.
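A minimal sketch of such a Rule abstraction; the trait and the example rule are illustrative, not taken from the PR:

```scala
import org.apache.spark.sql.types.{NumericType, StructField}

// Each rule decides whether it applies to a column and, if so,
// which skipping algorithm to recommend.
trait RecommendationRule {
  def applies(field: StructField): Boolean
  def recommend(field: StructField): String
}

// Illustrative rule: recommend MIN_MAX for numeric columns.
object NumericMinMaxRule extends RecommendationRule {
  def applies(field: StructField): Boolean = field.dataType.isInstanceOf[NumericType]
  def recommend(field: StructField): String = "MIN_MAX"
}
```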
Sure, that can be useful if we have separate implementations for data-type-based and function-based rules, but I was thinking of keeping all static rules in one place, e.g. https://github.com/rupal-bq/opensearch_spark/blob/query-recommendations/flint-spark-integration/src/main/resources/skipping_index_recommendation.conf#L50. Do you see any problem with this approach for recommendations on the WHERE clause?
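For contrast, a sketch of the "all static rules in one place" approach as an in-code lookup table; the entries below are illustrative, not the actual conf contents:

```scala
object RecommendationRules {
  // Data type -> recommended skipping algorithm. Illustrative entries only.
  private val dataTypeRules: Map[String, String] = Map(
    "boolean" -> "VALUE_SET",
    "string" -> "BLOOM_FILTER",
    "integer" -> "MIN_MAX",
    "date" -> "MIN_MAX")

  def getRule(dataType: String): Option[String] = dataTypeRules.get(dataType)
}
```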
Signed-off-by: Rupal Mahajan <[email protected]>
Closing this PR due to prolonged inactivity. Please rebase if you wish to reopen it.
Description
```sql
ANALYZE SKIPPING INDEX ON TABLE datasource.database.table(column1, column2, ...)
```
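A hypothetical invocation through the Spark SQL layer, assuming the Flint SQL extension is registered with the session; the table and column names are made up for illustration:

```scala
// Returns the skipping index recommendation dataframe for the given columns.
val recommendations = spark.sql(
  "ANALYZE SKIPPING INDEX ON TABLE myglue.default.http_logs(clientip, status)")
recommendations.show()
```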
Issues Resolved
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.