Implement analyze skipping index statement #284

rupal-bq · 2024-03-14T17:22:50Z

Description

Add the ANALYZE SKIPPING INDEX statement. It returns skipping index recommendations based on the following rules:

- All top-level columns are selected.
- The PARTITION algorithm is recommended for partition columns.
- The MIN_MAX algorithm is recommended for columns with numerical data types.
- The VALUE_SET algorithm is recommended for columns with the boolean data type.
- The BLOOM_FILTER algorithm is recommended for all other supported columns.
- Columns with unsupported data types are skipped.
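As a rough illustration of the rules above, here is a minimal sketch in plain Scala. It is not the code from this PR: `Column`, `Recommendation`, the type-name sets, and the choice to check the partition rule before the data type rules are all assumptions made for the example, and the set of "supported" types is a guess.

```scala
import java.util.Locale

// Hypothetical stand-ins; the PR itself works on Spark table metadata.
object SkippingRecommendationSketch {

  final case class Column(name: String, dataType: String, isPartition: Boolean = false)
  final case class Recommendation(columnName: String, dataType: String, algorithm: String)

  // Assumed groupings of type names; the real supported set may differ.
  private val numericTypes = Set("byte", "short", "integer", "long", "float", "double")
  private val bloomFilterTypes = Set("string", "date", "timestamp")

  // Returns None for types outside the assumed sets, so those columns are skipped.
  def algorithmFor(column: Column): Option[String] = {
    val typeName = column.dataType.toLowerCase(Locale.ROOT)
    if (column.isPartition) Some("PARTITION") // assumed to take precedence
    else if (numericTypes.contains(typeName)) Some("MIN_MAX")
    else if (typeName == "boolean") Some("VALUE_SET")
    else if (bloomFilterTypes.contains(typeName)) Some("BLOOM_FILTER")
    else None
  }

  // One recommendation per column that has a matching rule, in input order.
  def recommend(columns: Seq[Column]): Seq[Recommendation] =
    columns.flatMap { c =>
      algorithmFor(c).map(algorithm => Recommendation(c.name, c.dataType, algorithm))
    }
}
```

Calling `SkippingRecommendationSketch.recommend` with a partition column, an integer column, a boolean column, a string column, and an array column would yield four recommendations under these assumptions, with the array column left out.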

Issues Resolved

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Rupal Mahajan <[email protected]>


noCharger · 2024-03-14T17:53:22Z

flint-spark-integration/src/main/antlr4/FlintSparkSqlExtensions.g4

@@ -105,6 +106,10 @@ vacuumCoveringIndexStatement
    : VACUUM INDEX indexName ON tableName
    ;

+analyzeSkippingIndexStatement
+    : ANALYZE SKIPPING INDEX ON tableName


Is this grammar finalized? What is the semantic meaning?

This is proposed grammar. Please comment if you have any other suggestions. Analyze refers to examining data to get insights. This command will return recommendations for creating a skipping index (the columns to index, each with a suggested data structure) based on table data.

> This is proposed grammar.

Any reference / compatibility analysis with the mainstream syntax?

> Please comment if you have any other suggestions.

Just brainstorming -

ANALYZE TABLE tableName FOR SKIPPING INDEX RECOMMENDATIONS;

Or

ANALYZE TABLE tableName RECOMMEND SKIPPING INDEX COLUMNS;

The assumption is that we may want to do more things than just the recommendation.

ref https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/ANALYZE.html#GUID-535CE98E-2359-4147-839F-DCB3772C1B0E


dai-chen · 2024-03-18T16:46:28Z

...ain/scala/org/opensearch/flint/spark/skipping/recommendations/DataTypeSkippingStrategy.scala

+
+class DataTypeSkippingStrategy extends AnalyzeSkippingStrategy {
+
+  val rules = Map(


I'm wondering whether it would be more flexible to move this static mapping to a config file. Or maybe that's not necessary for this P0 solution?

Good idea. I added it here thinking it's specific to the data-type-based recommendation and won't be used by other strategies (e.g. recommendation based on table stats).

I will take this up as a fast follow-up, because finalizing the grammar before the 2.13 release will unblock the SQL plugin.
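For the config-file idea discussed above, one possible shape is sketched below. The `dataType = ALGORITHM` line format, the `#` comment convention, and the `RuleConfigSketch` object are all hypothetical; the thread does not say what format the follow-up would use.

```scala
import java.util.Locale

// Hypothetical sketch: parse "dataType = ALGORITHM" rules from text, one per line.
object RuleConfigSketch {

  def parseRules(text: String): Map[String, String] =
    text.split("\n").iterator
      .map(_.trim)
      // Skip blank lines and '#' comments (an assumed convention).
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .flatMap { line =>
        line.split("=", 2) match {
          case Array(key, value) =>
            Some(key.trim.toLowerCase(Locale.ROOT) -> value.trim)
          case _ =>
            None // lines without '=' are ignored in this sketch
        }
      }
      .toMap
}
```

Keeping the parser separate from the strategy class would let other strategies (for example, one based on table stats) ignore the file entirely, which matches the concern raised in the reply.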

dai-chen

If we also want to merge implementation in this PR, could you update user manual like this? https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#all-indexes. Or if no time, we can just merge grammar in this PR.


rupal-bq · 2024-03-18T19:41:37Z

> If we also want to merge implementation in this PR, could you update user manual like this? https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#all-indexes. Or if no time, we can just merge grammar in this PR.

Thanks! Updated user manual.

dai-chen · 2024-03-18T20:17:31Z

...ain/scala/org/opensearch/flint/spark/skipping/recommendations/DataTypeSkippingStrategy.scala

+    val partitionFields = table.partitioning().flatMap { transform =>
+      transform
+        .references()
+        .collect({ case reference =>
+          reference.fieldNames()
+        })
+        .flatten
+        .toSet
+    }


I'm not sure if this is the right API because I've only used table.schema(). Could you double check this along with the comment above later? I will merge this PR for now so we can get the grammar into SQL plugin side.

Sure will do. Thanks!
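To make the shape of the snippet under review easier to follow, here is a simplified sketch using stand-in types. `Reference` and `Transform` below are hypothetical case classes that only mimic the `references()` / `fieldNames()` structure seen in the diff; they are not the Spark connector classes, and this does not settle whether `table.partitioning()` is the right API.

```scala
// Hypothetical stand-ins that mirror the nesting used in the diff.
object PartitionFieldsSketch {

  final case class Reference(fieldNames: Seq[String])
  final case class Transform(references: Seq[Reference])

  // Flatten transform -> references -> field name parts into one set of names.
  def partitionFields(partitioning: Seq[Transform]): Set[String] =
    partitioning
      .flatMap(_.references)
      .flatMap(_.fieldNames)
      .toSet
}
```

Note that flattening treats every name part as its own field, so a nested reference with several parts would contribute each part separately. That is probably acceptable if only top-level columns are considered, but it is worth confirming during the follow-up check.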

rupal-bq added 2 commits March 12, 2024 13:59

dummy result test

f66994a

Signed-off-by: Rupal Mahajan <[email protected]>

Add grammar for analyze skipping index

5ce393d

Signed-off-by: Rupal Mahajan <[email protected]>

rupal-bq requested review from dai-chen, vmmusings, penghuo, anirudha, kaituo and YANG-DB as code owners March 14, 2024 17:22

Merge branch 'analyze-skipping-index' of github.com:rupal-bq/opensear…

fb8d12d

…ch_spark into analyze-skipping-index Signed-off-by: Rupal Mahajan <[email protected]>

noCharger reviewed Mar 14, 2024


rupal-bq added 7 commits March 14, 2024 22:52

Add analyze skippig index function

bba6823

Signed-off-by: Rupal Mahajan <[email protected]>

update analyze strategy

cc46bbd

Signed-off-by: Rupal Mahajan <[email protected]>

Update recommendations

b04b0b8

Signed-off-by: Rupal Mahajan <[email protected]>

Add test

472a962

Signed-off-by: Rupal Mahajan <[email protected]>

Format code

37d3df3

Signed-off-by: Rupal Mahajan <[email protected]>

Remove unused import

2fdf421

Signed-off-by: Rupal Mahajan <[email protected]>

Merge branch 'analyze-skipping-index' into grammar-analyze-skipping-i…

4a8fc1e

…ndex Signed-off-by: Rupal Mahajan <[email protected]>

rupal-bq changed the title ~~Add sql grammar support for analyze skipping index statement~~ Implement analyze skipping index statement Mar 18, 2024

Update doc

54d7ee6

Signed-off-by: Rupal Mahajan <[email protected]>

dai-chen reviewed Mar 18, 2024


dai-chen added the enhancement and 0.3 labels Mar 18, 2024

dai-chen reviewed Mar 18, 2024


rupal-bq added 3 commits March 18, 2024 10:54

Merge branch 'main' into grammar-analyze-skipping-index

4bac586

Signed-off-by: Rupal Mahajan <[email protected]>

Update doc

e289a9c

Signed-off-by: Rupal Mahajan <[email protected]>

nit

a720116

Signed-off-by: Rupal Mahajan <[email protected]>

dai-chen approved these changes Mar 18, 2024


dai-chen reviewed Mar 18, 2024


dai-chen merged commit e6a97dc into opensearch-project:main Mar 18, 2024
4 checks passed

This was referenced Mar 18, 2024

[Feature] OpenSearch and Apache Spark Integration #3

Closed

Add new Flint SQL grammar opensearch-project/sql#2558

Closed

Handle ALTER Index Queries. opensearch-project/sql#2554

Merged

rupal-bq mentioned this pull request Mar 19, 2024

Move analyze skipping index rules to config #288

Closed

rupal-bq mentioned this pull request Apr 1, 2024

Add skipping index recommendations for specific columns #300

Closed
