query rewrite for LogsTable skipping index #154

seankao-az · 2023-11-14T03:32:35Z

Description

Query rewrite for LogsTable skipping index

Build skipping index

The process for building the skipping index is unchanged, because the current method is already compatible with LogsTable.

Query rewrite for skipping index

At query plan optimization time, we construct a new LogsTable with the DataFrame that fetches log file ids from the skipping index. These log file ids are then used to build the scan operator.
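A minimal, Spark-free sketch of this rewrite, assuming a MinMax-style skipping index on a timestamp column. IndexRow, LogsTableSketch, and rewrite are hypothetical names, not the Flint or LogsTable API. The filter/map pair stands in for indexScan.filter(...).select(FILE_PATH_COLUMN), and the table receives a deferred source of file ids rather than a materialized set.

```scala
// Hypothetical model of the skipping index: one row per log file,
// holding the file id plus min/max stats for a timestamp column (assumed MinMax index).
final case class IndexRow(fileId: String, minTs: Long, maxTs: Long)

// Hypothetical stand-in for LogsTable: it is handed a *deferred* source of
// file ids (a function value) instead of an already-materialized Set.
final class LogsTableSketch(fileIdSource: () => Seq[String]) {
  // The file ids are only computed when the scan is built.
  def buildScan(): Seq[String] = fileIdSource()
}

// Optimizer-time rewrite: the query predicate (ts BETWEEN lo AND hi) becomes an
// index filter that keeps files whose [minTs, maxTs] range overlaps [lo, hi].
def rewrite(index: Seq[IndexRow], lo: Long, hi: Long): LogsTableSketch =
  new LogsTableSketch(() =>
    index
      .filter(r => r.maxTs >= lo && r.minTs <= hi) // like indexScan.filter(indexFilter)
      .map(_.fileId)                               // like .select(FILE_PATH_COLUMN)
  )

// Usage: three files covering consecutive timestamp ranges.
val index = Seq(IndexRow("f1", 0, 9), IndexRow("f2", 10, 19), IndexRow("f3", 20, 29))
println(rewrite(index, 12, 15).buildScan()) // List(f2)
```

Only f2 can contain timestamps in [12, 15], so the scan operator built from the rewritten table reads a single file instead of all three.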

Dependency

Added a compile-time dependency that contains only the LogsTable interface.

Test

Automated tests are not possible without loading the LogsConnectorSpark fat jar as a dependency. Instead, a manual integration test was run locally.

Issues Resolved

List any issues this PR will resolve, e.g. Closes [...].

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sean Kao <[email protected]>

penghuo · 2023-11-14T16:47:08Z

...ration/src/main/scala/org/opensearch/flint/spark/skipping/ApplyFlintSparkSkippingIndex.scala

+            indexScan
+              .filter(new Column(indexFilter.get))
+              .select(FILE_PATH_COLUMN)
+              .collect


collect is a reduce operation, so it would run a Spark job while the query is still being optimized. @dai-chen, could you help Sean fix this?

Discussed with Sean that this requires changes in LogsTable.

Similar to the Flint FileIndex implementation:

It accepts the indexScan DataFrame instead of a result Set.

It triggers the DataFrame collect at execution time.

@seankao-az correct me if I understood wrong.
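To illustrate the difference, here is a hypothetical Spark-free sketch. FakeIndexScan, EagerTable, and DeferredTable are illustrative names only, and collect() here just counts how often it runs in place of a real Spark job. The eager variant pays for the index scan during planning; the deferred variant, like the FileIndex-style design described above, runs it only when the query executes.

```scala
// Hypothetical stand-in for the indexScan DataFrame: collect() "runs a job"
// and records how many times that happened.
final class FakeIndexScan(fileIds: Seq[String]) {
  private var count = 0
  def jobsRun: Int = count
  def collect(): Seq[String] = { count += 1; fileIds }
}

// Eager variant (the pattern flagged in review): the table receives a result Set,
// so collect() must already have run while the plan was being optimized.
final class EagerTable(fileIds: Set[String]) {
  def execute(): Set[String] = fileIds
}

// Deferred variant: the table keeps the index scan itself and only
// calls collect() when the query is actually executed.
final class DeferredTable(indexScan: FakeIndexScan) {
  def execute(): Set[String] = indexScan.collect().toSet
}

val scanA = new FakeIndexScan(Seq("f1", "f2"))
val eager = new EagerTable(scanA.collect().toSet) // job runs at "planning" time
println(scanA.jobsRun) // 1, before any execution

val scanB = new FakeIndexScan(Seq("f1", "f2"))
val deferred = new DeferredTable(scanB)
println(scanB.jobsRun) // 0, nothing ran during planning
deferred.execute()
println(scanB.jobsRun) // 1
```

A side effect of the deferred design is that a plan which is built but never executed never pays for the index scan.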

That's right. This needs changes on the LogsTable side: right now LogsTable accepts a list of file ids, and it should accept a DataFrame instead.

Done. Integration-tested locally together with the changes from the dependency package.


query rewrite for nexus skipping index

2713f08

Signed-off-by: Sean Kao <[email protected]>

seankao-az requested review from dai-chen, rupal-bq, vmmusings, penghuo, anirudha, kaituo and YANG-DB as code owners November 14, 2023 03:32

seankao-az changed the title ~~query rewrite for nexus skipping index~~ query rewrite for cloudwatch logs skipping index Nov 14, 2023

penghuo reviewed Nov 14, 2023


seankao-az changed the title ~~query rewrite for cloudwatch logs skipping index~~ query rewrite for LogsTable skipping index Nov 14, 2023

dai-chen reviewed Nov 14, 2023


...ration/src/main/scala/org/opensearch/flint/spark/skipping/ApplyFlintSparkSkippingIndex.scala (outdated, resolved)

dai-chen approved these changes Nov 14, 2023


dai-chen added the enhancement New feature or request label Nov 14, 2023

Delay index scan collect to query execution time

deb5fe8

Signed-off-by: Sean Kao <[email protected]>

seankao-az force-pushed the nexus-skipping branch from 2c33440 to deb5fe8 November 14, 2023 23:44

penghuo approved these changes Nov 15, 2023


penghuo merged commit 0351f40 into opensearch-project:main Nov 15, 2023
4 checks passed

seankao-az mentioned this pull request Aug 9, 2024: Remove query rewrite for LogsTable skipping index #551 (Closed)

