[BUG]Covering index not present inside query #107

Closed
YANG-DB opened this issue Oct 26, 2023 · 1 comment
Labels
wontfix This will not be worked on

Comments

YANG-DB (Member) commented Oct 26, 2023

What is the bug?

A query on S3-based data does not use the covering index during execution, even though a covering index has been defined on the table.

How to Reproduce
Create the following table and covering index, then run the query:

Table

CREATE EXTERNAL TABLE mys3.default.http_logs (
    `@timestamp` TIMESTAMP,
    clientip STRING,
    request STRING,
    status INT,
    size INT,
    year INT,
    month INT,
    day INT
)
USING json
PARTITIONED BY (year, month, day)
OPTIONS (
    path 's3://flint-data-dp-eu-west-1-beta/data/http_log/http_logs_partitioned_json_bz2/',
    compression 'bzip2'
)

Covering Index

CREATE INDEX status_and_day
ON mys3.default.http_logs ( status, day )
WITH (
  auto_refresh = true,
  refresh_interval = '1 minute',
  checkpoint_location = 's3://flint-data-dp-eu-west-1-beta/data/http_log/checkpoint_status_and_day'
)

Query

SELECT
    day,
    status
FROM mys3.default.http_logs
WHERE status >= 400
GROUP BY day, status
LIMIT 100;

Explain Query

-- Explain:
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- CollectLimit 100
   +- HashAggregate(keys=[day#107, status#103], functions=[])
      +- Exchange hashpartitioning(day#107, status#103, 1000), ENSURE_REQUIREMENTS, [plan_id=75]
         +- HashAggregate(keys=[day#107, status#103], functions=[])
            +- Project [status#103, day#107]
               +- Filter (isnotnull(status#103) AND (status#103 >= 400))
                  +- FileScan json default.http_logs[status#103,year#105,month#106,day#107] Batched: false, DataFilters: [isnotnull(status#103), (status#103 >= 400)], Format: JSON, Location: CatalogFileIndex(1 paths)[s3://flint-data-dp-eu-west-1-beta/data/http_log/http_logs_partitioned_j..., PartitionFilters: [], PushedFilters: [IsNotNull(status), GreaterThanOrEqual(status,400)], ReadSchema: struct<status:int>

What is the expected behavior?
I expect the covering index rule to kick in, so that the covering index is part of the physical execution plan.
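
Whether the rule kicked in can be read off the leaf scan node of the EXPLAIN output. The following is a minimal Python sketch, assuming the plan text has the `FileScan <format> <relation>[...]` shape shown in this issue. `scan_sources` is a hypothetical helper written for illustration, not a Spark or Flint API, and the sample plan is abbreviated from the output above.

```python
import re

# Hypothetical helper (not part of Spark or Flint): pull the FileScan nodes out
# of EXPLAIN text so the data source behind each scan is easy to see.
SCAN_RE = re.compile(r"FileScan\s+(?P<format>\w+)\s+(?P<relation>[\w.]+)\[")


def scan_sources(plan_text):
    """Return a (format, relation) pair for each FileScan node in the plan text."""
    return [(m.group("format"), m.group("relation"))
            for m in SCAN_RE.finditer(plan_text)]


# Abbreviated copy of the two innermost plan nodes reported in this issue.
plan = """
+- Filter (isnotnull(status#103) AND (status#103 >= 400))
+- FileScan json default.http_logs[status#103,year#105,month#106,day#107] Batched: false
"""

print(scan_sources(plan))  # [('json', 'default.http_logs')]
```

For the plan in this report, the only scan is a JSON file scan over default.http_logs, which is the source table rather than the status_and_day index.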


YANG-DB added the bug and untriaged labels Oct 26, 2023
dai-chen (Collaborator) commented

@YANG-DB Thanks for reporting the issue! As with #103, the covering index/MV is currently only available in OpenSearch, so there is no query rewrite support for now.

dai-chen added the wontfix label and removed the bug label Nov 13, 2023