Handle projection pushdown in the metadata cache #25584

Open
Tracked by #25539
hiltontj opened this issue Nov 22, 2024 · 0 comments
hiltontj commented Nov 22, 2024

Problem

The TableProvider implementation for the MetaCacheFunctionProvider is not currently handling projection pushdown:

impl TableProvider for MetaCacheFunctionProvider {

This means that the cache gets a full scan (within the bounds of the provided predicates) regardless of the provided projection. For a cache with multiple levels, if the user is only interested in the top level, this can waste cycles scanning the lower levels. If the user is interested in lower levels, we still need to scan through the higher levels, but we could at least avoid building the Arrow buffers for those columns.

In addition, the output is not ordered when projecting to lower levels of the cache; however, that may need a separate issue.

Proposed solution

The projection provided to TableProvider::scan could be passed down to MetaCache::to_record_batch so that the cache is scanned more efficiently:

  • do not build Arrow buffers for unneeded columns
  • only scan down to the lowest needed level in the cache
  • update the MetaCacheExec to include details about projected columns
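
As a rough illustration of the first two points, here is a minimal sketch using only the standard library. It is not the actual MetaCache code: `Node`, `insert`, `scan`, and `walk` are hypothetical names, the cache is modeled as nested `BTreeMap`s with one level per column, and plain `Vec<String>` columns stand in for Arrow buffers. The projection is assumed to be a list of level indices (0 = top level), similar to the column indices passed to a scan.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one level of a hierarchical cache:
/// each distinct value maps to the next level down.
#[derive(Default)]
struct Node(BTreeMap<String, Node>);

impl Node {
    /// Insert one full path of values, top level first.
    fn insert(&mut self, path: &[&str]) {
        if let Some((first, rest)) = path.split_first() {
            self.0.entry(first.to_string()).or_default().insert(rest);
        }
    }
}

/// Walk the cache and return one output column per projected level.
/// Columns are built only for projected levels, and the walk stops
/// at the deepest projected level instead of descending to the leaves.
fn scan(root: &Node, projection: &[usize]) -> Vec<Vec<String>> {
    let mut columns = vec![Vec::new(); projection.len()];
    let max_level = match projection.iter().max() {
        Some(&level) => level,
        None => return columns, // empty projection: nothing to build
    };
    let mut path = Vec::new();
    walk(root, 0, max_level, projection, &mut path, &mut columns);
    columns
}

fn walk<'a>(
    node: &'a Node,
    level: usize,
    max_level: usize,
    projection: &[usize],
    path: &mut Vec<&'a str>,
    columns: &mut [Vec<String>],
) {
    for (value, child) in &node.0 {
        path.push(value.as_str());
        if level == max_level {
            // Deepest needed level reached: emit one row, writing only
            // the projected columns. Lower levels are never visited.
            for (column, &projected_level) in columns.iter_mut().zip(projection) {
                column.push(path[projected_level].to_string());
            }
        } else {
            walk(child, level + 1, max_level, projection, path, columns);
        }
        path.pop();
    }
}
```

In this toy model, projecting only level 0 touches just the top-level map, while projecting only level 1 still walks level 0 but never materializes a column for it. Values for level 1 come out grouped by their parent rather than globally sorted.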

Alternatives

N/A

Additional context

Currently, DataFusion handles projection at a higher level, so this isn't a show-stopper: the cache still works as intended when projections are provided in the query.

The method that walks the cache hierarchy to do predicate evaluation and build the arrow buffers is here.

An example showing that the output when projecting a lower column is not ordered is here.
