feat: support filter pushdown for datafusion #203

Open
wants to merge 6 commits into
base: main

jonathanc-n
Contributor

@jonathanc-n jonathanc-n commented Nov 29, 2024

Description

Added an Expr to PartitionFilter conversion so that filters can be passed in. For now, DataFusion pushes down all filters via supports_filters_pushdown and applies its own filtering after the partition filters.
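
To make the conversion concrete, here is a minimal sketch. It does not use the real DataFusion `Expr` or this PR's `PartitionFilter`; the `Expr`, `Op`, and `PartitionFilter` types and the `exprs_to_filters` / `expr_to_filter` functions below are simplified, hypothetical stand-ins. The assumption is that only `column <op> literal` comparisons are converted and anything else is skipped, so the engine still has to evaluate the skipped expressions itself. (In DataFusion, a provider can signal this by returning `TableProviderFilterPushDown::Inexact`; whether this PR does so is not stated here.)

```rust
// Hypothetical, simplified stand-ins for DataFusion's `Expr` and the
// PR's `PartitionFilter`; the real types carry far more information.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Binary { left: Box<Expr>, op: Op, right: Box<Expr> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Eq,
    NotEq,
    Lt,
    Gt,
    And,
}

#[derive(Debug, Clone, PartialEq)]
struct PartitionFilter {
    field: String,
    op: Op,
    value: String,
}

/// Convert every expression that has a direct filter equivalent and
/// silently skip the rest (the engine must still evaluate those).
fn exprs_to_filters(exprs: &[Expr]) -> Vec<PartitionFilter> {
    exprs.iter().filter_map(expr_to_filter).collect()
}

/// Only `column <comparison> literal` is supported in this sketch.
fn expr_to_filter(expr: &Expr) -> Option<PartitionFilter> {
    match expr {
        Expr::Binary { left, op, right } if *op != Op::And => {
            match (&**left, &**right) {
                (Expr::Column(name), Expr::Literal(value)) => Some(PartitionFilter {
                    field: name.clone(),
                    op: *op,
                    value: value.clone(),
                }),
                _ => None,
            }
        }
        _ => None,
    }
}
```

Returning `Option` per expression keeps unsupported operators and expression kinds from failing the whole query, which fits the "support more operators later" plan listed under Steps forward.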

Closes #160.

Steps forward

I noticed the PR was getting a bit big to review all at once, so here are some things that will be worked on afterwards:

  • Support more operators
  • Support more DataFusion expressions
  • Enhance the python implementation for filter conversion
  • Make it easier to create PartitionFilters for testing, etc.

How are the changes test-covered

Added unit tests for all added functionality

codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 83.96947% with 21 lines in your changes missing coverage. Please review.

Project coverage is 90.62%. Comparing base (a2738da) to head (2537a9d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/datafusion/src/utils/exprs_to_filter.rs | 77.08% | 11 Missing ⚠️ |
| python/src/internal.rs | 0.00% | 10 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
- Coverage   91.78%   90.62%   -1.17%     
==========================================
  Files          20       24       +4     
  Lines         962     1067     +105     
==========================================
+ Hits          883      967      +84     
- Misses         79      100      +21     


crates/core/src/exprs/mod.rs (outdated, resolved)
crates/core/src/table/fs_view.rs (outdated, resolved)
@@ -194,7 +195,7 @@ impl Table {
     pub async fn get_file_slices_splits(
         &self,
         n: usize,
-        filters: &[(&str, &str, &str)],
+        filters: &[PartitionFilter],
Member

We should provide a generic struct for constructing filters, not one limited to PartitionFilter. Also, let's mark the existing API as deprecated in the upcoming release and remove it only in a future release.

Contributor Author

I made the fixes, but I was a little confused about how to deprecate this. Should I create another, similar function?
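
One common Rust pattern for the deprecation question above, sketched under assumptions: keep the tuple-based signature, mark it with the standard `#[deprecated]` attribute, and have it delegate to a new function that takes a generic filter struct. The names `Filter` and `get_file_slices_splits_with_filters`, the `Vec<String>` return type, and the placeholder body are all hypothetical; the real method is `async` and returns file slices, which is omitted here to keep the example self-contained.

```rust
/// Hypothetical generic filter struct (not the PR's actual type).
#[derive(Debug, Clone, PartialEq)]
pub struct Filter {
    pub field: String,
    pub operator: String,
    pub value: String,
}

/// Lets callers of the old tuple form migrate with a cheap conversion.
impl From<(&str, &str, &str)> for Filter {
    fn from((field, operator, value): (&str, &str, &str)) -> Self {
        Filter {
            field: field.to_string(),
            operator: operator.to_string(),
            value: value.to_string(),
        }
    }
}

pub struct Table;

impl Table {
    /// Old entry point: kept for one release, then removed.
    /// Callers get a compiler warning pointing at the replacement.
    #[deprecated(note = "use `get_file_slices_splits_with_filters` instead")]
    pub fn get_file_slices_splits(
        &self,
        n: usize,
        filters: &[(&str, &str, &str)],
    ) -> Vec<String> {
        let converted: Vec<Filter> = filters.iter().map(|f| Filter::from(*f)).collect();
        self.get_file_slices_splits_with_filters(n, &converted)
    }

    /// New entry point taking the generic struct (hypothetical name).
    /// Placeholder body: just renders the filters it received.
    pub fn get_file_slices_splits_with_filters(
        &self,
        _n: usize,
        filters: &[Filter],
    ) -> Vec<String> {
        filters
            .iter()
            .map(|f| format!("{} {} {}", f.field, f.operator, f.value))
            .collect()
    }
}
```

Because the old function only converts and forwards, both entry points share one implementation, and removing the deprecated one later is a single deletion.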

python/src/internal.rs (outdated, resolved)
crates/datafusion/src/utils/exprs_to_filter.rs (outdated, resolved)
crates/datafusion/src/lib.rs (outdated, resolved)
crates/core/src/exprs/filter.rs (outdated, resolved)
crates/datafusion/src/utils/exprs_to_filter.rs (outdated, resolved)
crates/datafusion/src/utils/exprs_to_filter.rs (outdated, resolved)
python/src/internal.rs (resolved)
Labels
feature, rust (Related to Rust codebase)
Development

Successfully merging this pull request may close these issues.

Integrate with datafusion to support filters pushdown from SQL
2 participants