Initial implementation of DaftDataFrameEngine #3457
base: master
Nice! This looks like a great starting point. Let me do the work we discussed on my end: (1) clean up unused functions and (2) move pandas/Python-specific logic behind DF engine functions where possible. |
I added a shim class on top of the Daft Dataframe here to ease the transition! These shim classes are the "external facing" classes of the DaftDataEngine, rather than just a naked The shim classes:
|
Unit Test Results: 4 files ±0, 4 suites ±0, 17s ⏱️ (-55m 38s). For more details on these errors, see this check.
Results for commit 2a8e399. ± Comparison against base commit 60f1416.
This pull request removes 33 and adds 5 tests. Note that renamed tests count towards both.
This pull request removes 4 skipped tests and adds 4 skipped tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
Quick question @arnavgarg1 - do you know how Based on the function signature/naming it seems like the intention is for
It appears they are both used fairly interchangeably. I wonder if y'all would want to consolidate them? Ideally, they also wouldn't run on arbitrary Pandas dictionaries: if users of the API instead had to specify the names of the columns they need to map over, query optimization would be more effective. A black-box "map this function over the entire dataframe" leaves Daft with no choice but to assume that your query requires every single column, even if in practice your Python function only uses one specific column. |
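To make the column-pruning point concrete, here is a minimal toy sketch in plain Python. It is not Daft's optimizer or API: `ToyPlan`, `map_whole_frame`, `map_columns`, and `columns_to_load` are hypothetical names invented for illustration. It only models the bookkeeping: an opaque whole-frame map forces every column to be read, while a map that declares its input columns lets the planner skip the rest.

```python
from typing import Callable, Iterable, List, Set


class ToyPlan:
    """Hypothetical mini query plan that tracks which source columns are needed.

    This is an illustration only; it does not mirror Daft's real classes.
    """

    def __init__(self, source_columns: Iterable[str]) -> None:
        self.source_columns: List[str] = list(source_columns)
        self.required: Set[str] = set()

    def map_whole_frame(self, fn: Callable) -> "ToyPlan":
        # The function is a black box over the full dataframe, so the
        # planner has to assume it may read any column.
        self.required.update(self.source_columns)
        return self

    def map_columns(self, fn: Callable, columns: Iterable[str]) -> "ToyPlan":
        # The caller declares the inputs, so only those columns are needed.
        columns = list(columns)
        unknown = [c for c in columns if c not in self.source_columns]
        if unknown:
            raise KeyError(f"unknown columns: {unknown}")
        self.required.update(columns)
        return self

    def columns_to_load(self) -> List[str]:
        # Preserve source order so the result is deterministic.
        return [c for c in self.source_columns if c in self.required]


source = ["image_path", "label", "caption"]

# Black-box map: every column must be loaded.
opaque = ToyPlan(source).map_whole_frame(lambda frame: frame)

# Column-scoped map: only the declared column must be loaded.
scoped = ToyPlan(source).map_columns(lambda label: label.lower(), ["label"])
```

In this toy model, `opaque.columns_to_load()` returns all three columns, while `scoped.columns_to_load()` returns only `["label"]`, which is the kind of saving a column-scoped map API would make possible.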
69810f9 to 36db991
This PR introduces the DaftDataFrameEngine, a DataFrameEngine implementation backed by Daft.
This has several advantages: