[BUG] Prediction with empty partitions fails on sklearn dask-ml models #414

Open
VibhuJawa opened this issue Mar 2, 2022 · 5 comments
Labels: bug, machine learning

Comments

@VibhuJawa
Collaborator

Prediction with empty partitions fails on sklearn dask-ml models. This is because sklearn currently raises an error on empty frames. I am opening this issue here to track the best approach (whether the fix should go in dask-ml, sklearn, or dask-sql).

Trace:

Exception: "ValueError('Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required.')"

What happened:

%%sql
SELECT * FROM PREDICT(
  MODEL model,
  SELECT * FROM test_set limit 100
)

What you expected to happen:

I would expect this to work, similar to cuML.
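As a rough illustration of the failure mode and one possible guard, here is a minimal pure-Python sketch. `StrictModel` and `predict_partition` are hypothetical names invented for this example. `StrictModel` only stands in for an estimator that rejects zero-row input the way the trace above shows, and the guard is not taken from dask-ml, sklearn, or dask-sql.

```python
# Hypothetical stand-in for an estimator that, like the one in the trace,
# rejects inputs with zero rows. This is not a real library class.
class StrictModel:
    def predict(self, rows):
        if len(rows) == 0:
            raise ValueError(
                "Found array with 0 sample(s) (shape=(0, 2)) "
                "while a minimum of 1 is required."
            )
        # Toy prediction: the sum of the two features in each row.
        return [a + b for a, b in rows]


def predict_partition(model, rows):
    """Predict on one partition, skipping the model call if it is empty."""
    if len(rows) == 0:
        return []  # empty in, empty out; the model is never called
    return model.predict(rows)


# Three partitions, the middle one empty (for example after a filter or LIMIT).
partitions = [[(1, 2), (3, 4)], [], [(5, 6)]]
model = StrictModel()
results = [predict_partition(model, part) for part in partitions]
print(results)
```

In a real Dask setting the empty branch would presumably need to return an empty array or frame with the right dtype and columns rather than a bare list; that detail is left out of this sketch.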

VibhuJawa added the bug and needs triage labels on Mar 2, 2022
VibhuJawa self-assigned this on Mar 2, 2022
VibhuJawa added the machine learning label and removed the needs triage label on Mar 7, 2022
@charlesbluca
Collaborator

Is this an issue that can be narrowed down to a Dask-ML reproducer? If so, I would assume a fix would make sense there, as generally Dask APIs shouldn't run into issues if a dataframe contains empty partitions.

@VibhuJawa
Collaborator Author

> Is this an issue that can be narrowed down to a Dask-ML reproducer? If so, I would assume a fix would make sense there as generally Dask APIs shouldn't run into issues if a dataframe contains empty partitions

Yup. The hope is that I can push a fix for this in Dask-ML. If not, we can fall back to a fix here. I would like to keep this issue open for tracking purposes.

@charlesbluca
Collaborator

Makes sense to me - feel free to ping this issue with any follow up discussion / PRs on dask-ml

@VibhuJawa
Collaborator Author

Started issue dask/dask-ml#911 and PR dask/dask-ml#912 to fix this.

@sarahyurick
Collaborator

Can we close this issue since we've eliminated all Dask-ML dependencies?
