address `fit()` slowdown with sparse tibble and formula preprocessor #246
Comments
Ultimately, |
In order of most to least preferred, my thoughts on approaches we could take for
|
Quick note: if you install the latest dev version of {sparsevctrs} and run |
Notes from chatting with Emil on this: parsnip experiences the same issue with its
Approach 1) (with 3) in the long term) is adaptable to both situations and is relatively future-proof. We would raise a condition if users pass sparse data to
Users will only pass
Emil and I chatted and are on the same page here! I originally proposed in 1) that we would warn in that case. I think a more tidyverse-style-esque approach would be to error and then reference an option (or |
Related to #239—just a place to keep notes on the thought process for supporting sparse tibbles with formula preprocessors. In #245, we see:
Created on 2024-09-13 with reprex v2.1.1
In the formula preprocessor `fit()` evaluation, the data type conversions don't actually take a ton of time:
It's just that, with `add_formula()`, `parsnip::xgb_train(x)` is a matrix, whereas it's a `dgCMatrix` when passed with `add_recipe()`, and xgboost is much slower when data that ought to be sparse is dense.
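For a rough sense of the dense versus sparse difference described above, here is a minimal sketch using only the Matrix package. It does not call workflows, parsnip, or xgboost. The dimensions, the sparsity level, and the object names (`x_sparse`, `x_dense`) are assumptions picked for illustration. It builds a mostly-zero `dgCMatrix`, converts it to a base matrix, and compares the class and memory footprint of the two.

```r
library(Matrix)

set.seed(1)

# Hypothetical dimensions and sparsity, chosen only for illustration.
n_row <- 1000
n_col <- 200
n_nonzero <- 2000

# Build a sparse matrix in compressed column form.
# Repeated (i, j) pairs are summed by sparseMatrix().
x_sparse <- sparseMatrix(
  i = sample(n_row, n_nonzero, replace = TRUE),
  j = sample(n_col, n_nonzero, replace = TRUE),
  x = 1,
  dims = c(n_row, n_col)
)

# Densify: every zero is now stored explicitly as a double.
x_dense <- as.matrix(x_sparse)

# Same values, different representations.
class(x_sparse)
class(x_dense)

# Memory footprint of each representation.
print(object.size(x_sparse), units = "Kb")
print(object.size(x_dense), units = "Kb")
```

The dense copy stores every cell, so anything downstream that iterates over the data has far more to touch. That is one plausible reading of why a model fit that receives a base matrix is slower than one that receives the `dgCMatrix`.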