predict of CARET preProcess not working within furrr future_map #72
Comments
Can you please provide a reproducible example, using the reprex package? I have a feeling the issue is that caret isn't loaded in each worker, because predict() on its own isn't enough to know that the predict.preProcess method is being called.
Nevertheless, this doesn't look like it will be very efficient. It looks like you are trying to move the full model and the full data set to each worker, subset the data set on the worker, and then predict and return the predictions. All of that data shuffling is likely going to be more expensive than just doing it normally.
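If the missing-package guess above is right, one possible workaround is to ask furrr to attach caret on every worker. The sketch below is only an illustration under assumptions: the `toy` data frame is made up, `medianImpute` is swapped in for the knnImpute model from the report to keep the example light, and it uses `furrr_options()` and `plan(multisession)` from current furrr/future releases rather than whatever was available when this issue was filed.

```r
library(caret)
library(furrr)

plan(multisession, workers = 2)

# Made-up toy data with a few missing values (not the reporter's data).
set.seed(1)
toy <- as.data.frame(matrix(rnorm(200 * 4), ncol = 4))
toy[sample(200, 10), 1] <- NA

# Stand-in for the reporter's model; medianImpute is an assumption here.
missingdata_model <- preProcess(toy, method = "medianImpute")

# packages = "caret" attaches caret on each worker before the function runs,
# so predict() can find the preProcess method there.
imputed <- future_map(
  1:2,
  ~ predict(missingdata_model, newdata = toy),
  .options = furrr_options(packages = "caret")
)

plan(sequential)
```

This only addresses method dispatch on the workers. It still ships the whole data set to every worker, which is the efficiency concern raised in the same comment.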
Thanks!
Sure! Not familiar with reprex but will give it a read tonight and try to send you the example in this format tomorrow.
Any tips you might suggest on more efficient ways to speed up the processing on a large dense matrix? (~6.8m rows by ~300 columns)
I am fairly new to R and completely new to datasets this big. Kind of stumbling around now for the best approach.
Been trying to teach myself how to split the dataset by rows into blocks, allocate each block of rows to separate cores and then run predict.preProcess in parallel on the cores available. Then rebind the rows back together. From the write-ups, it sounded like furrr would be a perfect option. Clearly I need to go back and better understand what the code I have written is doing!
Would you suggest I rework my code in furrr to do this, or would another tool be more appropriate?
I am reading up on the basics of parallel processing now and there seem to be a number of options.
Any tips pointing me in the right direction would be greatly appreciated!!!!
Thanks so much again!
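The split, predict, and rebind approach described in this comment could look roughly like the sketch below. It is a minimal illustration under assumptions, not a tuned solution: the `toy` data frame, the block count, and the `medianImpute` stand-in model are all made up, and the data is a data frame rather than a matrix so that `split()` divides it by rows.

```r
library(caret)
library(furrr)

plan(multisession, workers = 2)

# Made-up toy data standing in for the large data set.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(1000 * 4), ncol = 4))
toy[sample(1000, 50), 2] <- NA

# Stand-in model; the report used a knnImpute preProcess model instead.
model <- preProcess(toy, method = "medianImpute")

# Assign each row to one of n_blocks contiguous blocks, then split by block.
n_blocks <- 4
block_id <- cut(seq_len(nrow(toy)), breaks = n_blocks, labels = FALSE)
blocks <- split(toy, block_id)

# Each element passed to the function is a single block of rows,
# so the function never subsets the full data set itself.
imputed_blocks <- future_map(
  blocks,
  ~ predict(model, newdata = .x),
  .options = furrr_options(packages = "caret")
)

# Bind the blocks back together in their original row order.
imputed <- do.call(rbind, unname(imputed_blocks))

plan(sequential)
```

Whether this is faster than a single `predict()` call depends on how the cost of copying the blocks and the model to the workers compares with the prediction itself, so it is worth timing both on a sample of the real data.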
On May 30, 2019, at 1:30 PM, Davis Vaughan wrote:
Can you please provide a reproducible example, using the reprex package?
I have a feeling the issue is that caret isn't loaded in each worker, because predict() on its own isn't enough to know that the predict.preProcess method is being called.
Nevertheless, this doesn't look like it will be very efficient. It looks like you are trying to move the full model and the full data set to each worker, and subset the data set on the worker and then predict and return the predictions. All of that data shuffling is likely going to be more expensive than just doing it normally
Closed in favor of tracking futureverse/globals#46.
Having trouble running a function within future_map that runs fine on its own.
This predict of a preProcess model (knnImpute) from caret works fine on its own:
library(caret)
fullNetworkImpute <- predict(missingdata_model, newdata = fullNetwork)
Hoping to use furrr, as the newdata I need to predict for is very large and I want to parallelize.
Tried to run this...
library(caret)
library(furrr)
plan(multiprocess)
fullNetworkImpute <- future_map(rep(1, 10), ~predict(missingdata_model, newdata = fullNetwork))
and got error message...
Look forward to your suggestions!
Thanks in advance!!!