sequential works but multi-process fails #73

Closed
leungi opened this issue May 30, 2019 · 7 comments

@leungi

leungi commented May 30, 2019

As per the subject: the code below works under the default sequential plan but fails under a multiprocess plan. Reprex below.

library(data.table)
#> Warning: package 'data.table' was built under R version 3.5.3
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.2
#> Warning: package 'tibble' was built under R version 3.5.3
#> Warning: package 'tidyr' was built under R version 3.5.3
#> Warning: package 'readr' was built under R version 3.5.3
#> Warning: package 'purrr' was built under R version 3.5.3
#> Warning: package 'dplyr' was built under R version 3.5.3
#> Warning: package 'stringr' was built under R version 3.5.3
library(furrr)
#> Loading required package: future
#> Warning: package 'future' was built under R version 3.5.3
dt_cars <- data.table(mtcars)

# sequential works
data.frame(a = 1:6, b = 1:6, c = 1:6) %>%
  nest(b, c) %>% 
  mutate(test = furrr::future_map(data,
                                  function(x) {
                                    dt_cars[carb %in% x$b][, lapply(.SD, sum, na.rm = TRUE),
                                                           by = list(cyl),
                                                           .SDcols = c("drat", "wt")]
                                  }))
#> # A tibble: 6 x 3
#>       a data             test                
#>   <int> <list>           <list>              
#> 1     1 <tibble [1 x 2]> <data.table [2 x 3]>
#> 2     2 <tibble [1 x 2]> <data.table [2 x 3]>
#> 3     3 <tibble [1 x 2]> <data.table [1 x 3]>
#> 4     4 <tibble [1 x 2]> <data.table [2 x 3]>
#> 5     5 <tibble [1 x 2]> <data.table [0 x 3]>
#> 6     6 <tibble [1 x 2]> <data.table [1 x 3]>

# multiprocess fails
plan(multiprocess, workers = 2)

data.frame(a = 1:6, b = 1:6, c = 1:6) %>%
  nest(b, c) %>% 
  mutate(test = furrr::future_map(data,
                                  function(x) {
                                    dt_cars[carb %in% x$b][, lapply(.SD, sum, na.rm = TRUE),
                                                           by = list(cyl),
                                                           .SDcols = c("drat", "wt")]
                                  }))
#> Error in carb %in% x$b: object 'carb' not found

sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] furrr_0.1.0       future_1.12.0     forcats_0.3.0    
#>  [4] stringr_1.4.0     dplyr_0.8.0.1     purrr_0.3.2      
#>  [7] readr_1.3.1       tidyr_0.8.3       tibble_2.1.1     
#> [10] ggplot2_3.1.0     tidyverse_1.2.1   data.table_1.12.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_0.2.5 xfun_0.6         listenv_0.7.0    haven_1.1.2     
#>  [5] lattice_0.20-38  colorspace_1.4-0 htmltools_0.3.6  yaml_2.2.0      
#>  [9] utf8_1.1.4       rlang_0.3.3      pillar_1.3.1     glue_1.3.0      
#> [13] withr_2.1.2      modelr_0.1.4     readxl_1.3.1     plyr_1.8.4      
#> [17] munsell_0.5.0    gtable_0.2.0     cellranger_1.1.0 rvest_0.3.2     
#> [21] codetools_0.2-15 evaluate_0.13    knitr_1.22       parallel_3.5.1  
#> [25] fansi_0.4.0      highr_0.7        broom_0.5.0      Rcpp_1.0.0      
#> [29] scales_1.0.0     backports_1.1.3  jsonlite_1.6     hms_0.4.2       
#> [33] digest_0.6.18    stringi_1.2.4    grid_3.5.1       cli_1.0.1       
#> [37] tools_3.5.1      magrittr_1.5     lazyeval_0.2.1   crayon_1.3.4    
#> [41] pkgconfig_2.0.2  xml2_1.2.0       lubridate_1.7.4  assertthat_0.2.0
#> [45] rmarkdown_1.12   httr_1.4.0       R6_2.3.0         globals_0.12.4  
#> [49] nlme_3.1-137     compiler_3.5.1

Created on 2019-05-30 by the reprex package (v0.2.1)

@DavisVaughan
Collaborator

My guess is that this happens because there are no data.table-specific function calls inside the function passed to future_map(), so data.table is never loaded on the workers. If you call library(data.table) explicitly inside that function, I assume it would work.
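A base-R sketch of that failure mode, assuming a fresh R session in which data.table is not loaded (as on a worker that was never told to load it). `dt_like` is a hypothetical stand-in for `dt_cars`: it carries the `data.table` class, but with no `[.data.table` method registered, `[` falls through to `[.data.frame`, which evaluates `carb %in% ...` as an ordinary expression in the calling environment.

```r
# Assumes a fresh R session where the data.table namespace is NOT loaded,
# mimicking a worker that was never told to load it.

# Hypothetical stand-in for dt_cars: a plain data.frame that merely carries
# the "data.table" class attribute.
dt_like <- structure(
  data.frame(carb = c(1, 2, 4), cyl = c(4, 6, 8)),
  class = c("data.table", "data.frame")
)

# No `[` method for class "data.table" is registered in this session ...
has_dt_method <- !is.null(getS3method("[", "data.table", optional = TRUE))

# ... so dispatch falls through to `[.data.frame`, which evaluates the index
# expression in the caller's environment, where `carb` does not exist.
msg <- tryCatch(
  dt_like[carb %in% 1:2],
  error = function(e) conditionMessage(e)
)
print(msg)
#> [1] "object 'carb' not found"

# Base data.frame indexing only works with explicit column references.
ok <- dt_like[dt_like$carb %in% 1:2, ]
```

This is the same message as in the multiprocess run above; the only difference is which package's `[` method R finds at dispatch time.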

@PedramNavid

The above is correct; I sometimes get the same issue with caret. Another option is to use plan(multiprocess(workers=2, packages='data.table'))
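For reference, a sketch of the same idea with a newer furrr API, assuming furrr >= 0.2.0 (where `furrr_options()` replaced the `future_options()` helper of furrr 0.1.0) and the `multisession` plan (later releases of future deprecated `multiprocess`). The `packages` option asks furrr to attach data.table on each worker before running `.f`. The nested tibble from the reprex is simplified here to a plain `future_map()` over the `carb` values 1 to 6.

```r
library(data.table)
library(furrr)  # also attaches future

plan(multisession, workers = 2)

dt_cars <- data.table(mtcars)

# Same per-cyl summary as the reprex, one call per carb value
res <- future_map(1:6, function(b) {
  dt_cars[carb %in% b][, lapply(.SD, sum, na.rm = TRUE),
                       by = list(cyl),
                       .SDcols = c("drat", "wt")]
}, .options = furrr_options(packages = "data.table"))

plan(sequential)  # switch back to sequential processing
```

With data.table attached on the workers, `[` dispatches to `[.data.table`, so the row counts should match the sequential run in the reprex (2, 2, 1, 2, 0, 1).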

@DavisVaughan
Collaborator

Yeah, I've opened an issue with the underlying globals package to see if Henrik has any thoughts on a more general solution.

@DavisVaughan
Collaborator

Closed in favor of tracking futureverse/globals#46

@leungi
Author

leungi commented Jun 12, 2019

@DavisVaughan: apologies for the delayed update.

I finally got the chance to verify the suggestions.

Based on @HenrikBengtsson's documentation, I was under the impression that future takes care of all the necessary exports, but I'm glad to see this is a known issue.

Thanks for closing the ticket for me.

@DavisVaughan
Collaborator

This issue is actually already known and documented in the "Missing package (false negatives)" section of this vignette: https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html

I wasn't aware of that either, but Henrik showed me.
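At the level of the future API itself, the same workaround is the `packages` argument of `future()`. A minimal sketch, assuming a `multisession` backend with two workers and reusing the reprex's summary for `carb` values 1 and 2:

```r
library(data.table)
library(future)

plan(multisession, workers = 2)

dt_cars <- data.table(mtcars)

# Nothing in this expression names a data.table function, so automatic
# package detection has no reason to load data.table on the worker;
# `packages` attaches it there before the expression is evaluated.
f <- future({
  dt_cars[carb %in% 1:2][, lapply(.SD, sum, na.rm = TRUE),
                         by = list(cyl),
                         .SDcols = c("drat", "wt")]
}, packages = "data.table")

res <- value(f)  # a data.table with columns cyl, drat, wt

plan(sequential)  # switch back to sequential processing
```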

@leungi
Author

leungi commented Jun 12, 2019

Noted; 🤦‍♂
