Issue with indexing data.tables passed to future_map_* #182

Closed
thiesben opened this issue Nov 17, 2020 · 1 comment

thiesben commented Nov 17, 2020

I've come across a bug when working with data.tables and furrr. Check out this reprex:

library(data.table)
library(furrr)

fun <- function(one, two){
  print(two)
  print(class(two)) # data.table data.frame
  # print(two[,a]) # <- Uncomment for "Error in `[.data.frame`(two, , a) : object 'a' not found"
  
  print(one)
  print(class(one))  # data.table data.frame
  # print(one[,y]) # same here
  
  # Now this:
  setDT(one)
  print(class(one)) # data.table data.frame
  print(one[,y]) # Prints correctly, no error!
  
  return(NULL)
}

df <- data.frame(x = c(1,2), y = c(1.2,3.4))
dt <- setDT(df)
dt[,y] # 1.2 3.4

input <- list(data.table(a = c(4211, 815)), data.table(a = c(007, 101)))

plan(multisession, workers = 2)
future_map(input, ~fun(dt, .x))

I get the error "Error in `[.data.frame`(two, , a) : object 'a' not found" when accessing columns the way the function in the example does. However, when I (redundantly!) call setDT inside the function, it works without problems. I don't know where to report this; the behaviour is very weird.

This affects not only column indexing on data.tables but also filtering and other operations.

thiesben changed the title from "Issue with indexing data.tables passed to future_map_" to "Issue with indexing data.tables passed to future_map_*" on Nov 17, 2020
DavisVaughan (Collaborator) commented

The issue here is the same as futureverse/globals#46 and won't be fixed by furrr.

The problem is that the underlying {globals} package, which looks for globals and packages to "export" to your workers, can't find anything specific to data.table until you call setDT(). It isn't the act of "setting" the object as a data.table that fixes things. It's simply that the function call is there, so {globals} now sees that data.table is a required package for that function to run.
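
The `[.data.frame` in the error message hints at what happens on a worker where data.table is not loaded: the object still carries its class attribute, but no `[.data.table` method is available, so S3 dispatch falls through to the data.frame method. Here is a minimal base-R sketch of that fallback. The class `toy_table` and its method are hypothetical stand-ins for data.table, and this is not how data.table implements `[`.

```r
# Base R only. `toy_table` is a made-up class standing in for data.table.
x <- data.frame(x = c(1, 2), y = c(1.2, 3.4))
class(x) <- c("toy_table", "data.frame")

# No `[.toy_table` method exists yet, so dispatch falls back to
# `[.data.frame`, which evaluates `y` as an ordinary variable and fails.
before <- tryCatch(x[, y], error = function(e) conditionMessage(e))
print(before)

# Once a method for the class is available (for data.table, loading the
# package is what provides it), the same call resolves `y` as a column.
`[.toy_table` <- function(x, i, j) {
  eval(substitute(j), unclass(x), parent.frame())
}
after <- x[, y]
print(after)
```

The object itself is identical in both calls; only the set of methods visible to the session changes, which is why loading the package on the worker is enough.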

The easiest way to fix this is to require data.table to be loaded on the workers with furrr_options(packages = "data.table"):

library(data.table)
library(furrr)

# nothing in here is "data.table specific"
fun1 <- function(x) {
  x[,y] 
}

fun2 <- function(x) {
  # do something stupid that clearly requires data table
  data.table(1)
  
  x[,y] 
}

df <- data.frame(x = c(1,2), y = c(1.2,3.4))
dt <- setDT(df)

lst <- list(dt)

plan(multisession, workers = 2)

future_map(lst, fun1)
#> Error in `[.data.frame`(x, , y): object 'y' not found

future_map(lst, fun2)
#> [[1]]
#> [1] 1.2 3.4

future_map(lst, fun1, .options = furrr_options(packages = "data.table"))
#> [[1]]
#> [1] 1.2 3.4
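
Applied to the reprex from the original report, the same option might look like the sketch below. `fun` is reduced here to the two indexing calls and returns its results instead of printing them. That is a simplification, not the reporter's exact code. The sketch also assumes a furrr version that provides furrr_options().

```r
library(data.table)
library(furrr)

# Simplified version of the reporter's function: it only indexes its
# inputs and never calls a data.table function itself, so {globals}
# has nothing that points it at the data.table package.
fun <- function(one, two) {
  list(y = one[, y], a = two[, a])
}

dt <- data.table(x = c(1, 2), y = c(1.2, 3.4))
input <- list(data.table(a = c(4211, 815)), data.table(a = c(7, 101)))

plan(multisession, workers = 2)

# Explicitly ask for data.table to be attached on every worker.
res <- future_map(
  input,
  ~ fun(dt, .x),
  .options = furrr_options(packages = "data.table")
)

# Shut the background workers down again.
plan(sequential)
```

Passing the package name explicitly leaves the function body untouched, so no dummy data.table call is needed just to make the dependency detectable.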
