I've come across a bug when working with data.tables and furrr. Check out this reprex:
library(data.table)
library(furrr)
fun <- function(one, two){
print(two)
print(class(two)) # data.table data.frame
# print(two[,a]) # <- Uncomment for "Error in `[.data.frame`(two, , a) : object 'a' not found"
print(one)
print(class(one)) # data.table data.frame
# print(one[,y]) # same here
# Now this:
setDT(one)
print(class(one)) # data.table data.frame
print(one[,y]) # Prints correctly, no error!
return(NULL)
}
df <- data.frame(x = c(1,2), y = c(1.2,3.4))
dt <- setDT(df)
dt[,y] # 1.2 3.4
input <- list(data.table(a = c(4211, 815)), data.table(a = c(007, 101)))
plan(multisession, workers = 2)
future_map(input, ~fun(dt, .x))
I'm getting the error "Error in `[.data.frame`(two, , a) : object 'a' not found" when accessing columns the way the function in the example does. However, when I (redundantly!) call setDT inside the function, it works without problems. I don't know where this should be addressed, and the behaviour is very weird.
Also, this does not only affect indexing with data.tables, but also filtering etc.
thiesben changed the title from "Issue with indexing data.tables passed to future_map_" to "Issue with indexing data.tables passed to future_map_*" on Nov 17, 2020
The problem is that the underlying {globals} package, which looks for the globals and packages to "export" to your workers, can't find anything specific to data.table in your function until you call setDT(). It isn't the act of "setting" the object as a data.table that fixes things. It's just that the function call is there, so globals now sees that data.table is a required package for that function to run.
The easiest way to fix this is to require data.table to be loaded on the workers with furrr_options(packages = "data.table"):
library(data.table)
library(furrr)
# nothing in here is "data.table specific"
fun1<-function(x) {
x[,y]
}
fun2<-function(x) {
# do something stupid that clearly requires data table
data.table(1)
x[,y]
}
df<-data.frame(x= c(1,2), y= c(1.2,3.4))
dt<- setDT(df)
lst<-list(dt)
plan(multisession, workers=2)
future_map(lst, fun1)
#> Error in `[.data.frame`(x, , y): object 'y' not found
future_map(lst, fun2)
#> [[1]]
#> [1] 1.2 3.4
future_map(lst, fun1, .options= furrr_options(packages="data.table"))
#> [[1]]
#> [1] 1.2 3.4
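The `[.data.frame` in the error message suggests what is probably going on: if data.table is not loaded on the worker, its `[` method is not available, and S3 dispatch falls through to the next class in the vector, `data.frame`. Here is a minimal base R sketch of that fallback. The class `mytable` and its `[` method are hypothetical stand-ins for illustration only, not data.table's actual implementation.

```r
# Hypothetical class "mytable" stands in for "data.table" (an assumption for
# illustration); the object is a plain data.frame with an extra leading class.
x <- data.frame(x = c(1, 2), y = c(1.2, 3.4))
class(x) <- c("mytable", "data.frame")

# No `[.mytable` method exists yet, so dispatch falls back to `[.data.frame`,
# which evaluates `y` as an ordinary variable and fails.
before <- tryCatch(x[, y], error = function(e) conditionMessage(e))

# Defining the method plays the role of loading the package: `y` is now
# looked up among the columns (a toy version of column lookup).
`[.mytable` <- function(x, i, j) {
  eval(substitute(j), as.list(unclass(x)), parent.frame())
}
after <- x[, y]

print(before)
print(after)
```

The object is identical in both calls; only the availability of the method changes. This is consistent with the same data.table working once the package is loaded on the workers.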