
map_dtc is unreasonably slow when .f returns data.table #78

Open
mb706 opened this issue Sep 22, 2022 · 1 comment

Comments

@mb706
Contributor

mb706 commented Sep 22, 2022

When the function in map_dtc returns a data.table with many rows, map_dtc appears to be slower than it needs to be by a factor of about 100.

```r
library(data.table)
system.time(mlr3misc::map_dtc(1:3, function(x) runif(1e6, max = x)))
#>    user  system elapsed
#>   0.043   0.000   0.044
system.time(mlr3misc::map_dtc(1:3, function(x) data.table(x = runif(1e6, max = x))))
#>    user  system elapsed
#>   5.124   0.006   5.147
```

According to profvis, this is because `name_dots` is called inside `data.table()`.
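
To look at the construction step in isolation, here is a minimal sketch that times two ways of combining per-element data.tables: `do.call(data.table, c(cols, list(check.names = TRUE)))` versus flattening with `unlist()` and converting with `setDT()`. It assumes the `do.call()` form is representative of what map_dtc does internally; the helper names `via_do_call()` and `via_setdt()` are hypothetical, and no particular timings are implied.

```r
library(data.table)

# Inputs shaped like the second reprex call: one single-column
# data.table with 1e6 rows per element of 1:3.
cols = lapply(1:3, function(x) data.table(x = runif(1e6, max = x)))

# Hypothetical helper: build the result via do.call(data.table, ...).
# do.call() places the evaluated column objects inside the constructed call.
via_do_call = function(cols) {
  do.call(data.table, c(cols, list(check.names = TRUE)))
}

# Hypothetical helper: flatten to a plain list of columns and
# convert that list to a data.table in place.
via_setdt = function(cols) {
  setDT(unlist(cols, recursive = FALSE))[]
}

print(system.time(a <- via_do_call(cols)))
print(system.time(b <- via_setdt(cols)))

# Same values either way; only the construction path differs.
for (k in seq_along(b)) stopifnot(identical(a[[k]], b[[k]]))
```

One behavioral difference: the `setDT()` route never applies `check.names = TRUE`, so the three `x` columns keep identical names.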

@m-muecke
Member

m-muecke commented Jun 16, 2024

@mb706 I've seen the same, though on a much smaller scale; memory allocation was also higher than it should be. This is due to `do.call(data.table, c(cols, list(check.names = TRUE)))` in https://github.com/mlr-org/mlr3misc/blob/main/R/purrr_map.R#L129. As a fix, I've used `setDT()` instead:

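```r
map_dtc = function(.x, .f, ...) {
  cols = map(.x, .f, ...)
  # flatten the per-element data.tables into one list of columns,
  # then convert that list to a data.table in place
  setDT(unlist(cols, recursive = FALSE))[]
}
```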
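
The `setDT()` variant relies on `unlist(recursive = FALSE)` turning a list of data.tables into a flat list of columns. A minimal sketch of why that covers only the data.table case, assuming data.table is attached; `lapply()` stands in for mlr3misc's `map()`, and `map_dtc_setdt()` is a hypothetical name for the function above.

```r
library(data.table)

# Hypothetical name for the setDT()-only variant; lapply() replaces map().
map_dtc_setdt = function(.x, .f, ...) {
  cols = lapply(.x, .f, ...)
  setDT(unlist(cols, recursive = FALSE))[]
}

# .f returns data.tables: unlist() yields a list of three columns.
res_dt = map_dtc_setdt(1:3, function(x) data.table(x = runif(10, max = x)))
print(dim(res_dt))

# .f returns plain vectors: unlist() concatenates them into a single
# numeric vector of length 30, which setDT() refuses to convert.
res_vec = tryCatch(
  map_dtc_setdt(1:3, function(x) runif(10, max = x)),
  error = function(e) NULL
)
print(is.null(res_vec))
```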

Perhaps we can do something like the following to accommodate both use cases:

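```r
map_dtc = function(.x, .f, ...) {
  cols = map(.x, .f, ...)
  # blank the names of matrix-like results that already carry column names
  j = map_lgl(cols, function(x) !is.null(dim(x)) && !is.null(colnames(x)))
  names(cols)[j] = ""
  # data.table results: flatten into a single list of columns
  if (inherits(cols[[1L]], "data.table")) {
    cols = unlist(cols, recursive = FALSE)
  }
  # convert the list of columns to a data.table in place
  setDT(cols)[]
}
```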
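
A usage sketch for the combined version, again with `lapply()`/`vapply()` standing in for mlr3misc's `map()`/`map_lgl()` and with a hypothetical name, `map_dtc_both()`. A named `.x` is used on purpose: it shows what blanking the names of matrix-like results does, namely that `unlist()` then keeps the inner column names instead of prefixing them with the outer ones (e.g. `a.x`).

```r
library(data.table)

# Hypothetical name for the combined variant above; lapply()/vapply()
# replace mlr3misc's map()/map_lgl().
map_dtc_both = function(.x, .f, ...) {
  cols = lapply(.x, .f, ...)
  # blank the outer names of results that already carry column names,
  # so unlist() keeps "x" rather than producing "a.x", "b.x", ...
  j = vapply(cols, function(x) !is.null(dim(x)) && !is.null(colnames(x)), logical(1L))
  names(cols)[j] = ""
  if (inherits(cols[[1L]], "data.table")) {
    cols = unlist(cols, recursive = FALSE)
  }
  setDT(cols)[]
}

xs = c(a = 1, b = 2, c = 3)

# Plain vectors: each becomes one column, named after the elements of .x.
res_vec = map_dtc_both(xs, function(x) runif(10, max = x))
print(names(res_vec))

# Single-column data.tables: the inner column name "x" is kept.
res_dt = map_dtc_both(xs, function(x) data.table(x = runif(10, max = x)))
print(names(res_dt))
```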

There is also a PR for a C implementation of `cbindlist`, but it seems to be taking quite a while to get merged: Rdatatable/data.table#4370
