thoughts on a cached version of use()? #41
Hi. Thank you very much for your suggestion. I guess it would be possible to implement a caching mechanism for modules. Can you elaborate a bit on the use case? Is this for development, or do you need it as a user? I don't understand why you would rerun use() when in fact your module is already there. Is it because modules are nested? And why are those nested modules taking so long to compile: do they load data, or do they make computations? If caching is really a solution, I see different ways of dealing with it. One way is to use a strategy like memoisation, which would be very similar to your implementation. Like packages, we may need to think about distributing modules as binaries, precompiled, although there I would try to convince everyone that in fact we just need a package.
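A hedged sketch of that memoisation strategy, assuming the memoise package (this does not yet handle invalidation when the file changes, which is discussed below):

```r
# Sketch only: plain memoisation of use(), keyed on the path argument.
# Repeated calls with the same path return the cached module.
use_memo <- memoise::memoise(modules::use)

mod <- use_memo('inner.R')  # first call runs the script; later calls hit the cache
```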
Use cases include both development (needing to regenerate a module after changing its source code) and normal use, due to nested modules: e.g. I may wish to create a module from outer.R, where outer.R itself loads modules via mod_inner <- modules::use('inner.R'). The modules take a while to load. I'm talking about 4 seconds being considered long, since if this is an inner module nested within layers of outer modules, 4 seconds quickly cascades up.

I did a quick profvis profiling; it seems importing dplyr can take a while, around 1-2 seconds (though I haven't tried the latest versions of dplyr, only dplyr_0.8.5). Additionally, I was creating some bizdays objects in inner.R that took another 2 seconds. It's difficult to memoise these object creations within inner.R across module loads.

In terms of implementation, memoise would be nice, though it needs some minor customisation to include the R script's modification time as an input. But yes, I do see that for more general uses of the modules package you have to be a lot more careful with cache invalidation. It's similar to Python modules, where the only truly safe way of reloading a cached Python module is to restart the interpreter. Though do note that even with such clunky cache invalidation, Python still always caches its modules. Perhaps this package can have an always-on cache as well.

```
Restarting R session...

* Project 'C:/Users/User/workspace/Models' loaded. [renv 0.13.2]
* The project may be out of sync -- use `renv::status()` for more details.
>
> tictoc::tic(); suppressPackageStartupMessages(modules::import('dplyr')); tictoc::toc()
2.14 sec elapsed
```
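A minimal sketch of that mtime-keyed customisation (hypothetical, not the fork's actual code):

```r
# Sketch: cache use() results in a package-wide environment, keyed on the
# normalised path plus the file's modification time, so editing the
# script invalidates its cache entry.
.module_cache <- new.env(parent = emptyenv())

use_cached <- function(path) {
  key <- paste0(normalizePath(path), '|', file.mtime(path))
  if (is.null(.module_cache[[key]])) {
    .module_cache[[key]] <- modules::use(path)
  }
  .module_cache[[key]]
}
```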
@klin333 sorry for not getting back to you. It's quite busy around here and I won't be available to spend any time on this during July, but I would like to pick up again in August and see if we can find a solution. Just some thoughts. In a fresh R session we see that it takes some time to do an import, though I have never seen 2 sec imports like in your example.
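(The timing snippet from the original comment was lost in extraction; a plausible stand-in:)

```r
# Hypothetical stand-in for the lost snippet: time an import in a fresh session
system.time(modules::import('dplyr'))
```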
This is not something we can cache. Even if you do a call to library(dplyr), what we observe here is only the time it takes to load the namespace of dplyr, plus some other things happening in library, but that is negligible. Attaching a namespace is already cached, or rather only happens once per R session, unless you unload it (which we don't do in modules):
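(Again the original snippet is missing; the point can be illustrated like this:)

```r
system.time(loadNamespace('dplyr'))  # first load pays the full cost
system.time(loadNamespace('dplyr'))  # namespace already loaded: near zero
```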
So calling this a second time does not cost much additional resources. That is why I think we should have a very clear picture of what we really need to cache. Usually these things cause headaches and tend to get complicated. Anyway, happy to work on this.
Yeah, library loading is part of the problem, but so is time-consuming work done in the scripts themselves; illustrative examples below. This work can be done once and cached, as far as the module is concerned. You can see how, with nested modules, the same time-consuming cacheable work cascades into a very long load time very quickly. E.g.

mod_a.R

```r
foo <- function(x) {
  # imagine a bunch of operations
  1
}

deriv <- Deriv::Deriv(foo, 'x')  # imagine some operation that takes a while

lookup <- purrr::map(seq(100), function(i) as.Date(paste0(seq(0, 2000), '-01-01')))  # imagine some operations that take a while
```

mod_b.R

mod_c.R
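(The bodies of mod_b.R and mod_c.R did not survive extraction; presumably each just nests the previous module, along these lines. Hypothetical reconstruction:)

```r
# mod_b.R -- every use() of this module reruns mod_a.R's slow top-level code
mod_a <- modules::use('mod_a.R')
bar <- function(x) mod_a$foo(x)
```

```r
# mod_c.R -- nests mod_b, so the cost cascades another level
mod_b <- modules::use('mod_b.R')
baz <- function(x) mod_b$bar(x)
```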
Anyway, cached modules are super easy if you don't care about separating the function enclosing environments. Separating the environments for cached copies of modules is a lot more complicated, but I believe I've done that in my fork. Anyway, if it's too convoluted for the package, it's fine to leave it; I'm perfectly happy using my own fork.
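To illustrate the environment-sharing concern, using the hypothetical use_cached sketch from above:

```r
# Naive caching hands every caller the same module object, so any mutable
# state inside the module is shared across callers.
m1 <- use_cached('mod_a.R')
m2 <- use_cached('mod_a.R')
identical(m1, m2)  # TRUE: same object, not independent copies
```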
Hi,

modules::use() could get quite slow with large R scripts, especially when there are multiple layers of nested modules in modules. Any thoughts on a cached version of use() that returns the module from a package-wide cache? I had a play around here: klin333@2036821