Ability to cache compiled stan-models and re-use them across sessions #304

mikekaminsky · 2020-10-01T18:06:03Z

Is your feature request related to a problem? Please describe.
I'd like to be able to save the compilation time for models that we run over-and-over-again across R sessions.

Describe the solution you'd like
Ideally, we could cache those executables somewhere and skip compilation if the executable has always been compiled once. I can see the paths to the executables and header files via model$exe_file() and model$hpp_file(), but what I don't see is a way to create a CmdStanModel object using an existing executable / header file. If this is already possible and I just missed it in the docs, I apologize!

Describe alternatives you've considered
I'm basically trying to do a small part of what rstantools does for rstan -- for a variety of reasons, our project doesn't work with rstan and so we are using cmdstanr instead -- I've got the shell of the code working for doing the caching, just need a way to plug the saved executables back into the code we're using (creating a CmdStanModel object).

The text was updated successfully, but these errors were encountered:

mitzimorris · 2020-10-01T18:54:03Z

cmdstanpy lets you instantiate a CmdStanModel from just an exe file - also has logic to check timestamps on stan and exe file and if the latter is newer, doesn't compile. CmdStanR should have similar behaviors.

rok-cesnovar · 2020-10-01T18:54:06Z

Hi, thanks for the request.

We currently do not have a way to create a CmdStanModel just from an executable. It might be reasonable to add that.

However, the executable should not recompile every session. Can you maybe provide an example of how you call cmdstan_model?

If I run

library(cmdstanr)


model_path <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(model_path)

close the session, restart RStudio or the PC, it does not recompile every time.

rok-cesnovar · 2020-10-01T18:55:40Z

We compare the modified time of the .stan file and the executable and only recompile if the stan file was modified after the executable was created.

jgabry · 2020-10-01T19:24:21Z

Yeah like @rok-cesnovar said I'm not sure that it's necessary to be able to create a CmdStanModel object from an executable if cmdstan_model() can be used without recompiling. That seems like it's just as convenient if not more convenient (because the user doesn't need to worry themselves about the executable or its path and whatnot). Although maybe there's something I'm overlooking.

@mikekaminsky is that not working properly for you? If it's not then we definitely need to fix something.
One possible issue is that if you're using a Stan program that it's a different directory than the existing executable you would need to specify the dir argument when compiling so it finds the executable. Otherwise it defaults to where the Stan program is located.

mikekaminsky · 2020-10-01T19:25:59Z

Ah, fascinating! We are doing something hacky where we're combining different text files in R to make the Stan program on the fly. I'm not sure if there's a reason we did it that way instead of using #includes so this is a good reason to try that out.

However, given you're looking at file modified time, I think we might run into issues when we re-install the package. It's an internally-used package where the R code changes pretty frequently, although the underlying Stan models do not. I haven't tested that yet, but that will be next on my list.

jgabry · 2020-10-01T19:54:13Z

However, given you're looking at file modified time, I think we might run into issues when we re-install the package. It's an internally-used package where the R code changes pretty frequently, although the underlying Stan models do not. I haven't tested that yet, but that will be next on my list.

Oh ok, yeah that could be an issue. Definitely let us know if you run into trouble. I'm not sure, but it might be that we will just need to do what @mitzimorris suggested all along and provide a direct way to create the CmdStan model object instead of relying on when the file was modified. We can definitely do that if necessary.

mikekaminsky · 2020-10-01T20:01:25Z

Let me play with this some more today and tomorrow and I will report back. Thanks for the quick turnaround on the questions here!

mike-lawrence · 2020-10-01T20:03:23Z

Couldn't the content of the file be hashed and we just compare hashes? Seems at least better than modification times.

jgabry · 2020-10-01T20:05:34Z

@mike-lawrence Yeah I like that idea. @rok-cesnovar or @mitzimorris any reason you can think of why we shouldn't do this by comparing hashes?

rok-cesnovar · 2020-10-01T20:23:27Z

That is a great idea. but for that to work seamlessly, the executable should be able to return the hash of the Stan model. Otherwise we need to store it and thar becomes a mess with sessions etc.

That could be easily added as part of stan-dev/cmdstan#887
This issues has staled a bit, awaiting reviews on a Stan PR. Once that is in, will push that forward.

mike-lawrence · 2020-10-01T20:28:45Z

Can the hash be in the name of the executable for now? (I don't know where that's stored across sessions)

rok-cesnovar · 2020-10-01T20:32:53Z

That is also a temporary possibility.

One slight issue there is how we clean up executables, because 10 changes and compilations would create 10 executables in the output folder. This cab be worked around, we just need to be careful.

mike-lawrence · 2020-10-01T21:10:24Z

I presume cmdstanr currently assumes a standard filename since as described earlier here it's able to check the modification times of previously compiled files. Now it would use that same standardized filename as a basename and so long as appending the hash is standardized, it should find the old file basename_oldhash, split off the hash bit, see it's different, delete basename_oldhash and create basename_newhash. Thus no left-over files; or am I missing something?

mikekaminsky · 2020-10-01T21:34:08Z

FWIW mike-lawrence's suggestion is exactly what I have pseudo-implemented in my package for doing this, so seems reasonable to me!

mike-lawrence · 2020-10-01T23:15:28Z

I implemented the hash-suffixed executables and made a pull request, but see now that the test coverage is so good that there are new failing tests with new scheme. Neat! (I'm new to tests, PRs, etc) I have to shift gears for tonight, but if anyone wants to take a look at what I did in the PR (esp. my regex stuff before deleting old exes; very important to get that right and not commit unintended deletions!) or work on updating the tests, go for it. I'll notify in the PR thread when I return to this (probably on the weekend) so I don't overlap with anyone else working on the tests.

mitzimorris · 2020-10-02T00:04:52Z

FWIW, I totally agree that models w/out corresponding source code are a very dangerous thing. it's just the that there's a use case for folks building the kind of system described above - CmdStanPy request came from Prophet folks - stan-dev/cmdstanpy#70

mikekaminsky · 2020-10-02T17:51:48Z

Some follow-ups from my side here: I was able to get cmdstanr caching working as expected once I switched to compiling from the stanfiles instead of stan code I was assembling on-the-fly in R (maybe this could be documented better?).

However, as we suspected, the cached stan files do not persist after re-installing the package from a different commit. I think @mike-lawrence's solution will address this as long as we use the dir argument to cmdstan_model to write the executables to a known central location so that they don't get blown away by package re-install.

jgabry · 2020-10-05T21:22:15Z

@mike-lawrence thanks for the PR!

@mikekaminsky Thanks a lot for following up, that's good to know. I think you're right that @mike-lawrence's solution would work in your case.

mikekaminsky · 2020-12-12T21:06:29Z

Hello everyone! I found another issue related to caching so figured I'd add it here, but let me know if I should open another issue. This might also be addressed in the PRs mentioned above -- I took a look and I think it maybe is, in which case we can note this when we describe the new caching strategy.

The Issue:

It seems that the caching is wonky when using #include. In particular, if I have two stan programs outer.stan and included.stan where outer.stan has the line #include included.stan, then the cache will only be invalidated if outer.stan changes.

As a user, I would expect the cache to be based on the contents of the entire program inclusive of included files, so if either outer.stan or included.stan change, then we'd invalidate the cache and recompile.

I think we're moving in that direction with #306, but wanted to mention it in case this isn't handled and is easy to add or if it's a surprise benefit :)

> packageVersion("cmdstanr")
[1] ‘0.2.2’

rok-cesnovar · 2020-12-12T21:33:17Z

Yeah, this is one shortcoming on relying on modified time of only the main stan file. Good call!
This is something that cmdstan suffers from.

cmdstanr does not know which files were included in the Stan model or their locatin. The only way of realistically getting around that is by relying on hashes of the generated C++ code, not auto-format.

There are other options but they are on the hacky-side.

mitzimorris · 2020-12-12T21:57:34Z

This is something that cmdstan suffers from.

#includes are handled by the stanc compiler, in order for the GNU-make utility to do the right thing,
there would need to exist the kind of functionality that we're using for the c++ dependencies -
file https://github.com/stan-dev/math/blob/develop/make/dependencies has rules to populate
variables DEPTARGETS and DEPFLAGS.

if the Stanc3 parser has a way to identify the full path to the include files, perhaps we could
create similar makefile rules for compiling Stan program files. that would be elegant and
would make things a lot easier for the user.

mike-lawrence · 2021-01-01T23:49:34Z

One thought I just had was to write the hash of the auto-formatted code as a comment at the end of the stan file itself, which in turn provides a simple mechanism for forcing a re-compile (user can simply delete that line) for the scenario @mikekaminsky describes (where the main stan file is unchanged but an included file does change).

rok-cesnovar · 2021-02-01T08:59:14Z

This issue isnt solved but we have a few of them open on this topic. Lets discuss this further in #423 Closing this.

mikekaminsky added the feature New feature or request label Oct 1, 2020

mike-lawrence mentioned this issue Oct 1, 2020

Recompile based on the hashed code instead of file mod time #306

Closed

2 tasks

rok-cesnovar added this to the v1.0.0 - release milestone Oct 17, 2020

jgabry linked a pull request Nov 11, 2020 that will close this issue

Recompile based on the hashed code instead of file mod time #306

Closed

2 tasks

jgabry removed a link to a pull request Nov 11, 2020

Recompile based on the hashed code instead of file mod time #306

Closed

2 tasks

rok-cesnovar closed this as completed Feb 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ability to cache compiled stan-models and re-use them across sessions #304

Ability to cache compiled stan-models and re-use them across sessions #304

mikekaminsky commented Oct 1, 2020

mitzimorris commented Oct 1, 2020

rok-cesnovar commented Oct 1, 2020

rok-cesnovar commented Oct 1, 2020

jgabry commented Oct 1, 2020

mikekaminsky commented Oct 1, 2020

jgabry commented Oct 1, 2020

mikekaminsky commented Oct 1, 2020

mike-lawrence commented Oct 1, 2020

jgabry commented Oct 1, 2020

rok-cesnovar commented Oct 1, 2020

mike-lawrence commented Oct 1, 2020 •

edited

Loading

rok-cesnovar commented Oct 1, 2020

mike-lawrence commented Oct 1, 2020

mikekaminsky commented Oct 1, 2020

mike-lawrence commented Oct 1, 2020

mitzimorris commented Oct 2, 2020

mikekaminsky commented Oct 2, 2020

jgabry commented Oct 5, 2020

mikekaminsky commented Dec 12, 2020

rok-cesnovar commented Dec 12, 2020 •

edited

Loading

mitzimorris commented Dec 12, 2020

mike-lawrence commented Jan 1, 2021 •

edited

Loading

rok-cesnovar commented Feb 1, 2021

Ability to cache compiled stan-models and re-use them across sessions #304

Ability to cache compiled stan-models and re-use them across sessions #304

Comments

mikekaminsky commented Oct 1, 2020

mitzimorris commented Oct 1, 2020

rok-cesnovar commented Oct 1, 2020

rok-cesnovar commented Oct 1, 2020

jgabry commented Oct 1, 2020

mikekaminsky commented Oct 1, 2020

jgabry commented Oct 1, 2020

mikekaminsky commented Oct 1, 2020

mike-lawrence commented Oct 1, 2020

jgabry commented Oct 1, 2020

rok-cesnovar commented Oct 1, 2020

mike-lawrence commented Oct 1, 2020 • edited Loading

rok-cesnovar commented Oct 1, 2020

mike-lawrence commented Oct 1, 2020

mikekaminsky commented Oct 1, 2020

mike-lawrence commented Oct 1, 2020

mitzimorris commented Oct 2, 2020

mikekaminsky commented Oct 2, 2020

jgabry commented Oct 5, 2020

mikekaminsky commented Dec 12, 2020

The Issue:

rok-cesnovar commented Dec 12, 2020 • edited Loading

mitzimorris commented Dec 12, 2020

mike-lawrence commented Jan 1, 2021 • edited Loading

rok-cesnovar commented Feb 1, 2021

mike-lawrence commented Oct 1, 2020 •

edited

Loading

rok-cesnovar commented Dec 12, 2020 •

edited

Loading

mike-lawrence commented Jan 1, 2021 •

edited

Loading