Ability to cache compiled stan-models and re-use them across sessions #304
Comments
CmdStanPy lets you instantiate a CmdStanModel from just an exe file. It also has logic to check the timestamps on the Stan and exe files and, if the latter is newer, it doesn't compile. CmdStanR should have similar behavior.
Hi, thanks for the request. We currently do not have a way to create a CmdStanModel just from an executable. It might be reasonable to add that. However, the executable should not recompile every session. Can you maybe provide an example of how you call cmdstan_model? If I run
close the session, restart RStudio or the PC, it does not recompile every time.
We compare the modified times of the .stan file and the executable and only recompile if the Stan file was modified after the executable was created.
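For readers following along, the modified-time rule described above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not cmdstanr's actual code; the function name `needs_recompile` is hypothetical.

```python
import os


def needs_recompile(stan_file: str, exe_file: str) -> bool:
    """Return True if the executable is missing or older than the Stan file.

    Hypothetical helper: it mirrors the rule described in this thread,
    not the real cmdstanr implementation.
    """
    if not os.path.exists(exe_file):
        # Nothing has been compiled yet.
        return True
    # Recompile only when the source was modified after the executable.
    return os.path.getmtime(stan_file) > os.path.getmtime(exe_file)
```

Note that a rule like this depends only on file timestamps, so anything that rewrites the Stan file (even with identical content) makes the executable look stale.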
Yeah, like @rok-cesnovar said, I'm not sure that it's necessary to be able to create a CmdStanModel object from an executable. @mikekaminsky is that not working properly for you? If it's not then we definitely need to fix something.
Ah, fascinating! We are doing something hacky where we're combining different text files in R to make the Stan program on the fly. I'm not sure if there's a reason we did it that way instead of using However, given you're looking at file modified time, I think we might run into issues when we re-install the package. It's an internally-used package where the R code changes pretty frequently, although the underlying Stan models do not. I haven't tested that yet, but that will be next on my list.
Oh ok, yeah that could be an issue. Definitely let us know if you run into trouble. I'm not sure, but it might be that we will just need to do what @mitzimorris suggested all along and provide a direct way to create the CmdStan model object instead of relying on when the file was modified. We can definitely do that if necessary.
Let me play with this some more today and tomorrow and I will report back. Thanks for the quick turnaround on the questions here!
Couldn't the content of the file be hashed and we just compare hashes? Seems at least better than modification times.
@mike-lawrence Yeah I like that idea. @rok-cesnovar or @mitzimorris any reason you can think of why we shouldn't do this by comparing hashes?
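As a rough illustration of the hash comparison being proposed, here is a minimal Python sketch. The names `file_hash` and `source_changed` are hypothetical, SHA-256 is an arbitrary choice, and where the previously computed hash would be stored is deliberately left open.

```python
import hashlib


def file_hash(path: str) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def source_changed(stan_file: str, stored_hash: str) -> bool:
    """Compare the current content hash with one recorded at compile time.

    `stored_hash` is an assumption: something has to remember the hash
    from the last compilation for this comparison to be possible.
    """
    return file_hash(stan_file) != stored_hash
```

Unlike modification times, this check is unaffected by a file being copied or reinstalled with the same content.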
That is a great idea, but for that to work seamlessly, the executable should be able to return the hash of the Stan model. Otherwise we need to store it, and that becomes a mess with sessions etc. That could be easily added as part of stan-dev/cmdstan#887
Can the hash be in the name of the executable for now? (I don't know where that's stored across sessions)
That is also a temporary possibility. One slight issue there is how we clean up executables, because 10 changes and compilations would create 10 executables in the output folder. This can be worked around; we just need to be careful.
I presume cmdstanr currently assumes a standard filename since, as described earlier here, it's able to check the modification times of previously compiled files. Now it would use that same standardized filename as a basename and, so long as appending the hash is standardized, it should find the old file.
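To make the hash-in-the-filename idea and the cleanup concern concrete, here is a hedged Python sketch. The naming scheme (`<basename>_<first 12 hex chars of SHA-256>`, with an optional `.exe` suffix) and both function names are assumptions for illustration only; they are not what cmdstanr or any pull request in this thread actually uses.

```python
import hashlib
import os
import re


def hashed_exe_name(stan_file: str, digest_len: int = 12) -> str:
    """Build an executable name of the form '<basename>_<hash prefix>'."""
    base = os.path.splitext(os.path.basename(stan_file))[0]
    with open(stan_file, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:digest_len]
    return f"{base}_{digest}"


def stale_executables(stan_file: str, out_dir: str, digest_len: int = 12) -> list:
    """List hash-suffixed executables for this model that are out of date.

    The pattern is anchored on both ends and escapes the basename, so
    files belonging to other models are never matched.
    """
    base = os.path.splitext(os.path.basename(stan_file))[0]
    current = hashed_exe_name(stan_file, digest_len)
    pattern = re.compile(
        rf"({re.escape(base)}_[0-9a-f]{{{digest_len}}})(\.exe)?"
    )
    stale = []
    for name in os.listdir(out_dir):
        m = pattern.fullmatch(name)
        if m and m.group(1) != current:
            stale.append(name)
    return sorted(stale)
```

The function only reports candidates and leaves deletion to the caller, which keeps the risky step (removing files) separate and easy to review.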
FWIW mike-lawrence's suggestion is exactly what I have pseudo-implemented in my package for doing this, so seems reasonable to me!
I implemented the hash-suffixed executables and made a pull request, but see now that the test coverage is so good that there are new failing tests with new scheme. Neat! (I'm new to tests, PRs, etc) I have to shift gears for tonight, but if anyone wants to take a look at what I did in the PR (esp. my regex stuff before deleting old exes; very important to get that right and not commit unintended deletions!) or work on updating the tests, go for it. I'll notify in the PR thread when I return to this (probably on the weekend) so I don't overlap with anyone else working on the tests.
FWIW, I totally agree that models without corresponding source code are a very dangerous thing. It's just that there's a use case for folks building the kind of system described above. The CmdStanPy request came from the Prophet folks: stan-dev/cmdstanpy#70
Some follow-ups from my side here: I was able to get cmdstanr caching working as expected once I switched to compiling from the Stan files instead of Stan code I was assembling on the fly in R (maybe this could be documented better?). However, as we suspected, the cached stan files do not persist after re-installing the package from a different commit. I think @mike-lawrence's solution will address this as long as we use the
@mike-lawrence thanks for the PR! @mikekaminsky Thanks a lot for following up, that's good to know. I think you're right that @mike-lawrence's solution would work in your case.
Hello everyone! I found another issue related to caching so figured I'd add it here, but let me know if I should open another issue. This might also be addressed in the PRs mentioned above -- I took a look and I think it maybe is, in which case we can note this when we describe the new caching strategy.

The Issue: It seems that the caching is wonky when using As a user, I would expect the cache to be based on the contents of the entire program inclusive of included files, so if either I think we're moving in that direction with #306, but wanted to mention it in case this isn't handled and is easy to add or if it's a surprise benefit :)
Yeah, this is one shortcoming of relying on the modified time of only the main Stan file. Good call! cmdstanr does not know which files were included in the Stan model or their locations. The only way of realistically getting around that is by relying on hashes of the generated C++ code, not auto-format. There are other options but they are on the hacky side.
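For illustration, one of the "hacky" options could look something like the Python sketch below: expand `#include` lines by hand and hash the combined text, so that a change in an included file changes the hash. Everything here is an assumption made for the example: the function names, the simplified include parsing (one `#include` per line, optional quotes or angle brackets), and the requirement that the caller supplies the include directories, which is exactly the information noted above as unknown to cmdstanr.

```python
import hashlib
import os
import re

# Simplified pattern: '#include file', '#include "file"' or '#include <file>'.
INCLUDE_RE = re.compile(r'^\s*#include\s+["<]?([^">\s]+)[">]?\s*$')


def expand_includes(stan_file, include_paths, _seen=None):
    """Return the program text with include lines replaced by file contents."""
    seen = set() if _seen is None else _seen
    real = os.path.realpath(stan_file)
    if real in seen:
        return ""  # already expanded once; also guards against cycles
    seen.add(real)
    parts = []
    with open(stan_file, encoding="utf-8") as f:
        for line in f:
            m = INCLUDE_RE.match(line)
            if m is None:
                parts.append(line)
                continue
            for directory in include_paths:
                candidate = os.path.join(directory, m.group(1))
                if os.path.isfile(candidate):
                    parts.append(expand_includes(candidate, include_paths, seen))
                    break
            else:
                raise FileNotFoundError(f"include not found: {m.group(1)}")
    return "".join(parts)


def program_hash(stan_file, include_paths):
    """Hash the main file together with everything it includes."""
    text = expand_includes(stan_file, include_paths)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Hashing the generated C++ code instead avoids re-implementing include resolution, since the compiler has already done that work.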
if the Stanc3 parser has a way to identify the full path to the include files, perhaps we could |
One thought I just had was to write the hash of the auto-formatted code as a comment at the end of the stan file itself, which in turn provides a simple mechanism for forcing a re-compile (user can simply delete that line) for the scenario @mikekaminsky describes (where the main stan file is unchanged but an
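A rough Python sketch of the trailing-comment idea follows. The marker text `// compiled-hash: ` and the function names are hypothetical, and for simplicity the sketch hashes the raw program text rather than auto-formatted code.

```python
import hashlib

MARKER = "// compiled-hash: "  # hypothetical marker, not an existing convention


def split_marker(text):
    """Split program text into (body, stored hash or None)."""
    lines = text.splitlines()
    if lines and lines[-1].startswith(MARKER):
        return "\n".join(lines[:-1]), lines[-1][len(MARKER):].strip()
    return "\n".join(lines), None


def body_hash(body):
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def is_stale(text):
    """True if the marker line is missing or no longer matches the body."""
    body, stored = split_marker(text)
    return stored != body_hash(body)


def stamp(text):
    """Return the text with an up-to-date marker as its last line."""
    body, _ = split_marker(text)
    return f"{body}\n{MARKER}{body_hash(body)}\n"
```

Deleting the last line makes `is_stale` return `True`, which is the manual "force a re-compile" switch described above.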
This issue isn't solved, but we have a few of them open on this topic. Let's discuss this further in #423. Closing this.
Is your feature request related to a problem? Please describe.
I'd like to be able to save the compilation time for models that we run over-and-over-again across R sessions.
Describe the solution you'd like
Ideally, we could cache those executables somewhere and skip compilation if the executable has already been compiled once. I can see the paths to the executables and header files via `model$exe_file()` and `model$hpp_file()`, but what I don't see is a way to create a `CmdStanModel` object using an existing executable / header file. If this is already possible and I just missed it in the docs, I apologize!
Describe alternatives you've considered
I'm basically trying to do a small part of what rstantools does for rstan -- for a variety of reasons, our project doesn't work with rstan and so we are using cmdstanr instead -- I've got the shell of the code working for doing the caching, just need a way to plug the saved executables back into the code we're using (creating a `CmdStanModel` object).