-
-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recompile only on changes to output of stanc3 auto-formatter (feature request & design discussion) #423
Comments
Oh! just thought I'd note that I just encountered a "bug" in the released version of how we do things. I asked for a model to compile, then immediately made a change to the stan file, saving before the compilation finished. I then tried to recompile and clearly the save time of the exe was still later than the save time of the stanfile, so it claimed that the exe was up to date. Low-probability edge case obviously, and one that an aware user can get around by re-saving or using the |
Hey @rok-cesnovar / @mike-lawrence , Just to say we have run into this issue today. Our scenario is that we are generating and running stan models as part of our CICD unit tests. In order to get the run times down we were hoping to cache the compiled exes (whilst compiling isn't long it does add up with compiling multiple models). However the way our package works is that we generate our stan code on the fly and then compile it as such:
Under this current setup the models are forced to be recompiled in every new session which kinda completely defeats the point of using a cache directory for CICD builds. Are there any potential work arounds ? If not is there anyway I could convince you guys to add some priority to this as it would be a really nice QOL update (at least for developers) |
@gowerc For your use case, there is a workaround. Note that I'll also note that if we agree on implementing stan-dev/cmdstan#1178 then the executable itself would contain some information about the code making implementation of this feature easier (though there is still some discussion to be made about that issues). |
See how aria uses a local for and hashing of the stanc outputs to avoid unnecessary recompilation. Edit: I initially had “In the interim” at the beginning of this comment, but had missed that a solution does exist (per Martin’s response). I think however there’s a difference between what made it into cmdstanr and what aria does whereby aria skips recompile when comments and whitespace change (thanks to passing the code through stanc before hashing), while I think cmdstanr will recompile under those conditions. |
@mike-lawrence & @martinmodrak , First off thank you for the suggested workaround; seems to be working as expected ! Apologies though I am confused as to what the outstanding issue is here with regards to this ticket.
Based on (1) I wouldn't have expected the proposed workaround to work as the unit tests will always be run in a new session ? |
I think the description is inaccurate - cmdstanr compares the timestamps of the Stan file and the executable, this has nothing to do with the session. If you are using However, this fails if the Stan file was deleted in the meantime (e.g. because it was in the temp directory as in your case) and it also fails if a change is made to the file, but the change is then reverted (which changes timestamp) or if a whitespace only/comment only change is made. The proposal is to store enough information to be able to detect the need for recompilation without relying on timestamps. This could be external (as Mike suggests) or it could be stored internally in the executable (as stan-dev/cmdstan#1178) suggests. Storing in the executable is IMHO more preferable as you don't have to worry that the storage becomes out of sync with your executables, but the downside is that this requires changes to cmdstan. |
(While an issue already exists related to this, as well as a PR, I thought it would be cleaner to start a new issue to discuss things now that I've had time to work on it and have learned about the option space.)
At present, when an exe already exists, cmdstanr will recompile if either:
1 makes for annoyance when one is coming back to a project and starting a new session (but also for packages that bundle exes, as evidenced by the original issue that spawned all this), and 2 is an annoyance when the stanfile has changed but only in non-functional ways (touched, whitespace, comments, etc)
Granted the Stan compiler has become very fast nowadays, so the above annoyances are in turn diminishing in magnitude. However, I think it's similarly trivial to solve them in a way that has little chance of going awry.
Specifically, I propose that any compilation of the model start with passing it through the stanc3 auto-formatter, yielding a character string that is then hashed to a digest. A record of this hash is then kept somewhere permanent across sessions, and only when this record is absent or changes do we recompile (and obvs "re"compile if there's no exe).
There was some back-and-forth on where to store this digest previously; at first I thought it made sense to append it to the name of the exe itself and still think that's a fairly bulletproof approach, but this had the downside of having some potential issues related to detecting and deleting outdated exes, plus aesthetic issues with exes having longer more obscure names. An alternative was suggested whereby the hash is stored in a file located in a directory, and where some other recent FRs suggest implementations that involve use of a
stan_scratch
folder, we could use this directory for storing this file too. I came up with a third option whereby the hash is written back to the stan file itself as a comment at the end, but it occurs to me that this does not cover the case where the user is writing code for a stan model as a string in an R script. So I think use of thestan_scratch
folder would be best after all. (n.b. said folder is involved in the proposed implementations of these FRs as well: Background/asynchronous sampling, During-sampling diagnostics )Note that it has been reported that it would be desirable for the hash to recompute on changes to any
#include
files used in the stan file. To achieve this, I imagine it would be pretty straightforward to parse the stan file for includes, confirm they refer to files and hash the contents of those files too. But this might be best left for a separate later feature.The text was updated successfully, but these errors were encountered: