I propose to add optional computation of diagnostics during sampling.
To achieve this, I propose to read the CSV files to:
(1) track the proportion of treedepth exceeded
(2) track whether any post-warmup divergences were encountered
(3) track the Bulk & Tail ESS of parameters (with option to specify which to include/exclude)
(4) track the Rhat of parameters (with option to specify which to include/exclude)
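As a rough illustration of items (1) and (2), here is a minimal Python sketch (the proposal itself targets R). It assumes the draws have already been parsed into rows of floats, that only post-warmup rows are passed in, and that the sampler columns carry Stan's usual names `treedepth__` and `divergent__`. The function name `sampler_diagnostics` and its `max_treedepth` argument are hypothetical, not part of cmdstanr.

```python
def sampler_diagnostics(header, draws, max_treedepth=10):
    """Summarise treedepth and divergence diagnostics for post-warmup draws.

    header: list of column names from the Stan CSV header line.
    draws:  list of rows, each a list of floats aligned with header.
    max_treedepth: assumed sampler setting (10 is Stan's default).
    """
    td_col = header.index("treedepth__")
    div_col = header.index("divergent__")
    n = len(draws)
    if n == 0:
        return {"prop_treedepth_exceeded": 0.0, "any_divergent": False}
    # A draw counts as "exceeded" when it reached the maximum tree depth.
    hits = sum(1 for row in draws if row[td_col] >= max_treedepth)
    any_div = any(row[div_col] != 0 for row in draws)
    return {"prop_treedepth_exceeded": hits / n, "any_divergent": any_div}
```

Both quantities are cheap to recompute on every read, unlike ESS and Rhat, so they could plausibly be tracked for every poll.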
To enable efficient incremental parsing of the CSV files, I propose keeping track of how many lines have been read so far and skipping that many lines the next time a read is triggered, storing new samples together with prior samples in an object kept in memory.
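The line-counting idea can be sketched roughly as follows, in Python for illustration only. The class name `IncrementalCsvReader` and its `poll` method are hypothetical. The sketch assumes comment lines start with `#`, that the first non-comment line is the header, and that every data field is numeric. It also leaves a partially written final line for the next poll, since the sampler may still be writing to the file.

```python
class IncrementalCsvReader:
    """Hypothetical helper: parses only the lines appended since the last poll."""

    def __init__(self, path):
        self.path = path
        self.lines_read = 0   # complete lines consumed so far
        self.header = None    # column names, set once the header line is seen
        self.draws = []       # all rows parsed so far, kept in memory

    def poll(self):
        """Parse newly appended lines; return the number of new data rows."""
        new_rows = 0
        with open(self.path, "r", newline="") as fh:
            for i, line in enumerate(fh):
                if i < self.lines_read:
                    continue          # already handled by an earlier poll
                if not line.endswith("\n"):
                    break             # incomplete line: retry on the next poll
                self.lines_read += 1
                text = line.strip()
                if not text or text.startswith("#"):
                    continue          # comment or blank line
                fields = text.split(",")
                if self.header is None:
                    self.header = fields
                else:
                    self.draws.append([float(x) for x in fields])
                    new_rows += 1
        return new_rows
```

Skipping by line count still scans the file from the start on each poll. Storing a byte offset and seeking to it would avoid that, at the cost of slightly more bookkeeping.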
To enable resuming this monitoring across R sessions, we could either restart the CSV parsing from scratch, or write the contents to a faster binary format (I'm thinking NetCDF) from the outset. The latter has the benefit of leaving the Stan output in a much better format than CSV. If we opted for this, I propose storing both the CSV and NetCDF files in a stan_scratch folder (n.b. said folder is involved in the proposed implementations of these FRs as well: Background/asynchronous sampling, Recompile only on changes to output of stanc3 auto-formatter).
As ESS and Rhat are computed for scalar variables, the computation cost during sampling can be significant for models with a large number of parameters. The issue is further complicated if ESS and Rhat are also computed for generated quantities. One possibility would be to examine only lp__ by default and provide an option to examine other quantities.
Bulk-ESS, Tail-ESS and rank-normalized Rhat don't have a sequential computation rule, which means the computation time would increase with the number of iterations. For non-rank-normalized Rhat, a sequential estimate would be possible.
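To illustrate what a sequential estimate could look like, the sketch below keeps a Welford running mean and variance per chain and computes the classic (non-split, non-rank-normalized) Gelman-Rubin Rhat from those summaries. Each new draw costs constant time. This is a Python illustration under stated assumptions, not the split or rank-normalized Rhat. The names `RunningChain` and `classic_rhat` are hypothetical, and the code assumes at least two chains with the same number of draws (two or more each).

```python
import math


class RunningChain:
    """Welford running mean and variance for one chain of one scalar."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance with the n - 1 denominator.
        return self.m2 / (self.n - 1)


def classic_rhat(chains):
    """Classic Rhat from per-chain running summaries (equal-length chains)."""
    m = len(chains)
    n = chains[0].n
    grand_mean = sum(c.mean for c in chains) / m
    # Between-chain variance B and mean within-chain variance W.
    between = n / (m - 1) * sum((c.mean - grand_mean) ** 2 for c in chains)
    within = sum(c.variance for c in chains) / m
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)
```

Only three numbers per chain and per variable need to be kept, so memory use does not grow with the number of iterations.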
Good call that the compute may be expected to get unwieldy, so it should certainly be something that the user opts in to rather than being on by default.
I think the question of whether this should be in cmdstanr versus cmdstan is an interesting orthogonal topic. My bias toward having it in cmdstanr rather than cmdstan comes purely from the consideration that this is something I have the skill to implement in the former but not the latter.