During-sampling diagnostics (feature request & design discussion) #425

mike-lawrence · 2021-01-02T22:04:04Z

I propose adding optional computation of diagnostics during sampling.

To achieve this, I propose reading the CSV files to:
(1) track the proportion of iterations that exceeded the maximum treedepth
(2) track whether any post-warmup divergences were encountered
(3) track the Bulk & Tail ESS of parameters (with option to specify which to include/exclude)
(4) track the Rhat of parameters (with option to specify which to include/exclude)
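As a rough illustration of items (1) and (2), here is a minimal Python sketch (not cmdstanr code) that summarises the sampler columns of Stan-CSV-style text. It assumes the standard `treedepth__` and `divergent__` columns, that the text holds post-warmup draws only, and that a draw counts as a treedepth hit when it reaches `max_treedepth`. The function name `sampler_diagnostics` and the example data are made up for illustration.

```python
import csv
import io


def sampler_diagnostics(csv_text, max_treedepth=10):
    """Summarise treedepth hits and divergences from Stan-CSV-style text.

    Hypothetical helper: assumes the text contains post-warmup draws only.
    """
    # Stan CSV files carry comment lines starting with '#'; drop them.
    rows = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(rows)))
    n_draws = hits = divergent = 0
    for row in reader:
        n_draws += 1
        # Values are written as numbers, so parse via float first.
        if int(float(row["treedepth__"])) >= max_treedepth:
            hits += 1
        if int(float(row["divergent__"])) != 0:
            divergent += 1
    return {
        "n_draws": n_draws,
        "prop_treedepth_hit": hits / n_draws if n_draws else 0.0,
        "n_divergent": divergent,
        "any_divergence": divergent > 0,
    }


# Made-up example data in the shape of a Stan CSV file.
example = """# illustrative comment line
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,mu
-1.0,0.9,0.5,3,7,0,1.5,0.1
-1.2,0.8,0.5,10,1023,0,1.7,0.2
-0.9,0.95,0.5,2,3,1,1.4,0.0
-1.1,0.85,0.5,10,1023,0,1.6,0.3
"""
result = sampler_diagnostics(example)
print(result)
```

In a real implementation the value of `max_treedepth` would presumably come from the run's configuration rather than a default argument.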

To enable efficient incremental parsing of the CSV files, I propose keeping track of how many lines have been read so far and skipping that many lines the next time a read is triggered, storing new samples together with prior samples in an object kept in memory.
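The line-counting idea can be sketched as follows, in Python rather than R and purely as an illustration. The class name `IncrementalCsvReader`, the file name, and the example data are hypothetical. The sketch also assumes that a line without a trailing newline is still being written and should be retried on the next read.

```python
import os
import tempfile


class IncrementalCsvReader:
    """Hypothetical reader that remembers how many lines it has consumed."""

    def __init__(self, path):
        self.path = path
        self.lines_read = 0  # complete lines consumed so far
        self.header = None   # column names, set when the header row is seen
        self.draws = []      # all draws parsed so far, kept in memory

    def poll(self):
        """Parse lines added since the last call; return the number of new draws."""
        new_draws = 0
        with open(self.path, "r", newline="") as fh:
            for i, line in enumerate(fh):
                if i < self.lines_read:
                    continue  # already parsed on an earlier poll
                if not line.endswith("\n"):
                    break  # partially written line; retry on the next poll
                self.lines_read += 1
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # blank or comment line
                fields = line.split(",")
                if self.header is None:
                    self.header = fields
                else:
                    self.draws.append([float(x) for x in fields])
                    new_draws += 1
        return new_draws


# Simulate a sampler appending to its output file between two polls.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output_1.csv")
    with open(path, "w") as fh:
        # The last line is deliberately incomplete.
        fh.write("# comment\nlp__,mu\n-1.0,0.1\n-1.2,0.2\n-0.9")
    reader = IncrementalCsvReader(path)
    first = reader.poll()
    with open(path, "a") as fh:
        fh.write(",0.0\n-1.1,0.3\n")
    second = reader.poll()

print(first, second, len(reader.draws))
```

Skipping by line count still scans the file from the top on every read. Storing a byte offset and seeking to it would avoid that, at the cost of slightly more bookkeeping.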

To enable resuming this monitoring across R sessions, we could either restart the CSV parsing from scratch or write the contents to a faster binary format (I'm thinking NetCDF) from the outset. The latter has the benefit of leaving the Stan output in a much better format than CSV. If we opted for this, I propose storing both the CSV and NetCDF files in a stan_scratch folder (n.b. this folder is also involved in the proposed implementations of these FRs: Background/asynchronous sampling, Recompile only on changes to output of stanc3 auto-formatter).


avehtari · 2021-02-18T11:21:12Z

This looks more like a CmdStan issue.
As ESS and Rhat are computed for scalar variables, the computational cost during sampling can be significant for models with a large number of parameters. The issue is more complicated if ESS and Rhat are also computed for generated quantities. One possibility would be to examine only lp__ by default and provide an option to examine other quantities.
Bulk-ESS, Tail-ESS, and rank-normalized Rhat don't have a sequential computation rule, which means the computation time increases with the number of iterations. For non-rank-normalized Rhat, a sequential estimate would be possible.
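To illustrate the sequential case, here is a minimal Python sketch of the classic (non-split, non-rank-normalized) Rhat maintained with Welford-style running means and sums of squares per chain, so each new draw costs constant time. The class name `RunningRhat` is hypothetical, the sketch assumes all chains have the same number of draws, and it is not how any Stan interface computes Rhat. A batch version is included only as a cross-check.

```python
import math
import random
import statistics


class RunningRhat:
    """Hypothetical streaming estimate of classic Rhat for one scalar quantity."""

    def __init__(self, n_chains):
        self.n = [0] * n_chains       # draws seen per chain
        self.mean = [0.0] * n_chains  # running mean per chain
        self.m2 = [0.0] * n_chains    # running sum of squared deviations

    def update(self, chain, x):
        # Welford's update: O(1) work per draw, no stored history.
        self.n[chain] += 1
        delta = x - self.mean[chain]
        self.mean[chain] += delta / self.n[chain]
        self.m2[chain] += delta * (x - self.mean[chain])

    def rhat(self):
        m, n = len(self.n), self.n[0]
        if any(k != n for k in self.n):
            raise ValueError("this sketch assumes equal draws per chain")
        if m < 2 or n < 2:
            return float("nan")
        w = sum(s / (n - 1) for s in self.m2) / m  # within-chain variance
        grand = sum(self.mean) / m
        b = n / (m - 1) * sum((mu - grand) ** 2 for mu in self.mean)  # between-chain
        var_plus = (n - 1) / n * w + b / n
        return math.sqrt(var_plus / w)


def batch_rhat(chains):
    """Same formula computed from the full draws, for comparison only."""
    n = len(chains[0])
    w = statistics.fmean(statistics.variance(c) for c in chains)
    b = n * statistics.variance([statistics.fmean(c) for c in chains])
    return math.sqrt(((n - 1) / n * w + b / n) / w)


# Made-up data: 4 chains of 200 independent standard-normal draws.
rng = random.Random(1)
chains = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(4)]

running = RunningRhat(n_chains=4)
for i in range(200):
    for c in range(4):
        running.update(c, chains[c][i])

streaming_value = running.rhat()
batch_value = batch_rhat(chains)
print(streaming_value, batch_value)
```

Split-Rhat is harder to maintain this way because the split point moves as draws accumulate, and the rank-normalized variants need ranks over all draws, which is the cost concern raised above.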

mike-lawrence · 2021-02-18T15:33:37Z

Good call that the compute may be expected to get unwieldy, so it should certainly be something the user opts in to rather than being on by default.

I think the question of whether this should be in cmdstanr versus cmdstan is an interesting orthogonal topic. My bias to have it in cmdstanr rather than cmdstan comes purely from the consideration that this is something I have the skill to implement in the former but not the latter.

mike-lawrence added the feature New feature or request label Jan 2, 2021

rok-cesnovar added this to the future milestone Mar 17, 2021
