[Obsolete] Develop sampling output views #21

Closed
wants to merge 14 commits into from

magland
Collaborator

@magland magland commented May 17, 2024

No description provided.

@magland
Collaborator Author

magland commented May 17, 2024

Ported over some plots from MCMC Monitor

image

And the table now has a column for Chain and a column for Draw

image

And the summary splits out the means by chain

image

@magland magland requested review from jsoules and WardBrian May 17, 2024 13:59
@WardBrian
Collaborator

Some thoughts just looking at it/playing around, before looking at the code:

  • Traceplots are super useful for real-time monitoring like in mcmc-monitor, but are less useful after the fact. This is in part because you lose the sense of "time" - in mcmc-monitor, you can view both how well the chains are mixing and if one of them is significantly behind the others. But in this regime where you're doing analysis after the fact, I believe they're mostly used as a diagnostic, and in my experience usually only after one of the summary statistics like $\hat{R}$ or ESS indicates that there was a problem.
    If you want to show one plot for each variable, the first thing I would start with is a histogram. Then probably a 2-d scatter between pairs of variables, and traceplots only if the user wants them.
  • Adding the Chain and Draw columns to the display is nice, but if the idea behind the download is to be able to use other tools in the Stan ecosystem, we should avoid adding any columns that Stan wouldn't have. Creating num_chains different CSV files is still probably the best thing on that front.
  • I think the mean of all draws is more meaningful than the individual chain means, if you're picking one. Showing it for each chain maybe can give you a sense of if the chains didn't mix, but Rhat is better for that purpose anyway.
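
A minimal sketch of the per-chain split suggested in the second bullet, assuming the draws are held in memory together with the chain each one came from. The `ChainDraws` type and the `drawsToPerChainCsv` function are hypothetical names for illustration, not code from this PR, and the sketch leaves out the comment lines that real Stan CSV output carries.

```typescript
// Hypothetical in-memory shape (an assumption, not this project's actual type).
type ChainDraws = {
  paramNames: string[];
  rows: number[][]; // rows[i][j] = value of paramNames[j] at draw i
  chainIds: number[]; // chainIds[i] = 1-based chain that produced draw i
};

// Build one CSV string per chain, containing only the parameter columns,
// so that no Chain/Draw columns end up in the exported files.
function drawsToPerChainCsv(draws: ChainDraws): Map<number, string> {
  const header = draws.paramNames.join(",");
  const linesByChain = new Map<number, string[]>();
  draws.rows.forEach((row, i) => {
    const chain = draws.chainIds[i];
    if (!linesByChain.has(chain)) {
      linesByChain.set(chain, [header]);
    }
    linesByChain.get(chain)!.push(row.join(","));
  });
  const csvByChain = new Map<number, string>();
  linesByChain.forEach((lines, chain) => {
    csvByChain.set(chain, lines.join("\n") + "\n");
  });
  return csvByChain;
}
```

Each entry of the returned map could then be written out as its own file, one per chain.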

Here's the output of Stan's stansummary command, for reference:

Inference for Stan model: bernoulli_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.

Warmup took 0.0080 seconds
Sampling took 0.048 seconds

                Mean     MCSE   StdDev     5%   50%   95%  N_Eff  N_Eff/s  R_hat

lp__            -7.3  3.8e-02     0.76   -8.8  -7.0  -6.7    399     8313   1.00
accept_stat__   0.92  3.5e-03  1.2e-01   0.64  0.97   1.0   1222    25458   1.00
stepsize__      0.96      nan  6.7e-16   0.96  0.96  0.96    nan      nan    nan
treedepth__      1.3  1.6e-02  4.7e-01    1.0   1.0   2.0    874    18204   1.00
n_leapfrog__     2.4  3.6e-02  1.1e+00    1.0   3.0   3.0    926    19286   1.00
divergent__     0.00      nan  0.0e+00   0.00  0.00  0.00    nan      nan    nan
energy__         7.8  5.2e-02  9.7e-01    6.8   7.6   9.8    353     7347   1.00

theta           0.25  6.1e-03     0.12  0.079  0.24  0.48    418     8702    1.0

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at 
convergence, R_hat=1).

All of these are considering all chains at once, but N_Eff through R_hat need to know which chain the draws originated from to do what they do correctly.
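
To illustrate why the chain of origin matters, here is a rough sketch of the classic split R-hat for a single parameter. This is the textbook between/within-chain variance ratio, which is an assumption on my part: it is not necessarily the exact estimator that stansummary or MCMC Monitor implements. The function names are hypothetical.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample variance with an (n - 1) denominator.
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) * (x - m), 0) / (xs.length - 1);
}

// Classic split R-hat (a sketch, not a verified port of any Stan code).
// chains[c] holds the draws of one parameter from chain c, in order.
function splitRhat(chains: number[][]): number {
  // Split every chain into a first and a last half of common length n.
  const n = Math.floor(Math.min(...chains.map((c) => c.length)) / 2);
  const halves: number[][] = [];
  for (const c of chains) {
    halves.push(c.slice(0, n), c.slice(c.length - n));
  }
  // W: average within-half variance. B: n times the variance of the half means.
  const W = mean(halves.map(sampleVariance));
  const B = n * sampleVariance(halves.map(mean));
  const varPlus = ((n - 1) / n) * W + B / n;
  return Math.sqrt(varPlus / W);
}
```

Without the chain labels there is no way to form the per-chain halves, so neither `W` nor `B` can be computed. For a constant column such as stepsize__, both variances are zero and the ratio evaluates to NaN, consistent with the `nan` entries in the table above.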

@magland
Collaborator Author

magland commented May 17, 2024

Okay thanks! I'll work on that.

@magland
Collaborator Author

magland commented May 21, 2024

I made some changes based on your comments Brian.

I tried to get the summary table to match the output you pasted.

image

That involved porting code over from MCMC Monitor for calculating ESS, Rhat, etc.

There is a new "Histograms" tab that shows the parameter histograms. Ported over from MCMC Monitor.

I left in the trace plots tab for now, coming after the histograms tab. I think it's useful to have that available... but I do see what you are saying about how it's more useful for real-time monitoring.

Regarding export to CSV: I propose we provide two options: (1) downloading a single CSV with chain/draw columns, or (2) downloading a .zip file of CSVs, one for each chain, in the format of Stan output. For the second, I suggest a separate PR. Thoughts?
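
A minimal sketch of option (1), assuming draws are grouped by chain in a nested array. The function name `toCombinedCsv`, the input layout, and the exact `Chain`/`Draw` header spellings are assumptions for illustration.

```typescript
// Hypothetical input layout: draws[c][i] is the row of parameter values
// for draw i of chain c (both 0-based in memory, 1-based in the output).
function toCombinedCsv(paramNames: string[], draws: number[][][]): string {
  const lines: string[] = [["Chain", "Draw", ...paramNames].join(",")];
  draws.forEach((chainRows, c) => {
    chainRows.forEach((row, i) => {
      // Prefix each row with its chain and draw index.
      lines.push([c + 1, i + 1, ...row].join(","));
    });
  });
  return lines.join("\n") + "\n";
}
```

The two extra leading columns are what make this file differ from Stan's own output, which is the reason for offering the per-chain option as well.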

@jsoules @WardBrian

@WardBrian
Collaborator

The summary visually looks great.
The rhat/ess calculations are a bit tricky and require some verification.

For histograms, if you add histnorm: "probability", to the data object, you get normalized histograms, which are a bit more typical in this context compared to the raw counts
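
As a rough sketch of what that looks like, assuming the plots are Plotly histogram traces: the local `HistogramTrace` type and the `makeHistogramTrace` helper are hypothetical and only cover the fields used here.

```typescript
// Minimal local type for the trace fields used in this sketch; a real
// project would rely on the typings that ship with Plotly instead.
type HistogramTrace = {
  x: number[];
  type: "histogram";
  histnorm: "" | "probability";
  name?: string;
};

// Build a histogram trace whose bar heights are fractions of the draws
// rather than raw counts.
function makeHistogramTrace(values: number[], name: string): HistogramTrace {
  return {
    x: values,
    type: "histogram",
    histnorm: "probability",
    name,
  };
}
```

With `histnorm: "probability"`, the bar heights sum to 1, so histograms from runs with different numbers of draws share a comparable y-axis.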

@magland magland changed the title Plots tab and separate chains Develop sampling output views May 22, 2024
@magland
Collaborator Author

magland commented May 22, 2024

> For histograms, if you add histnorm: "probability", to the data object, you get normalized histograms, which are a bit more typical in this context compared to the raw counts

I have added histnorm: "probability".

I have also changed the title for this PR.

@magland magland changed the title Develop sampling output views [Obsolete] Develop sampling output views May 31, 2024
@magland
Collaborator Author

magland commented May 31, 2024

Replaced by #32

@WardBrian WardBrian closed this May 31, 2024
@WardBrian WardBrian deleted the plots branch June 26, 2024 20:54