-
-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/3299 diagnostics chainset #3310
Conversation
Created a new PR because accidentally included some version of Stan math_lib in #3305 and don't have the GitHub chops to undo its fubarness. |
Jenkins Console Log Machine informationNo LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focalCPU: G++: Clang: |
CmdStan tests failed - correctly! ADVI output format is 1st line is the ADVI estimate, followed by 1000 samples. |
Seems like a good move. Only downside is we lost the review history, which makes it difficult to track what has changed |
agreed - will continue to add in changes suggested/requested in #3305 |
Sounds good! Could you ping me when you think all of the 'old' review is resolved and I can take a look at the branch then? |
…m/stan-dev/stan into feature/3299-diagnostics-chainset
…m/stan-dev/stan into feature/3299-diagnostics-chainset
…m/stan-dev/stan into feature/3299-diagnostics-chainset
…m/stan-dev/stan into feature/3299-diagnostics-chainset
…m/stan-dev/stan into feature/3299-diagnostics-chainset
…m/stan-dev/stan into feature/3299-diagnostics-chainset
I used the current develop branch I’m not saying we’re obligated to report the same numbers from now until the end of days, but I just think if we do make a change we should have a reason why the new number is better/more correct, not just that we like the code that calculates it more |
I have checked the existing unit tests and the behavior on real output files. I think this is OK. |
The quantiles shouldn't change. [edit: this is wrong---see below] But the R-hat and ESS will. That's one of the motivations for the PR---to switch over to our improved versions of these as described in this paper: Vehtari, Gelman, Simpson, Carpenter, Bürkner. 2019. I believe @avehtari has already switched over RStan and PyStan and we're just waiting for this PR to switch over CmdStan and hence CmdStanPy and CmdStanR. I believe arViz also does these new calculations. We could keep our old definitions and report two versions of R-hat (old, old per second, new) and three versions of ESS (old, new bulk, new tail), but I think it'd be confusing for users. |
As another comment, I don't think we should be reporting ESS for sampler parameters other than |
CmdStanR can call CmdStan diagnostics, but CmdStanR is also using
The old ESS which (called
I agree. Same for Rhat. |
My point is, they do. I believe the test cases we are using are not specific enough to see that, but doing a bit of testing on the side:
Here's a CSV file: https://gist.github.com/WardBrian/41bbf1b57829efa32e6aa58c5c493f57 Current stansummary (with --sig_figs=4)
new stansummary (with --sig_figs=4)
0.2429 != 0.2425 CmdStanPy uses Again, we can decide we want to make this change, but it just happening incidentally doesn't sit right with me, hence me not merging as-is. |
It would be better to use terms ESS_bulk and ESS_tail, as 1) N has different meaning in the paper describing these quantities, and 2) N_eff makes people to misread it as "Number of EFFective samples" which is wrong, and the correct term is "Effective Sample Size", 3) CmdStanR using posterior is showing ESS_bulk and ESS_tail. |
@avehtari - could you check the implementations of mcse and mcse_sd? stan/src/stan/mcmc/chainset.hpp Lines 424 to 467 in dbf6a9f
|
Yes, there is an existing cmdstan issue for this stan-dev/cmdstan#916, which @mitzimorris will address as part of the cmdstan pr that accompanies this |
@WardBrian regarding the implementation of quantiles, we need a better understanding how the boost accumulators library computes quantiles - it is returning the value of a the nth draw or interpolating? and which is appropriate? I spent yesterday looking at the boost code and don't understand it well enough to make that call. changing to use C++ std::nth_element would make the code easier to understand and remove dependencies, but as you say, that's not a good reason to change behavoirs. |
I suspect the difference may just be as simple as the existing code prioritizes counting from the left if you're requesting the 50% or higher quantile, but I agree the boost reference is pretty opaque in this area |
There is also |
@@ -117,7 +117,6 @@ class chains { | |||
using boost::accumulators::tag::tail; | |||
using boost::accumulators::tag::tail_quantile; | |||
double M = x.rows(); | |||
// size_t cache_size = std::min(prob, 1-prob)*M + 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to change anything in chains.hpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this commented out line after going through the blame - this code has always been there in its commented out form. why is it here? maybe to show one alternative - which would use less memory, run faster, and provide approximate as opposed to exact estimates. I don't know. so I got rid of this comment, and a few other comments that were clearly outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to change anything in chains.hpp?
the one thing that needed to be changed was to remove the rank-normalized r-hat functions, which weren't being called by anything, since this PR changed the function arguments.
I spoke too soon. The problem with quantiles is that reasonable definitions can differ. When you don't have an exact item, a lot of algorithms are going to interpolate, and how you do that can differ. For example if you only have two numbers [1, 3] and ask for a 50% quantile (i.e., a median), what is the answer? You could return 1 or 3, which is what you get if you turn 50% and the size of 2 into an index. Or you could interpolate between 1 and 3 and return 2. The problem's the exact same thing at an edge, but it might not be 50%. What do you do if I ask you for a 40% quantile of [1, 3]? There you might be tempted to linearly interpolate and say 1.8. That interpolation can often be made more stable if you use more elements around an element, so if we have four elements [2, 3, 5, 10], we might return a different answer than [3, 5] for the 50% quantile. It's a mess! The bottom line, though, is that in these situations, we really don't care. The sampling algorithm didn't have the granularity to cleanly resolve the quantile so if we use one of the neightboring values or interpolate, it should be fine for our use.
See above. That should also be fine. Using a |
Calling Eigen::MatrixXd draws = samples(index);
Eigen::Map<Eigen::VectorXd> map(draws.data(), draws.size());
return stan::math::quantile(map, prob); There is an overload for if |
plugged most recent version of this branch into CmdStan and comparing CmdStanR's summary (via fit$print) against CmdStan's bin/stansummary on the test CSV files in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I had a few small things I noticed going through but otherwise I think this is good
#include <stan/math/prim/fun/constants.hpp> | ||
#include <stan/math/prim/fun/quantile.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: These two includes shouldn't be necessary if we're also including math/prim above
*/ | ||
double variance(const int index) const { | ||
Eigen::MatrixXd draws = samples(index); | ||
return (draws.array() - draws.mean()).square().sum() / (draws.size() - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is stan::math::variance
*/ | ||
double quantile(const int index, const double prob) const { | ||
// Ensure the probability is within [0, 1] | ||
if (prob <= 0.0 || prob >= 1.0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can allow the endpoints using math's quantile
. It also throws its own error if necessary, so we could just skip this entirely now.
if (probs.minCoeff() <= 0.0 || probs.maxCoeff() >= 1.0) { | ||
throw std::out_of_range("Probabilities must be between 0 and 1."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note as above
double ess_bulk = analyze::split_rank_normalized_ess(samples(index)).first; | ||
return sd(index) / std::sqrt(ess_bulk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for @avehtari from above about which ESS this should use:
The old ESS which (called ess_basic() in posterior package) is still needed for computing MCSE for mean estimate.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mcse_mean should use ess_mean, not ess_bulk
in the same way, mcse_sd should not use the rank normalized ess
* @param index parameter index | ||
* @return pair (bulk_rhat, tail_rhat) | ||
*/ | ||
std::pair<double, double> split_rank_normalized_rhat(const int index) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I still think it is worth adding functions for the "old" rhat and ess to this class, if they'd be this same size)
comments look great - thanks! per discussion with @SteveBronder, I think the way to proceed here is with 3 PRs:
|
I agree that would have been a better (or at least probably faster) way to go originally, but this being as far along as it is it would also be fine by me to proceed with this one big PR. Whichever you feel like, at this point |
I think 3 PRs would be better - a lot of work, but the grown-up thing to do here. (especially since I've been advocating pragmatic programming lately). |
These should use ess_mean without rank normalization. The rank normalized versions are generic diagnostics that work also with infinite variance, but by definition of MCSE mcse_mean and mcse_sd should not use rank-normalization. |
closing this per above comment - #3310 (comment) |
Submission Checklist
./runTests.py src/test/unit
make cpplint
Summary
This PR uses the improved R-hat diagnostics (#3266) and extends rank-normalization to the computation of ESS-bulk and ESS-tail. These diagnostics are computed on samples from one or more chains, which entails parsing the sample out of a stan-csv file and assembling it into an Eigen MatrixXd object.
The work done for this PR is:
stan/analyze/mcmc
so that both ESS and Rhat can use rank-normalizationstan/mcmc/chainset.hpp
which stores a set of samples as a std::vector of Eigen MatrixXds.stan_csv_reader
to fix bug stan_csv_reader should skip warmup draws #3301, and to distinguish between ADVI and NUT-HMC samples (because CmdStan PR Updated stansummary to not quit on non-fatal error (#971) cmdstan#972 was done in order to get summaries from ADVI outputs).Intended Effect
The consumer of this functionality is CmdStan's
stansummary
utility, which, given a list of Stan CSV filenames, assembles them into a chainset object and then calls functions on the chainset object to get summary stats and diagnostics.How to Verify
unit tests in this repo; unit tests in CmdStan repo test end-to-end parsing and reporting
Side Effects
N/A
Documentation
Documentation for CmdStan bin/stansummary to be done in docs repo, and CmdStanPy docs (CmdStanPy's
summary
function wraps CmdStan'sstansummary
).Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: