-
-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed variational output in 2.28.0 #1049
Comments
With sampling I get the following (all default args, seed = 12345): 2.27.0:
2.28.0:
This would suggest it looks good? Diagnose looks like this (seed=12345, init=1): 2.27.0:
2.28.0:
Gradients seems to match, the values do not though, that seems suspect? |
Ok, this seems to be a real issue. When this was first reported I was focused mostly on the gradient which looks fine but missed the value completely. Will move this issue to Math once I have a reproducible example for it. |
False alarm on this being a bug in Math. Sorry. This is an issue because the default id was changed to 1 in #987 It was previously 0. Running variational or diagnose with specified seed=12345, init = 1 AND id = 0 I get the exact same results (the entire CSV is the same). |
So the question now is: is this an issue at all? Can we change default argument values between versions (without a major bump)? Do we avoid changing the default and limit this change only for the case of running multi chain with a single executable? Given that we are doing a 2.28.1 anyways this is the best time to discuss this. |
Why was it changed? If there wasn't a strong motivating reason I would consider this a bug that should be reverted |
No bug... fully intentional. Under multi-threaded, multi-chain mode in Stan, we wanted to have the first chain to be chain "1" instead of "0". This aligns it with things counting from 1 onwards in Stan world. You are still allowed to run explicitly with chain 0, but the default starts at 1 now. It's more confusing to users to see chain 0 to 3 when sampling 4 chains...that's what we (including @SteveBronder ) thought. |
Dug into this and found the convo here between me and @wds15 when we added multi-chain. Essentially this was done for pretty printing purposes so that when we did multichain the iteration printing and csv files would start indexed from 1 i.e.
and I think it matters what we consider reproducible. idt in the past we've guaranteed high precision reproducibility over versions. But usually the result was pretty close as long as you used the same seed. Now users would need to set the same seed and initial chain id (if they are comparing 2.28+ to 2.28-). |
I think it will be more confusing to have the outputs of something with a fixed seed change than it is to count from 0. If we want to keep the behavior we need to be much clearer about the change and the fact that it means all fixed-seed results can change without setting specifically an id. |
Sorry for closing, fat fingers… Another alternative is to have 1 be a default id whenever we set num_chains > 1. Since that is a new feature there is nothing to compare it with in previous version so there will be no confusion? |
I recall that we did change the default chain_id in the past already without anyone complaining. Adding more if's with a different default for a different case... not sure... |
I agree consistency is good. Changing it to 1 is fine, we just need to say something like this in release notes/all announcements:
|
I am good with that. |
idt we did, at least looking at the history of the hpp file https://github.com/stan-dev/cmdstan/commits/develop/src/cmdstan/arguments/arg_id.hpp Though if I remember right chain id has changed in downstream packages. I agree with Brian that changing it to 1 is fine as long as we doc it in the release. But let's wait to do the 2.28.1 release till after this Thursday's Stan meeting as I'd like to run this by people so it doesn't get missed and make sure we are not doing a big goof |
cmdstanpy assembles calls to to optimize and variational but since it only runs 1 chain, doesn't specify chain id. ok, we really need pathfinder! |
From Rok's comment (#1049 (comment)) that seems to be the case. Do other seeds give other odd answers? |
if I specify id=0 with 2.26.1,2.27.0 and 2.28.0 you get the exact same results (at least up to the 6th decimal point). Same with id=1 and 2. the results change significantly for advi with a different seed yes. I would says this is because of the model? |
not sure, but CmdStanPy is running the 'eta_should_be_big.stan' model same as used in the core Stan tests. how is it being used there? is this a perticularly unstable thing the model is trying to estimate? |
This has also made me realize that cmdstanpy doesn't even let you specify the chain id for variational/optimize because it has never mattered before |
cmdstanpy def needs more logic to take advantage of 2.28. that's up next. |
Also: If people do not fix a chain id...how can they expect reproducibility? Some doc in the release notes would be goo, of course. |
I think it's reasonable to assume a value who's default had never changed before wasn't needed for reproducibility. If our own internal tests made such an assumption, some users likely will have as well |
for CmdStan, running a single chain with fixed seed and no id would be reproducible because offset would always be 0. |
the Stan test only checks if eta is big (100 to be exact). It does not check parameter values. It does specify ids and seeds though: https://github.com/stan-dev/stan/blob/develop/src/test/unit/variational/eta_adapt_big_test.cpp |
another good point, are we now skipping some huge portion at the start of the PRNG by default? |
I assume (hope) that rng.discard is O(1)? |
Its O(log2(N)) We are using an rng that combines two multiplicative linear congruential rngs which can discard with O(log2(N)): https://github.com/boostorg/random/blob/a2740d4b30178cb187fabca163e5be7803a577b9/include/boost/random/linear_congruential.hpp#L193 |
@rok-cesnovar: Thanks for the link. I had thought it was constant, but log2(2^50) is still only 50 simple arithmetic steps, so nothing we'd notice done once per chain. @wds15 and @mitzimorris: We can't guarantee exact numerical reproducibility from release to release. Little things like improving numerics for a function will lead to differences. |
I verified that this model is very stable under MCMC (multiple runs with different seeds agree to 3+ decimal places) but completely unstable for ADVI (multiple runs can be off by more than a factor of 10). It's known that the ADVI algorithm doesn't work well for non-unit-scale posteriors. What was the intent of the test and can it be made with something more stable? This doesn't seem like a good model to use to evaluate ADVI. P.S. I rewrote using declare/define and array/vector/matrix constructors:
NUTS run 1
NUTS run 2
ADVI run 1
ADVI run 2
For run 1, ADVI reported convergence, whereas for run 2 it reported there might be a divergence. |
Summary:
A CmdStanPy unit test is now failing
Reproducible Steps:
The test is here:
https://github.com/stan-dev/cmdstanpy/blob/8596085d31be60cf6a24562b7db92bc1aab50eca/test/test_variational.py#L171
relevant info:
algorithm='meanfield', seed=12345
The error:
@rok-cesnovar - do you know if this would be expected with this version change?
Current Version:
v2.28.0
The text was updated successfully, but these errors were encountered: