Changed variational output in 2.28.0 #1049

WardBrian · 2021-10-06T13:59:16Z

Summary:

A CmdStanPy unit test is now failing

Reproducible Steps:

transformed data {
    vector[2]   y[2];
    vector[2]   mu0;
    matrix[2,2] sigma;

    y[1,1]  = 40000.0;
    y[1,2]  = 30000.0;

    y[2,1]  = 30000.0;
    y[2,2]  = 20000.0;

    mu0[1]  = 0.0;
    mu0[2]  = 0.0;

    sigma[1,1] = 10000.0;
    sigma[1,2] = 00000.0;
    sigma[2,1] = 00000.0;
    sigma[2,2] = 10000.0;
}
parameters {
    vector[2] mu;
}
model {
    mu ~ multi_normal(mu0,sigma);

    for (n in 1:2)
        y[n] ~ multi_normal(mu,sigma);
}

The test is here:
https://github.com/stan-dev/cmdstanpy/blob/8596085d31be60cf6a24562b7db92bc1aab50eca/test/test_variational.py#L171
relevant info: algorithm='meanfield', seed=12345
The error:

self.assertAlmostEqual(
            variational.variational_params_np[3], 31.0418, places=2
        )
E       AssertionError: 309.196 != 31.0418 within 2 places (278.1542 difference)

@rok-cesnovar - do you know if this would be expected with this version change?

Current Version:

v2.28.0

The text was updated successfully, but these errors were encountered:

rok-cesnovar · 2021-10-06T16:38:47Z

With sampling I get the following (all default args, seed = 12345):

2.27.0:

                 Mean     MCSE   StdDev     5%    50%    95%    N_Eff  N_Eff/s    R_hat

lp__           -66668    0.046     0.92 -66670 -66667 -66667      404    26953     1.00
accept_stat__    0.90  3.8e-03  1.2e-01   0.63   0.94    1.0  1.0e+03  6.9e+04  1.0e+00
stepsize__       0.95      nan  3.8e-15   0.95   0.95   0.95      nan      nan      nan
treedepth__       1.8  1.5e-02  4.4e-01    1.0    2.0    2.0  8.1e+02  5.4e+04  1.0e+00
n_leapfrog__      3.1  4.0e-02  1.3e+00    1.0    3.0    7.0  9.9e+02  6.6e+04  1.0e+00
divergent__      0.00      nan  0.0e+00   0.00   0.00   0.00      nan      nan      nan
energy__        66669  6.7e-02  1.3e+00  66667  66668  66671  4.0e+02  2.7e+04  1.0e+00

mu[1]           23328      1.9       56  23236  23327  23418      909    60632     1.00
mu[2]           16664      1.6       55  16573  16664  16753     1178    78516     1.00

2.28.0:

                 Mean     MCSE   StdDev     5%    50%    95%    N_Eff  N_Eff/s    R_hat

lp__           -66668    0.050     0.97 -66669 -66667 -66667      378    18889      1.0
accept_stat__    0.92  2.6e-03  9.1e-02   0.73   0.96    1.0  1.3e+03  6.3e+04  1.0e+00
stepsize__       0.83      nan  4.8e-15   0.83   0.83   0.83      nan      nan      nan
treedepth__       2.0  1.9e-02  6.1e-01    1.0    2.0    3.0  1.0e+03  5.2e+04  1.0e+00
n_leapfrog__      4.0  6.1e-02  2.0e+00    1.0    3.0    7.0  1.0e+03  5.2e+04  1.0e+00
divergent__      0.00      nan  0.0e+00   0.00   0.00   0.00      nan      nan      nan
energy__        66669  7.5e-02  1.4e+00  66667  66668  66671  3.6e+02  1.8e+04  1.0e+00

mu[1]           23335      2.0       56  23241  23334  23428      745    37257     1.00
mu[2]           16665      2.0       57  16576  16666  16762      840    42023     1.00

This would suggest it looks good?

Diagnose looks like this (seed=12345, init=1):

2.27.0:

TEST GRADIENT MODE

 Log probability=-189994

 param idx           value           model     finite diff           error
         0        0.992205          6.9997          6.9997    -1.71696e-06
         1         -0.1117         5.00003         5.00002     9.98942e-06

2.28.0:

TEST GRADIENT MODE

 Log probability=-190004

 param idx           value           model     finite diff           error
         0       -0.955454         7.00029         7.00029     5.04136e-07
         1        0.528053         4.99984         4.99985    -7.31341e-06

Gradients seems to match, the values do not though, that seems suspect?

rok-cesnovar · 2021-10-06T16:54:28Z

Ok, this seems to be a real issue. When this was first reported I was focused mostly on the gradient which looks fine but missed the value completely.

Will move this issue to Math once I have a reproducible example for it.

rok-cesnovar · 2021-10-06T18:38:25Z

False alarm on this being a bug in Math. Sorry. This is an issue because the default id was changed to 1 in #987

It was previously 0.

Running variational or diagnose with specified seed=12345, init = 1 AND id = 0 I get the exact same results (the entire CSV is the same).

rok-cesnovar · 2021-10-06T18:49:31Z

So the question now is: is this an issue at all? Can we change default argument values between versions (without a major bump)? Do we avoid changing the default and limit this change only for the case of running multi chain with a single executable?

Given that we are doing a 2.28.1 anyways this is the best time to discuss this.

WardBrian · 2021-10-06T18:55:18Z

Why was it changed? If there wasn't a strong motivating reason I would consider this a bug that should be reverted

wds15 · 2021-10-06T19:09:19Z

No bug... fully intentional. Under multi-threaded, multi-chain mode in Stan, we wanted to have the first chain to be chain "1" instead of "0". This aligns it with things counting from 1 onwards in Stan world. You are still allowed to run explicitly with chain 0, but the default starts at 1 now. It's more confusing to users to see chain 0 to 3 when sampling 4 chains...that's what we (including @SteveBronder ) thought.

SteveBronder · 2021-10-06T19:13:55Z

Dug into this and found the convo here between me and @wds15 when we added multi-chain. Essentially this was done for pretty printing purposes so that when we did multichain the iteration printing and csv files would start indexed from 1 i.e.

Chain [1]:
Chain [2]:
Chain [3]:

and output_1.csv, output_2.csv, output_3.csv

I think it matters what we consider reproducible. idt in the past we've guaranteed high precision reproducibility over versions. But usually the result was pretty close as long as you used the same seed. Now users would need to set the same seed and initial chain id (if they are comparing 2.28+ to 2.28-).

WardBrian · 2021-10-06T19:14:40Z

I think it will be more confusing to have the outputs of something with a fixed seed change than it is to count from 0. If we want to keep the behavior we need to be much clearer about the change and the fact that it means all fixed-seed results can change without setting specifically an id.

rok-cesnovar · 2021-10-06T19:18:26Z

Sorry for closing, fat fingers…

Another alternative is to have 1 be a default id whenever we set num_chains > 1. Since that is a new feature there is nothing to compare it with in previous version so there will be no confusion?

wds15 · 2021-10-06T19:23:02Z

I recall that we did change the default chain_id in the past already without anyone complaining. Adding more if's with a different default for a different case... not sure...

WardBrian · 2021-10-06T19:29:45Z

I agree consistency is good. Changing it to 1 is fine, we just need to say something like this in release notes/all announcements:

The default chain id is now 1 instead of 0. This will change the output of any program that uses a set seed for the random number generator unless it is updated to set id=0

rok-cesnovar · 2021-10-06T19:30:42Z

I am good with that.

SteveBronder · 2021-10-06T19:32:04Z

I recall that we did change the default chain_id in the past already without anyone complaining. Adding more if's with a different default for a different case... not sure...

idt we did, at least looking at the history of the hpp file

https://github.com/stan-dev/cmdstan/commits/develop/src/cmdstan/arguments/arg_id.hpp

Though if I remember right chain id has changed in downstream packages.

I agree with Brian that changing it to 1 is fine as long as we doc it in the release. But let's wait to do the 2.28.1 release till after this Thursday's Stan meeting as I'd like to run this by people so it doesn't get missed and make sure we are not doing a big goof

mitzimorris · 2021-10-06T19:32:08Z

cmdstanpy assembles calls to to optimize and variational but since it only runs 1 chain, doesn't specify chain id.
so you're saying that the reason the estimate changed by a factor of 10 is because it started from a different place in the PRNG?

ok, we really need pathfinder!

SteveBronder · 2021-10-06T19:34:00Z

From Rok's comment (#1049 (comment)) that seems to be the case. Do other seeds give other odd answers?

rok-cesnovar · 2021-10-06T19:36:33Z

so you're saying that the estimate changed by a factor of 10 makes sense because it started from a different place in the PRNG?

if I specify id=0 with 2.26.1,2.27.0 and 2.28.0 you get the exact same results (at least up to the 6th decimal point). Same with id=1 and 2.

the results change significantly for advi with a different seed yes. I would says this is because of the model?

mitzimorris · 2021-10-06T19:36:50Z

not sure, but CmdStanPy is running the 'eta_should_be_big.stan' model same as used in the core Stan tests. how is it being used there? is this a perticularly unstable thing the model is trying to estimate?

WardBrian · 2021-10-06T19:37:37Z

This has also made me realize that cmdstanpy doesn't even let you specify the chain id for variational/optimize because it has never mattered before

mitzimorris · 2021-10-06T19:38:31Z

cmdstanpy def needs more logic to take advantage of 2.28. that's up next.

wds15 · 2021-10-06T19:39:04Z

Also: If people do not fix a chain id...how can they expect reproducibility? Some doc in the release notes would be goo, of course.

WardBrian · 2021-10-06T19:41:13Z

I think it's reasonable to assume a value who's default had never changed before wasn't needed for reproducibility. If our own internal tests made such an assumption, some users likely will have as well

mitzimorris · 2021-10-06T19:41:29Z

If people do not fix a chain id...how can they expect reproducibility?

for CmdStan, running a single chain with fixed seed and no id would be reproducible because offset would always be 0.

rok-cesnovar · 2021-10-06T19:42:38Z

how is it being used there? is this a perticularly unstable thing the model is trying to estimate?

the Stan test only checks if eta is big (100 to be exact). It does not check parameter values. It does specify ids and seeds though: https://github.com/stan-dev/stan/blob/develop/src/test/unit/variational/eta_adapt_big_test.cpp

WardBrian · 2021-10-06T19:42:48Z

offset

another good point, are we now skipping some huge portion at the start of the PRNG by default?

rok-cesnovar · 2021-10-06T19:46:41Z

Yeah 2^50: https://github.com/stan-dev/stan/blob/039ed89fbc5d03dcef09ee5dc53259348e1b37d0/src/stan/services/util/create_rng.hpp#L27

WardBrian · 2021-10-06T19:49:33Z

I assume (hope) that rng.discard is O(1)?

rok-cesnovar · 2021-10-06T20:04:38Z

Its O(log2(N))

We are using an rng that combines two multiplicative linear congruential rngs which can discard with O(log2(N)): https://github.com/boostorg/random/blob/a2740d4b30178cb187fabca163e5be7803a577b9/include/boost/random/linear_congruential.hpp#L193

bob-carpenter · 2021-10-06T20:29:51Z

@rok-cesnovar: Thanks for the link. I had thought it was constant, but log2(2^50) is still only 50 simple arithmetic steps, so nothing we'd notice done once per chain.

@wds15 and @mitzimorris: We can't guarantee exact numerical reproducibility from release to release. Little things like improving numerics for a function will lead to differences.

bob-carpenter · 2021-10-07T16:46:34Z

I verified that this model is very stable under MCMC (multiple runs with different seeds agree to 3+ decimal places) but completely unstable for ADVI (multiple runs can be off by more than a factor of 10).

It's known that the ADVI algorithm doesn't work well for non-unit-scale posteriors.

What was the intent of the test and can it be made with something more stable? This doesn't seem like a good model to use to evaluate ADVI.

P.S. I rewrote using declare/define and array/vector/matrix constructors:

transformed data {
  vector[2]   y[2] = { [40000, 30000]', [30000, 20000]' };
  matrix[2, 2] sigma = [ [10000, 0], [0, 10000] ];
  vector[2] mu0 = [0, 0]';
}
parameters {
  vector[2] mu;
}
model {
  mu ~ multi_normal(mu0, sigma);
  y ~ multi_normal(mu, sigma);
}

NUTS run 1

 variable   mean median sd mad     q5    q95 rhat ess_bulk ess_tail
    mu[1]  23333  23333 58  58  23238  23427    1     3352     2791

NUTS run 2

 variable   mean median sd mad     q5    q95 rhat ess_bulk ess_tail
    mu[1]  23333  23334 57  56  23237  23427    1     3765     2467

ADVI run 1

    variable    mean  median  sd mad      q5     q95
 mu[1]          7190    7192  26  25    7147    7232

ADVI run 2

    variable    mean  median   sd  mad      q5     q95
 mu[1]           685     685    1    1     683     687

For run 1, ADVI reported convergence, whereas for run 2 it reported there might be a divergence.

rok-cesnovar closed this as completed Oct 6, 2021

rok-cesnovar reopened this Oct 6, 2021

WardBrian closed this as completed Oct 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changed variational output in 2.28.0 #1049

Changed variational output in 2.28.0 #1049

WardBrian commented Oct 6, 2021 •

edited

Loading

rok-cesnovar commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

WardBrian commented Oct 6, 2021

wds15 commented Oct 6, 2021

SteveBronder commented Oct 6, 2021 •

edited

Loading

WardBrian commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

wds15 commented Oct 6, 2021

WardBrian commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

SteveBronder commented Oct 6, 2021

mitzimorris commented Oct 6, 2021 •

edited

Loading

SteveBronder commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

mitzimorris commented Oct 6, 2021

WardBrian commented Oct 6, 2021

mitzimorris commented Oct 6, 2021

wds15 commented Oct 6, 2021

WardBrian commented Oct 6, 2021

mitzimorris commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

WardBrian commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021 •

edited

Loading

WardBrian commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021 •

edited

Loading

bob-carpenter commented Oct 6, 2021

bob-carpenter commented Oct 7, 2021

Changed variational output in 2.28.0 #1049

Changed variational output in 2.28.0 #1049

Comments

WardBrian commented Oct 6, 2021 • edited Loading

Summary:

Reproducible Steps:

Current Version:

rok-cesnovar commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

WardBrian commented Oct 6, 2021

wds15 commented Oct 6, 2021

SteveBronder commented Oct 6, 2021 • edited Loading

WardBrian commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

wds15 commented Oct 6, 2021

WardBrian commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

SteveBronder commented Oct 6, 2021

mitzimorris commented Oct 6, 2021 • edited Loading

SteveBronder commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

mitzimorris commented Oct 6, 2021

WardBrian commented Oct 6, 2021

mitzimorris commented Oct 6, 2021

wds15 commented Oct 6, 2021

WardBrian commented Oct 6, 2021

mitzimorris commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021

WardBrian commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021 • edited Loading

WardBrian commented Oct 6, 2021

rok-cesnovar commented Oct 6, 2021 • edited Loading

bob-carpenter commented Oct 6, 2021

bob-carpenter commented Oct 7, 2021

WardBrian commented Oct 6, 2021 •

edited

Loading

SteveBronder commented Oct 6, 2021 •

edited

Loading

mitzimorris commented Oct 6, 2021 •

edited

Loading

rok-cesnovar commented Oct 6, 2021 •

edited

Loading

rok-cesnovar commented Oct 6, 2021 •

edited

Loading