Error regarding dimensions after sampling. #1050

tillahoffmann · 2024-12-12T20:09:37Z

Describe the bug

After the model completes sampling, the following error is raised.

Error in dim(x) <- c(dim(x), 1) : 
  dims [product 150] do not match the length of object [3]

To Reproduce

// Problematic Stan model (saved as debug.stan).
parameters {
    vector [150] f;
}

model {
    f ~ normal(0, 1);
}

> # Calling R code.
> library(cmdstanr)
> library(gptoolsStan)
> 
> cmdstan_model(
+   stan_file = "debug.stan"
+ )$sample(
+   data = list(n = 100, sigma = 1, length_scale = 0.1, period = 1),
+   chains = 1,
+   iter_warmup = 500,
+   iter_sampling = 50
+ )
Compiling Stan program...

[C++ compiler output]

Running MCMC with 1 chain...

Chain 1 Iteration:   1 / 550 [  0%]  (Warmup) 
Chain 1 Iteration: 100 / 550 [ 18%]  (Warmup) 
Chain 1 Iteration: 200 / 550 [ 36%]  (Warmup) 
Chain 1 Iteration: 300 / 550 [ 54%]  (Warmup) 
Chain 1 Iteration: 400 / 550 [ 72%]  (Warmup) 
Chain 1 Iteration: 500 / 550 [ 90%]  (Warmup) 
Chain 1 Iteration: 501 / 550 [ 91%]  (Sampling) 
Chain 1 Iteration: 550 / 550 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
Error in dim(x) <- c(dim(x), 1) : 
  dims [product 150] do not match the length of object [3]

Expected behavior

The R code above returns a fit.

Operating system

$ uname -a
Linux 4dcbafc21a34 6.10.14-linuxkit #1 SMP Thu Oct 24 19:28:55 UTC 2024 aarch64 GNU/Linux
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ R --version
R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: aarch64-unknown-linux-gnu (64-bit)
$ gcc --version
gcc (Debian 12.2.0-14) 12.2.0

I am running this code in a Docker container with the above operating system.

I do not get an error if I run the same code on the host machine with the following configuration.

$ uname -a
Darwin dhcp-10-250-31-164.harvard.edu 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:05:14 PDT 2024; root:xnu-11215.41.3~2/RELEASE_ARM64_T8103 arm64
$ gcc --version
Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: arm64-apple-darwin24.1.0

The output on the host machine is as follows.

Compiling Stan program...
Running MCMC with 1 chain...

Chain 1 Iteration:   1 / 550 [  0%]  (Warmup) 
Chain 1 Iteration: 100 / 550 [ 18%]  (Warmup) 
Chain 1 Iteration: 200 / 550 [ 36%]  (Warmup) 
Chain 1 Iteration: 300 / 550 [ 54%]  (Warmup) 
Chain 1 Iteration: 400 / 550 [ 72%]  (Warmup) 
Chain 1 Iteration: 500 / 550 [ 90%]  (Warmup) 
Chain 1 Iteration: 501 / 550 [ 91%]  (Sampling) 
Chain 1 Iteration: 550 / 550 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
     lp__ -75.81 -75.93 6.44 4.80 -88.77 -65.62 1.01       26       33
     f[1]   0.02   0.00 1.04 1.04  -1.53   1.48 1.25       71       31
     f[2]  -0.02  -0.09 0.91 0.81  -1.47   1.63 1.00       84       46
     f[3]   0.02  -0.10 1.02 1.11  -1.49   1.65 1.08       84       41
     f[4]  -0.09  -0.27 0.99 0.88  -1.45   1.61 1.02       84       62
     f[5]   0.10   0.34 1.13 1.01  -1.78   1.79 1.00       84       41
     f[6]  -0.13  -0.25 1.02 1.20  -1.61   1.64 1.00       84       37
     f[7]   0.01  -0.02 1.19 1.39  -1.88   1.86 1.00       84       46
     f[8]  -0.10  -0.25 1.12 1.27  -1.74   1.54 1.02       56       33
     f[9]  -0.21  -0.21 1.01 0.80  -2.02   1.90 1.00       84       40

 # showing 10 of 151 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
[...]

CmdStanR version number (same in container and on host)

> packageVersion("cmdstanr")
[1] ‘0.8.1’
> cmdstan_version()
[1] "2.36.0"

Additional context

This problem arose in running the reproduction materials for our Gaussian process inference library after upgrading cmdstan and cmdstanr (cf. onnela-lab/gptools-reproduction-material#4).

I searched GitHub for the code in the error message and found this section. Maybe it's relevant.

https://github.com/stan-dev/posterior/blob/79d4521b943e44f4ac31636c4488d9e2cfeac3ec/R/as_draws_array.R#L227-L239

jgabry · 2024-12-12T21:28:08Z

Wow, that's very strange! I'm not able to reproduce this on either of my computers (although they're both Macs, just running different OS versions), which is going to make this tricky to debug. Does this happen with all models or just specific ones?

> I searched GitHub for the code in the error message and found this section. Maybe it's relevant.

There's a decent chance that line in the posterior package is where the error is coming from, but, if so, I'm not sure why.

Are you able to generate a traceback() so we can see more about where the error is happening?

tillahoffmann · 2024-12-12T21:41:08Z

Thanks for the fast reply! Here's the output of traceback. I have to admit I don't quite know how to interpret it.

12: as_array_matrix_list(x)
11: fun(x, ...)
10: as_draws.default(x)
9: as_draws(x)
8: as_draws_array.default(list(structure(list(treedepth__ = c(4L, 
   4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 3L, 4L, 
   4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 
   4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 
   4L), divergent__ = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
   0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
   0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
   0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), energy__ = c(160.001, 164.589, 
   149.329, 144.494, 140.444, 154.648, 147.947, 149.639, 141.597, 
   152.719, 143.787, 132.576, 148.491, 160.705, 165.513, 157.754, 
   146.246, 160.25, 160.585, 147.17, 146.789, 170.271, 161.011, 
   144.471, 151.956, 161.701, 165.189, 165.872, 154.899, 175.701, 
   156.912, 153.286, 132.044, 140.624, 136.904, 142.348, 130.973, 
   145.396, 143.553, 155.442, 150.118, 156.114, 156.571, 145.717, 
   146.662, 168.475, 166.723, 159.978, 158.845, 148.17)), row.names = c(NA, 
   -50L), class = "data.frame")))
7: (function (x, ...) 
   {
       UseMethod("as_draws_array")
   })(list(structure(list(treedepth__ = c(4L, 4L, 4L, 4L, 4L, 4L, 
   4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 
   4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
   4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L), divergent__ = c(0L, 
   0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
   0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
   0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
   0L), energy__ = c(160.001, 164.589, 149.329, 144.494, 140.444, 
   154.648, 147.947, 149.639, 141.597, 152.719, 143.787, 132.576, 
   148.491, 160.705, 165.513, 157.754, 146.246, 160.25, 160.585, 
   147.17, 146.789, 170.271, 161.011, 144.471, 151.956, 161.701, 
   165.189, 165.872, 154.899, 175.701, 156.912, 153.286, 132.044, 
   140.624, 136.904, 142.348, 130.973, 145.396, 143.553, 155.442, 
   150.118, 156.114, 156.571, 145.717, 146.662, 168.475, 166.723, 
   159.978, 158.845, 148.17)), row.names = c(NA, -50L), class = "data.frame")))
6: do.call(as_draws_format, list(post_warmup_sampler_diagnostics))
5: read_cmdstan_csv(files = self$output_files(include_failed = FALSE), 
       variables = variables, sampler_diagnostics = sampler_diagnostics, 
       format = format)
4: private$read_csv_(variables = "", sampler_diagnostics = convert_hmc_diagnostic_names(diagnostics))
3: initialize(...)
2: CmdStanMCMC$new(runset)
1: cmdstanr::cmdstan_model(stan_file = "debug.stan")$sample(data = list(n = 100, 
       sigma = 1, length_scale = 0.1, period = 1), chains = 1, iter_warmup = 500, 
       iter_sampling = 50)
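
On reading the traceback: R numbers the frames from the outermost call (1) to the innermost (12), so the error is raised inside as_array_matrix_list(x), which was reached from read_cmdstan_csv via do.call(as_draws_format, ...). The base R sketch below shows the same ordering on a toy call chain. The functions read_output, convert, and add_dim are hypothetical names chosen for illustration; they are not cmdstanr or posterior functions.

```r
# Hypothetical three-level call chain; the innermost function fails
# in the same way as the reported error.
add_dim <- function(x) {
  dim(x) <- c(dim(x), 1)
  x
}
convert <- function(x) add_dim(x)
read_output <- function(x) convert(x)

frames <- character()
msg <- tryCatch(
  withCallingHandlers(
    read_output(data.frame(a = 1:2, b = 3:4)),
    error = function(e) {
      # sys.calls() lists the active calls from outermost to innermost.
      frames <<- vapply(
        sys.calls(),
        function(cl) deparse(cl[[1]])[1],
        character(1)
      )
    }
  ),
  error = function(e) conditionMessage(e)
)

# Keep only our own functions, in call order (outermost first).
own_frames <- frames[frames %in% c("read_output", "convert", "add_dim")]
print(own_frames)

# traceback() prints the reverse order, with the innermost frame at the top.
print(rev(own_frames))
print(msg)
```

The innermost of the toy functions is add_dim, which corresponds to frame 12 in the traceback above.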

tillahoffmann · 2024-12-12T21:47:39Z

Was able to put together a reproducible example in Docker here: https://gist.github.com/tillahoffmann/ada92a970706c772c6ad2a477ec95fb2

jgabry · 2024-12-12T22:08:16Z

Thanks, that's great. I'll build the image now, but unfortunately the rest of the day and the next week are insanely busy for me, so I'm not sure when I'm going to have the time to dig into this if it doesn't turn out to be really simple. I'll try to make some time though!

I think the traceback confirms your suspicion that the error is happening in those lines from posterior, but I'm not sure why yet.

jgabry · 2024-12-12T22:16:11Z

Does this happen with all models or just certain ones?

tillahoffmann · 2024-12-12T22:28:55Z

Sorry, forgot to address that earlier. I don't know if it happens for all models, but it seems to be an issue even for very simple ones like this one.

parameters {
    real f;
}

model {
    f ~ normal(0, 1);
}

The error message is as follows.

Running MCMC with 1 chain...

Chain 1 Iteration:   1 / 550 [  0%]  (Warmup) 
Chain 1 Iteration: 100 / 550 [ 18%]  (Warmup) 
Chain 1 Iteration: 200 / 550 [ 36%]  (Warmup) 
Chain 1 Iteration: 300 / 550 [ 54%]  (Warmup) 
Chain 1 Iteration: 400 / 550 [ 72%]  (Warmup) 
Chain 1 Iteration: 500 / 550 [ 90%]  (Warmup) 
Chain 1 Iteration: 501 / 550 [ 91%]  (Sampling) 
Chain 1 Iteration: 550 / 550 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
Error in dim(x) <- c(dim(x), 1) : 
  dims [product 150] do not match the length of object [3]

I've just played around with this a bit more, and it only seems to happen if a single chain is run. It works fine for multiple chains.

Edit: It looks like the number 150 is not related to the size of the vector but the number of samples. Specifically, the reported number is 3 * iter_sampling in some brief experiments.

jgabry · 2024-12-13T15:53:12Z

Thanks for the extra details. This is indeed quite strange.

> It looks like the number 150 is not related to the size of the vector but the number of samples. Specifically, the reported number is 3 * iter_sampling in some brief experiments.

The error seems to be happening when processing the data frame of sampler diagnostics, which your docker image helped me figure out. The data frame has iter_sampling rows and 3 columns (treedepth__, divergent__, energy__). If you set diagnostics = NULL when calling the sample method there's no error, but then if you call fit$sampler_diagnostics() after sampling you'll still get the error. But that's as far as I've gotten so far. What I really don't understand is why this would only be reproducible on Docker (as far as we know). Or have you been able to reproduce it outside of Docker?
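
The numbers in the error message can be reproduced in base R without cmdstanr or posterior. The sketch below builds a stand-in for the diagnostics data frame, with made-up values and only the 50 x 3 shape taken from the traceback, then applies the assignment from the error message. It assumes nothing about posterior's internals beyond that one line.

```r
set.seed(1)
iter_sampling <- 50

# Stand-in for the post-warmup sampler diagnostics of one chain:
# iter_sampling rows and 3 columns (values are made up).
diagnostics <- data.frame(
  treedepth__ = rep(4L, iter_sampling),
  divergent__ = rep(0L, iter_sampling),
  energy__ = rnorm(iter_sampling, mean = 150, sd = 10)
)

# A data frame is a list of columns, so its length is 3, not 150.
n_columns <- length(diagnostics)
n_cells <- prod(dim(diagnostics))

# Appending a dimension fails because `dim<-` checks the new dims
# against the length of the underlying list.
df_message <- tryCatch(
  {
    dim(diagnostics) <- c(dim(diagnostics), 1)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(df_message)

# The same assignment succeeds on a matrix, whose length is 150.
diagnostics_matrix <- as.matrix(diagnostics)
dim(diagnostics_matrix) <- c(dim(diagnostics_matrix), 1)
print(dim(diagnostics_matrix))
```

This matches the observation that the reported product is 3 * iter_sampling: 3 is the number of diagnostic columns and also the length of the data frame.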

tillahoffmann · 2024-12-13T16:54:11Z

It's weird with the Docker image. Maybe it's related to being on Linux rather than macOS? R behaving differently on Linux and macOS would be surprising, but I'm no R expert.

jgabry · 2024-12-13T17:08:30Z

This keeps getting stranger. So the function in the posterior package that you found (https://github.com/stan-dev/posterior/blob/79d4521b943e44f4ac31636c4488d9e2cfeac3ec/R/as_draws_array.R#L227-L239) is indeed where the error is happening. On my computer the input to that function is a list of matrices (one per chain). In the Docker image the input to that function is a list of data frames (one per chain, if chains > 1) and a single data frame if chains = 1.
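
One way to make code robust to both input shapes is to normalise before adding the chain dimension. The helper below is a hypothetical sketch, not posterior's implementation and not the fix that was eventually applied. The name stack_chains and the iterations x chains x variables layout are assumptions made for illustration.

```r
# Hypothetical helper (not part of posterior or cmdstanr): accept a single
# chain or a list of chains, each a matrix or data frame, and return an
# iterations x chains x variables array.
stack_chains <- function(x) {
  # is.list() is TRUE for a data frame, so test for it explicitly first.
  if (is.data.frame(x) || is.matrix(x)) {
    x <- list(x)
  }
  mats <- lapply(x, as.matrix)
  n_iter <- nrow(mats[[1]])
  var_names <- colnames(mats[[1]])
  out <- array(
    NA_real_,
    dim = c(n_iter, length(mats), length(var_names)),
    dimnames = list(NULL, NULL, var_names)
  )
  for (chain in seq_along(mats)) {
    out[, chain, ] <- mats[[chain]]
  }
  out
}

one_chain <- data.frame(treedepth__ = c(4L, 3L), energy__ = c(160.0, 164.6))
single <- stack_chains(one_chain)
double <- stack_chains(list(as.matrix(one_chain), one_chain))
print(dim(single))
print(dim(double))
```

The explicit is.data.frame() check matters because a data frame is itself a list, so code that expects a list of chains could otherwise treat its columns as chains.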

jgabry · 2024-12-13T18:23:27Z

@tillahoffmann @paul-buerkner I think the problem is this commit to posterior: stan-dev/posterior@79d4521. I finally noticed that the Docker image was using the very latest development version of posterior. When I first tested that, I couldn't reproduce the error deterministically because I hadn't restarted my R session after installing different versions of posterior (I thought I had, but I hadn't). Once I did, I could reproduce the error.

jgabry · 2024-12-13T18:30:42Z

Ok @tillahoffmann check out stan-dev/posterior#386. Can you replicate that?

This should also mean that if you use the posterior package that’s on CRAN, not the latest GitHub version, the error should go away (I hope).
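
To check which build of posterior a session would pick up, something like the following may help. It is a sketch under assumptions: describe_install is a hypothetical helper, and the RemoteSha field is only present when the package was installed by a tool that records it (for example remotes or devtools installing from GitHub); a CRAN install normally has no such field. Restart R after reinstalling so the freshly installed version is the one that gets loaded.

```r
# Hypothetical helper: summarise where an installed package came from.
# RemoteSha is written by GitHub installers such as remotes; packages
# installed from CRAN normally lack it, in which case NA is returned.
describe_install <- function(pkg) {
  list(
    installed_version = as.character(utils::packageVersion(pkg)),
    remote_sha = utils::packageDescription(pkg, fields = "RemoteSha")
  )
}

# Guarded so the sketch also runs where posterior is not installed.
if (requireNamespace("posterior", quietly = TRUE)) {
  print(describe_install("posterior"))
}
```

A non-missing remote_sha suggests a development build, in which case install.packages("posterior") followed by a fresh R session should switch back to the CRAN release.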

tillahoffmann · 2024-12-13T21:22:38Z

Thank you for the thorough investigation. Yes, I can replicate the behavior from stan-dev/posterior#386.

jgabry · 2024-12-13T21:34:00Z

Ok great, thanks for checking and for reporting this.

jgabry · 2024-12-17T16:39:57Z

We reverted the problematic commit in posterior, so I'm going to close this.

jgabry · 2024-12-17T16:40:23Z

Thanks again for reporting this and creating the reproducible example in Docker, that was really helpful.

tillahoffmann · 2024-12-17T16:43:13Z

Great, thank you for digging into it and fixing it so quickly! I probably shouldn't be running single chains anyway. 😬

tillahoffmann added the "bug" label (Something isn't working) Dec 12, 2024

tillahoffmann added a commit to onnela-lab/gptools-reproduction-material that referenced this issue Dec 12, 2024

Use two chains as a work-around for stan-dev/cmdstanr#1050.

d91b3a8

jgabry mentioned this issue Dec 13, 2024

Most recent commit breaks CmdStanR models with 1 chain stan-dev/posterior#386

Closed

jgabry closed this as completed Dec 17, 2024
