pp_check for prior predictive checks #320
Comments
I don't think you are making the predictions correctly, because you are not multiplying in the x1 and x2 values. Or you might misunderstand the purpose of prior/posterior prediction, which is to take your observed data and get the model's predictions for those observations using draws from the prior or posterior. (If the posterior produces predictions that look like real data, the model is at least doing something correctly.) Let's take a fitted model and make the density plot from scratch without bayesplot:

```r
# install.packages("tidybayes")
preds <- dat |>
  tidybayes::add_predicted_draws(mod.0_ppc, ndraws = 100)

ggplot(preds) +
  aes(x = .prediction) +
  geom_density(aes(group = .draw))
```

These are prior predictions for the original observations. The original observations have x1 and x2 as Normal(100, 15). The priors for b1, b2 are Normal(0, .5), so it is reasonable for these coefficients to shove the prior predicted densities way out to -200 or 200. We can also compute the prior predictions by hand directly from the model with matrix multiplication.

```r
coef <- mod.0_ppc |>
  as_draws_matrix() |>
  _[1:100, c("b_Intercept", "b_x1", "b_x2", "sigma")]

m <- model.matrix(~ x1 + x2, data = dat)
epreds <- coef[, 1:3] %*% t(m)
# this has one row per draw, one column per observation

# Add noise to epreds
preds <- epreds
for (draw_num in seq_len(nrow(coef))) {
  preds[draw_num, ] <- epreds[draw_num, ] +
    rnorm(ncol(epreds), 0, coef[draw_num, "sigma"])
}

plot(density(preds[1, ]))
```

This is also outside of the prior distribution of the intercept. To get something closer to your intercept, we need to zero out the x1 and x2 values in the data used for predictions.

```r
# install.packages("tidybayes")
preds_0 <- dat |>
  mutate(x1 = 0, x2 = 0) |>
  tidybayes::add_predicted_draws(mod.0_ppc, ndraws = 100)

ggplot(preds_0) +
  aes(x = .prediction) +
  geom_density(aes(group = .draw))
```

This gets us a little closer, but these are 100 predicted values that occur at the prior intercept and are then jittered based on the draw of the sigma coefficient. Let's instead draw one value per sample at x1 = x2 = 0 and not apply any noise. That is, we use the epred. We will also crank up the number of posterior samples at play:

```r
epreds_0 <- dat |>
  mutate(x1 = 0, x2 = 0) |>
  head(1) |>
  tidybayes::add_epred_draws(mod.0_ppc, ndraws = 1000)

ggplot(epreds_0) +
  aes(x = .epred) +
  geom_density() +
  stat_function(
    fun = dnorm,
    args = list(mean = 3.5, sd = 2.5),
    geom = "line", color = "red"
  )
```

So we recovered the prior distribution of the intercept by zeroing out the predictors and looking at the expected (noise-free) predictions.
Hello TJ, many thanks for your reply. Apparently, pp_check does not do what I thought it does. I have a different understanding of what is meant by "prior predictive check". To me it is a check to see how the joint priors behave and what distribution of my data (or the mean, median, regression coefficients, whatever) they would predict without seeing the data itself. Therefore, I do not understand why the prior distributions are multiplied with the data. For me, your code gives me random draws from the priors of each parameter. I would take these draws and combine them with the likelihood function and the specified model to produce a sample the size of my original data. I would consider these "prior data samples" as my prior predictions and compare them either to theoretical considerations (are the values in a plausible range?) and/or to the data itself. The latter approach may have the disadvantage that we adjust our priors too much according to the data. In my example, I deliberately used "wrong" or not well-suited priors (as I said, for educational purposes), but it may be the case that your prior knowledge is very different from your data. At least this is how I understood McElreath (2022) and Gabry et al. (2019) on prior predictive checks: use only the prior distributions, without predictions that use the data. Can you see my struggle/problem?
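As a minimal sketch of that idea, under assumed prior values (Normal(3.5, 2.5) for the intercept and a half-normal for sigma, numbers taken from later in the thread) and with the slope terms left out because no predictor values enter:

```r
# Draw one parameter set from the priors, then simulate one "prior data sample"
# of the same size as the original data -- the observed x1/x2 are never used.
n     <- nrow(dat)
b0    <- rnorm(1, mean = 3.5, sd = 2.5)   # assumed intercept prior
sigma <- abs(rnorm(1, mean = 0, sd = 1))  # assumed (half-normal) sigma prior
y_sim <- rnorm(n, mean = b0, sd = sigma)
plot(density(y_sim))
```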
I understand the confusion. With a posterior predictive check, we want predictions that condition on the observed data, but in a prior predictive check we might not even have data in hand, so it should support predictions in the absence of data.
But
Or once you have a matrix of predictions in hand, you can use bayesplot:
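A minimal sketch, assuming `preds` is the draws-by-observations matrix built above:

```r
# bayesplot's ppd_* functions plot predictive draws without an observed y
bayesplot::ppd_dens_overlay(ypred = preds[1:50, ])
```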
Hi TJ! Many thanks for your time and help, I think that solves my confusion! All the best!
Hello there!
For educational purposes, I experimented a bit with different model formulations, with and without centering, in brms. For that, I formulated priors and wanted to get prior draws with the `sample_prior = "only"` argument, and then do graphical prior predictive checks with pp_check().
Then I noticed an unexpected behavior in the graphical output: the displayed distributions neither resembled my prior formulations nor the summary statistics of the brms.fit object. I tried two different data sets and a simulated toy data set (see below; the code for the data and analyses, and the complete code, is in the .txt file), which all showed the same behavior. I am unsure whether I misunderstood something completely or whether there is a problem with pp_check when displaying the priors for these specific models. For comparison, I extracted and plotted the prior draws from the brms.fit model manually, and those plots looked as expected.

The brms package handles the intercept differently depending on whether your data are centered or not. If you formulate the model as `y ~ 0 + Intercept + x1 + x2`, brms allows for non-centered data and you can directly specify a prior for the intercept. If you instead formulate `y ~ 1 + x1 + x2`, brms assumes centering and will adapt the Intercept prior to the means of the corresponding data.

Therefore, my first model kept the raw metric of the uncentered data:
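The data code itself is in the attached test_pp_check.txt; purely for orientation, a hedged sketch of what such a toy data set could look like (x1 and x2 as Normal(100, 15), as mentioned in the comments above; the sample size and the outcome are assumptions):

```r
set.seed(123)
n <- 100  # assumed sample size
dat <- data.frame(
  x1 = rnorm(n, mean = 100, sd = 15),
  x2 = rnorm(n, mean = 100, sd = 15)
)
dat$y <- rnorm(n, mean = 3.5, sd = 1)  # assumed outcome; irrelevant when sampling priors only
```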
And the appropriate (very vague) priors:
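The exact prior code is in the attachment as well; a sketch using the values that come up later in the thread (Intercept ~ Normal(3.5, 2.5), slopes ~ Normal(0, 0.5); the sigma prior is an assumption):

```r
library(brms)

priors <- c(
  prior(normal(3.5, 2.5), class = "b", coef = "Intercept"),  # intercept prior (non-centered formulation)
  prior(normal(0, 0.5),   class = "b", coef = "x1"),
  prior(normal(0, 0.5),   class = "b", coef = "x2"),
  prior(exponential(1),   class = "sigma")                   # assumed
)
```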
And the model:
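Again only a sketch; the object names are assumptions and the real call is in the attachment:

```r
mod.0_ppc <- brm(
  y ~ 0 + Intercept + x1 + x2,  # non-centered intercept formulation
  data = dat,
  prior = priors,
  sample_prior = "only",        # ignore the likelihood, draw from the priors
  seed = 1
)
```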
The output summary looks correct and as expected in accordance with the specified priors:
But when I display the prior draws with pp_check, it looks totally all over the place:
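The call was presumably along these lines (type and ndraws are guesses):

```r
pp_check(mod.0_ppc, type = "dens_overlay", ndraws = 100)
```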
The mean values of the distributions (the intercepts) are spread over the whole range of -200 to 200, which should not happen with these priors.
To better understand what was going on, I randomly drew prior values and plotted the density myself, to see what brms produced:
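The manual check is in the attachment; one way to do something similar, assuming the idea is to look at the stored prior draws directly:

```r
# With sample_prior = "only", the parameter draws in the fit are prior draws,
# so plotting them bypasses any multiplication with x1/x2.
draws <- as_draws_df(mod.0_ppc)
plot(density(draws$b_Intercept))
```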
To my surprise, the manually produced prior distributions looked as expected, in accordance with the specified priors and the summary of the brms.fit object:
I tried this with 3 data sets, which all looked similar. As a countercheck, I also applied the formula for centered variables, which should lead to distorted prior predictions, since brms internally assumes centering and will pick the Intercept prior accordingly.
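A sketch of that countercheck under the same assumptions as above (model name and prior classes are guesses):

```r
mod.1_ppc <- brm(
  y ~ 1 + x1 + x2,  # centered-intercept formulation
  data = dat,
  prior = c(
    prior(normal(3.5, 2.5), class = "Intercept"),
    prior(normal(0, 0.5),   class = "b"),
    prior(exponential(1),   class = "sigma")
  ),
  sample_prior = "only"
)
```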
Now the summary looks bad, as expected:
and so do the manually extracted prior draws (here 'bad' is good, since we expected this weird result):
But now the pp_check density plots behave quite normally: the distributions are wide, but at least centered around the intercept, not spread out as above:
So, could someone please explain to me why the brms.fit results and the manually extracted plots behave as expected, while the plots from pp_check are quite weird? Did I completely misunderstand how pp_check is intended to work? To me it seems as if there is internally an option to plot "centered" or "un-centered" and that it is switched in the wrong direction.
I appreciate any help from you!
Many thanks,
Rainer
test_pp_check.txt
Edit: The first density plot from pp_check shows a pattern in which distributions with smaller sigma are clustered around the Intercept. This happened only with the toy data set, where the predictors were normally distributed. With real data, where the predictors did not follow a specific distribution, the sampled distributions were all over the place. In my view, this also indicates that the data structure is taken into account, as in the centered brms analysis.