Ensure rank overlay plot starts at 0 even if not all bins present #332

Merged
merged 2 commits into from
Dec 14, 2024
13 changes: 10 additions & 3 deletions R/mcmc-traces.R
@@ -304,17 +304,24 @@ mcmc_rank_overlay <- function(x,
mutate(cut = cut(.data$value_rank, n_bins)) %>%
group_by(.data$cut) %>%
mutate(bin_start = min(.data$value_rank)) %>%
- ungroup() %>%
- select(-c("cut"))
+ ungroup()

# Count how many values fall into each bin per chain & parameter
d_bin_counts <- data %>%
left_join(histobins, by = "value_rank") %>%
count(.data$parameter, .data$chain, .data$bin_start)

# Now ensure that all combinations of parameter, chain, and bin_start exist
# even if no counts are present (https://github.com/stan-dev/bayesplot/issues/331)
all_params_chains <- dplyr::distinct(data, .data$parameter, .data$chain)
all_bins <- dplyr::distinct(histobins, .data$bin_start, .data$cut)
combos <- dplyr::cross_join(all_params_chains, all_bins)
d_bin_counts <- full_join(combos, d_bin_counts, by = c("parameter", "chain", "bin_start")) %>%
mutate(n = dplyr::coalesce(n, 0L))

Member Author

There may be a more elegant solution, but this seems to get the job done
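
For illustration, the zero-filling step from the diff can be tried on toy data. This is a minimal sketch, not the package code: the small `d_bin_counts` and `all_bins` tables below are made-up stand-ins for the real objects, and `dplyr::cross_join()` assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy stand-in for the counted data (hypothetical values):
# chain 2 has no draws in the bin that starts at rank 1.
d_bin_counts <- tibble(
  parameter = "theta",
  chain     = c(1L, 1L, 2L),
  bin_start = c(1, 51, 51),
  n         = c(40L, 10L, 50L)
)

# Hypothetical bin table; the real `histobins` also carries `value_rank` and `cut`.
all_bins <- tibble(bin_start = c(1, 51))
all_params_chains <- distinct(d_bin_counts, parameter, chain)

# Every parameter x chain x bin combination, observed or not.
combos <- cross_join(all_params_chains, all_bins)

# Join the observed counts back on and turn the missing ones into zeros.
filled <- combos %>%
  full_join(d_bin_counts, by = c("parameter", "chain", "bin_start")) %>%
  mutate(n = coalesce(n, 0L)) %>%
  arrange(chain, bin_start)

print(filled)
```

The row for chain 2 and `bin_start = 1` is absent from the input and present with `n = 0` in `filled`, which is what lets the overlay line for that chain start at zero.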

Contributor

@sims1253, Dec 14, 2024

Not sure if this is more elegant, but an expand.grid with one left join and an if_else to fill the zeros should also work. Something like the following reads a little easier for me, though that might just be me. (Not tested!)

all_combos <- tibble::as_tibble(expand.grid(
  parameter = unique(data$parameter),
  chain = unique(data$chain),
  bin_start = unique(histobins$bin_start),
  stringsAsFactors = FALSE
))

d_bin_counts <- all_combos %>%
  left_join(d_bin_counts, by = c("parameter", "chain", "bin_start")) %>%
  mutate(n = if_else(is.na(n), 0L, n))
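
Since the suggestion above is marked as untested, here is a small self-contained check of the same idea. The `data` and `histobins` tables are hypothetical toy inputs (four ranks, two bins), not the objects built inside `mcmc_rank_overlay()`.

```r
library(dplyr)

# Toy inputs (hypothetical): ranks 1-2 fall in the bin starting at 1,
# ranks 3-4 in the bin starting at 3. Chain 2 only has a draw in the second bin.
data <- tibble(
  parameter  = "theta",
  chain      = c(1L, 1L, 1L, 2L),
  value_rank = c(1, 2, 3, 4)
)
histobins <- tibble(
  value_rank = c(1, 2, 3, 4),
  bin_start  = c(1, 1, 3, 3)
)

# Observed counts only: the (chain 2, bin 1) row is missing here.
d_bin_counts <- data %>%
  left_join(histobins, by = "value_rank") %>%
  count(parameter, chain, bin_start)

# Full grid of combinations, built with base R's expand.grid().
all_combos <- as_tibble(expand.grid(
  parameter = unique(data$parameter),
  chain     = unique(data$chain),
  bin_start = unique(histobins$bin_start),
  stringsAsFactors = FALSE
))

# Left join the counts onto the grid and replace missing counts with 0.
filled <- all_combos %>%
  left_join(d_bin_counts, by = c("parameter", "chain", "bin_start")) %>%
  mutate(n = if_else(is.na(n), 0L, n)) %>%
  arrange(chain, bin_start)

print(filled)
```

On this toy input the grid has four rows and the previously missing combination gets `n = 0`, so the approach appears to behave as described, at least for a single parameter.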

Contributor

# Ensure all combinations exist and count values
d_bin_counts <- data %>%
  left_join(histobins, by = "value_rank") %>%
  count(.data$parameter, .data$chain, .data$bin_start) %>%
  right_join(
    dplyr::expand(.data$parameter, .data$chain, unique(histobins$bin_start)),
    by = c("parameter", "chain", "bin_start")
  ) %>%
  mutate(n = dplyr::coalesce(n, 0L))

Or, courtesy of Claude, telling me my solution was verbose and not dplyr-ish enough.

Member Author

Thanks to you and Claude, both!

Member Author

@jgabry, Dec 14, 2024

dplyr::expand

Except Claude didn't realize there is no dplyr::expand. There is tidyr::expand, but we don't already depend on tidyr, so it's probably best not to add another dependency just for this.
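
As a side note, the missing export is easy to confirm, and there is a dplyr-only alternative that was not proposed in this thread: `count()` with `.drop = FALSE` keeps empty factor-level combinations. The sketch below assumes the grouping columns are converted to factors first, and the `binned` table is a made-up example.

```r
library(dplyr)

# Check whether dplyr exports a function called `expand`.
has_expand <- "expand" %in% getNamespaceExports("dplyr")
print(has_expand)

# Hypothetical binned draws: chain 2 never lands in the bin starting at 1.
binned <- tibble(
  parameter = factor(rep("theta", 4)),
  chain     = factor(c(1, 1, 1, 2), levels = c(1, 2)),
  bin_start = factor(c(1, 1, 3, 3), levels = c(1, 3))
)

# With factor columns, .drop = FALSE keeps combinations that have no rows
# and reports them with n = 0.
counts <- count(binned, parameter, chain, bin_start, .drop = FALSE)

# bin_start is a factor here; convert it back to numeric before plotting.
counts <- mutate(counts, bin_start = as.numeric(as.character(bin_start)))

print(counts)
```

The trade-off is the round trip through factors, which may or may not be cleaner than building the grid explicitly.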

Contributor

Ha, great help :D

# Duplicate the final bin, setting the left edge to the greatest x value, so
# that the entire x-axis is used,
right_edge <- max(data$value_rank)

d_bin_counts <- d_bin_counts %>%
dplyr::filter(.data$bin_start == max(.data$bin_start)) %>%
mutate(bin_start = right_edge) %>%
7 changes: 7 additions & 0 deletions tests/testthat/data-for-mcmc-tests.R
@@ -73,4 +73,11 @@ vdiff_dframe_chains_lp <- vdiff_dframe_chains_divergences
vdiff_dframe_chains_lp$Parameter <- NULL
vdiff_dframe_chains_lp$Value <- runif(2000, -100, -50)

vdiff_dframe_rank_overlay_bins_test <- posterior::as_draws_df(
list(
list(theta = -2 + 0.003 * 1:1000 + stats::arima.sim(list(ar = 0.7), n = 1000, sd = 0.5)),
list(theta = 1 + -0.01 * 1:1000 + stats::arima.sim(list(ar = 0.7), n = 1000, sd = 0.5))
)
)

set.seed(seed = NULL)
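
To see why test data of this shape exercises the fix, the following base R sketch simulates two similar chains and tabulates rank bins per chain. The seed, the 20-bin choice, and the variable names are assumptions made for the illustration; ranks are pooled across chains, as in the binning code of the first diff.

```r
set.seed(123)  # arbitrary seed, only for reproducibility of this sketch
n <- 1000

# Two chains with opposing trends, mirroring the shape of the new test data.
chain1 <- -2 + 0.003 * 1:n + as.numeric(stats::arima.sim(list(ar = 0.7), n = n, sd = 0.5))
chain2 <-  1 - 0.010 * 1:n + as.numeric(stats::arima.sim(list(ar = 0.7), n = n, sd = 0.5))

draws <- data.frame(chain = rep(1:2, each = n), value = c(chain1, chain2))

# Rank all draws jointly, then cut the ranks into equal-width bins.
draws$value_rank <- rank(draws$value)
draws$bin <- cut(draws$value_rank, 20)

# Rows are chains, columns are rank bins; zeros mark bins a chain never visits.
bin_table <- table(chain = draws$chain, bin = draws$bin)
print(rowSums(bin_table == 0))
```

Because chain 2 drifts far below anything chain 1 reaches, the lowest rank bins contain only chain 2 draws. A plain `count()` therefore returns no row for chain 1 in those bins, which is the situation described in issue #331.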
6 changes: 6 additions & 0 deletions tests/testthat/test-mcmc-traces.R
@@ -154,6 +154,9 @@ test_that("mcmc_rank_overlay renders correctly", {
n_bins = 4
)

# https://github.com/stan-dev/bayesplot/issues/331
p_not_all_bins_exist <- mcmc_rank_overlay(vdiff_dframe_rank_overlay_bins_test)

vdiffr::expect_doppelganger("mcmc_rank_overlay (default)", p_base)
vdiffr::expect_doppelganger(
"mcmc_rank_overlay (reference line)",
@@ -164,6 +167,9 @@ test_that("mcmc_rank_overlay renders correctly", {
"mcmc_rank_overlay (wide bins)",
p_one_param_wide_bins
)

# https://github.com/stan-dev/bayesplot/issues/331
vdiffr::expect_doppelganger("mcmc_rank_overlay (not all bins)", p_not_all_bins_exist)
})

test_that("mcmc_rank_hist renders correctly", {