
issue 267 - ECDF plots #273

Closed


TeemuSailynoja
Collaborator

This is the first implementation of the ECDF plots mentioned in #267. It currently produces them for:

  • precomputed PIT values (one-sample and multiple-sample comparison)
  • empirical PIT values when y and yrep are provided. This currently assumes that y consists of multiple draws from a univariate distribution; PIT for multivariate y will be implemented later.
  • fractional ranks for multiple-sample comparison when only yrep is provided. Here each chain is ranked relative to the total sample obtained by merging all of the chains.
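As a rough illustration of the third case, the sketch below ranks every draw within the merged sample of all chains and divides by the total number of draws, which matches the rank(x) / length(x) scaling of the u_scale function reviewed further down. The helper name chain_fractional_ranks and the one-column-per-chain layout of yrep are assumptions made for this sketch, not the PR's actual interface.

```r
# Hypothetical helper (sketch only): fractional ranks of each chain's draws
# relative to the sample formed by merging all chains.
chain_fractional_ranks <- function(yrep) {
  # Assumed layout: one column per chain, one row per draw.
  stopifnot(is.matrix(yrep), is.numeric(yrep))
  S <- length(yrep)                          # total draws over all chains
  r <- rank(yrep, ties.method = "average")   # ranks in the merged sample
  # Scale to {1/S, ..., 1} and restore the chain layout.
  matrix(r / S, nrow = nrow(yrep), ncol = ncol(yrep),
         dimnames = dimnames(yrep))
}

set.seed(1)
yrep <- matrix(rnorm(400), nrow = 100, ncol = 4)  # 4 chains x 100 draws
u <- chain_fractional_ranks(yrep)
range(u)  # smallest value 1/400, largest 1
```

If the chains mix well, each column of u should look roughly uniform on (0, 1], which is what the ECDF plot then checks.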

I decided to add the functions to ppc-intervals.R, as they share the most similarities with the functions in that file.

#' @return Either throws an error or returns a numeric matrix.
#' @noRd
validate_pit <- function(pit) {
  stopifnot(is.matrix(pit), is.numeric(pit))
  pit  # return the validated matrix unchanged
}
Contributor

Why does pit need to be a matrix? The ECDF plot function would also be useful for vectors.

Collaborator Author

This should be extended and tested for vectors.
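One way to support vectors, sketched under the assumption that a plain vector should be treated as a single sample (one row, as in the matrix(q, nrow = 1) calls used in this thread), is to coerce it to a one-row matrix before validating. validate_pit_any is a hypothetical name, and the [0, 1] range check is an illustrative addition rather than part of the PR.

```r
# Hypothetical variant of validate_pit() (sketch): accepts a numeric vector
# as a single sample by turning it into a one-row matrix.
validate_pit_any <- function(pit) {
  if (is.numeric(pit) && is.null(dim(pit))) {
    pit <- matrix(pit, nrow = 1)  # vector of length N -> 1 x N matrix
  }
  stopifnot(is.matrix(pit), is.numeric(pit), !anyNA(pit))
  # Illustrative extra check: PIT values lie in [0, 1].
  if (any(pit < 0 | pit > 1)) {
    stop("PIT values must lie in [0, 1].")
  }
  pit
}

p <- validate_pit_any(c(0.1, 0.5, 0.9))
dim(p)  # 1 3
```

Matrices pass through unchanged, so existing callers would not be affected.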

@avehtari
Contributor

First

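empirical_pit <- function(y, x) {
  N <- length(x)
  # for each y[j], count the draws x[i] <= y[j], then scale to (0, 1]
  (colSums(outer(x, y, "<=")) + 1) / (N + 1)
}
N <- 1000  # number of draws (illustrative value)
y <- rnorm(N); x <- rnorm(N); q <- empirical_pit(y, x)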

then this works

> ppc_ecdf_intervals(pit=matrix(q,nrow=1),gamma=gammax)

but this doesn't

> ppc_ecdf_intervals_difference(pit=matrix(q,nrow=1),gamma=gammax)
 Error: Aesthetics must be either length 1 or the same as the data (1001): y, colour and group
Run `rlang::last_error()` to see where the error occurred. 
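Assuming the difference plot shows the ECDF of the PIT values minus the uniform CDF, that quantity can be sketched on an evenly spaced grid. In a plot built from such a data frame every aesthetic has exactly one value per grid point, which is the kind of length agreement the error above complains about. The function ecdf_difference and its grid size K are hypothetical; this is not the bayesplot implementation.

```r
# Hypothetical sketch (not the bayesplot code): ECDF of PIT values minus the
# uniform CDF, evaluated on an evenly spaced grid of K + 1 points in [0, 1].
ecdf_difference <- function(pit, K = 1000) {
  z <- seq(0, 1, length.out = K + 1)   # evaluation grid
  F_hat <- stats::ecdf(pit)(z)         # share of PIT values <= each z
  # One row per grid point: every plotted column has length K + 1.
  data.frame(z = z, ecdf = F_hat, difference = F_hat - z)
}

set.seed(2)
q <- runif(100)            # stand-in for PIT values from a calibrated model
d <- ecdf_difference(q)
nrow(d)                    # 1001
```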

Comment on lines +416 to +418
u_scale <- function(x) {
  # fractional ranks rank(x) / S, keeping the dimensions of x
  array(rank(x) / length(x), dim = dim(x), dimnames = dimnames(x))
}
Contributor

Why is this defined differently from the version in the posterior package?

Collaborator Author

I had not seen the counterpart in the posterior package. There the fractional ranks seem to be normalized to { 1 / (2S), ..., 1 - 1 / (2S) }, where S is the number of values in the combined sample. Is the idea there to move the fractional ranks to the middle of the respective "bins" instead of having them at the "edges" as is implemented above?
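The two normalizations differ only by a half-rank shift. The sketch below, on an arbitrary four-value sample, contrasts the edge scaling of the u_scale above with the midpoint scaling described in this comment; the edge scaling reaches 1, which the normal quantile function maps to Inf.

```r
# Contrast of the two rank normalizations on an arbitrary sample of S = 4.
x <- c(2.3, -0.4, 1.1, 0.7)
S <- length(x)
r <- rank(x)              # ranks 1..S (no ties here)
edge <- r / S             # u_scale above: {1/S, ..., 1}
mid  <- (r - 1/2) / S     # midpoint scaling: {1/(2S), ..., 1 - 1/(2S)}
sort(edge)                # 0.25 0.50 0.75 1.00
sort(mid)                 # 0.125 0.375 0.625 0.875
qnorm(max(edge))          # Inf: edge values cannot all get finite normal scores
```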

Contributor

It would be good to use the same function, or to use a different name if the functionality is different.
posterior::u_scale sets the values so that transforming them to the normal scale produces values close to the expected normal order statistics. (Values of 0 and 1 would transform to -Inf and Inf, and the rule now used is the best one according to Blom.)
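Blom's rule belongs to the plotting-position family (r - c) / (S - 2c + 1) with c = 3/8, i.e. (r - 3/8) / (S + 1/4); c = 1/2 recovers the midpoint scaling. The sketch below uses a hypothetical helper, scale_ranks, which is not a copy of posterior::u_scale; it only shows that Blom positions stay strictly inside (0, 1), so their normal scores are finite.

```r
# Hypothetical helper (sketch, not posterior::u_scale): plotting positions
# (r - c) / (S - 2c + 1); c = 3/8 is Blom's rule, c = 1/2 the midpoint rule.
scale_ranks <- function(x, c = 3/8) {
  S <- length(x)
  r <- rank(x, ties.method = "average")
  (r - c) / (S - 2 * c + 1)
}

x <- c(2.3, -0.4, 1.1, 0.7)
u_blom <- scale_ranks(x)   # strictly inside (0, 1)
z <- qnorm(u_blom)         # finite normal scores, symmetric around 0
all(is.finite(z))          # TRUE
```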

@avehtari
Contributor

If I provide a gamma different from the one the code computes, I get two sets of bands.

@TeemuSailynoja
Collaborator Author

The aesthetics of ppc_ecdf_intervals_difference were misaligned because of a misplaced parenthesis. This is now fixed; thank you for pointing it out.

@TeemuSailynoja
Collaborator Author

By default, I chose to have both ppc_ecdf_intervals and ppc_ecdf_intervals_difference draw confidence bands at both the 50% and 90% levels, as these are the defaults for the other ppc_intervals functions.
If this is confusing and a single set of confidence bands would be preferable, I can disable the other pair by default.

@avehtari
Contributor

It is confusing that setting gamma changes only one of the bands, so I get lines that often overlap. It is also confusing that both bands are drawn with the same line color and thickness.

@TeemuSailynoja
Collaborator Author

Reworked in #282.
