issue 267 - ECDF plots #273

TeemuSailynoja · 2021-07-15T09:53:24Z

The first implementation of the ECDF plots mentioned in #267. Currently, it draws these plots for:

Precomputed PIT values (single-sample and multiple-sample comparison).
Empirical PIT values when y and yrep are provided. Currently assumes y to consist of multiple draws from a univariate distribution; PIT for multivariate y is to be implemented later.
Fractional ranks for multiple-sample comparison when only yrep is provided. Here each chain is ranked in relation to the total sample obtained by merging all of the chains together.

I decided to add the functions to ppc-intervals.R, as they share the most similarities with the functions in that file.


avehtari · 2021-10-18T13:54:57Z

R/helpers-ppc.R

+#' @return Either throws an error or returns a numeric matrix.
+#' @noRd
+validate_pit <- function(pit) {
+  stopifnot(is.matrix(pit), is.numeric(pit))


Why does pit need to be a matrix? The ECDF plot function would also be useful for vectors.

This should be extended and tested for vectors.
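
One way the vector case could be handled is sketched below. This is only an illustration, not the code in this PR: validate_pit_sketch is a hypothetical name, and both the NA check and the choice to treat a vector as a single row (mirroring the matrix(q, nrow = 1) calls used elsewhere in this thread) are assumptions.

```r
# Hypothetical variant of validate_pit() that also accepts plain vectors.
validate_pit_sketch <- function(pit) {
  # Assumed checks: numeric input without missing values.
  stopifnot(is.numeric(pit), !anyNA(pit))
  if (is.null(dim(pit))) {
    # Treat a vector as a single sample: one row, one column per PIT value.
    pit <- matrix(pit, nrow = 1)
  }
  # PIT values are probabilities, so they must lie in [0, 1].
  stopifnot(is.matrix(pit), all(pit >= 0 & pit <= 1))
  pit
}

dim(validate_pit_sketch(runif(10)))             # 1 10
dim(validate_pit_sketch(matrix(runif(20), 2)))  # 2 10
```

With something like this, a vector input and a one-row matrix input would follow the same code path in the plotting functions.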

avehtari · 2021-10-18T14:01:43Z

First

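empirical_pit <- function(y, x) {
  # For each y, count the draws in x that are <= y, then scale into (0, 1].
  N <- length(x)
  (colSums(outer(x, y, "<=")) + 1) / (N + 1)
}
N <- 1000  # assumed sample size; not stated in the original comment
y <- rnorm(N); x <- rnorm(N); q <- empirical_pit(y, x)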

then this works

> ppc_ecdf_intervals(pit=matrix(q,nrow=1),gamma=gammax)

but this doesn't

> ppc_ecdf_intervals_difference(pit=matrix(q,nrow=1),gamma=gammax)
 Error: Aesthetics must be either length 1 or the same as the data (1001): y, colour and group
Run `rlang::last_error()` to see where the error occurred.

avehtari · 2021-10-18T14:06:14Z

R/helpers-ppc.R

+u_scale <- function(x) {
+  array(rank(x) / length(x), dim = dim(x), dimnames = dimnames(x))
+}


Why is this defined differently from the posterior package?

I had not seen the counterpart in the posterior package. There the fractional ranks seem to be normalized to { 1 / (2S), ..., 1 - 1 / (2S) }, where S is the number of values in the combined sample. Is the idea there to move the fractional ranks to the middle of the respective "bins" instead of having them at the "edges" as is implemented above?

It would be good to use the same function, or to use a different name for the function if the functionality is different.
posterior::u_scale sets the values so that a transformation to normal produces values that are close to normal order statistics (whereas 0 and 1 would transform to -Inf and Inf). The rule now used is the best according to Blom.
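
The difference between the two scalings discussed here can be illustrated with a small sketch. Both helper names are hypothetical; frac_rank_edge mirrors the rank(x) / length(x) rule from the diff above, and frac_rank_mid implements the { 1 / (2S), ..., 1 - 1 / (2S) } normalization described in this thread. It does not call the posterior package and is not meant to reproduce its exact implementation.

```r
# Edge-aligned fractional ranks, as in the u_scale() proposed in this PR.
frac_rank_edge <- function(x) rank(x) / length(x)

# Midpoint-aligned fractional ranks, {1/(2S), ..., 1 - 1/(2S)}.
# Hypothetical helper, not the posterior implementation.
frac_rank_mid <- function(x) (rank(x) - 0.5) / length(x)

x <- c(3.2, -1.0, 0.4, 2.1)
frac_rank_edge(x)  # the largest value maps to exactly 1
frac_rank_mid(x)   # all values lie strictly inside (0, 1)

# Mapping to the normal scale: the edge version gives Inf for the maximum,
# while the midpoint version stays finite.
is.finite(qnorm(frac_rank_edge(x)))
is.finite(qnorm(frac_rank_mid(x)))
```

Keeping the values away from 0 and 1 is what makes a subsequent normal transformation well defined for every draw.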

avehtari · 2021-10-18T14:28:58Z

If I provide a gamma different from the one the code computes, I get two sets of bands.

TeemuSailynoja · 2021-10-18T19:19:44Z

The aesthetics of ppc_ecdf_intervals_difference were misaligned due to a misplaced parenthesis. This is now fixed. Thank you for pointing this out.

TeemuSailynoja · 2021-10-18T19:23:41Z

I chose to have both ppc_ecdf_intervals and ppc_ecdf_intervals_difference draw confidence bands for both 50% and 90% by default, as these are the default values for the other ppc_intervals functions.
If this is confusing and only one set of confidence intervals would be preferable, I can disable the other pair by default.

avehtari · 2021-10-19T06:50:04Z

It is confusing that setting gamma changes only one of the bands, and I get lines that often overlap. It is also confusing that the color and thickness of the lines are the same.

TeemuSailynoja · 2022-01-21T14:18:39Z

Reworked in #282.

TeemuSailynoja added 30 commits June 10, 2021 18:05

helper functions and start for ecdf plot functions

12264f9

Added helper functions for ECDF confidence intervals.

a108e01

ecdf confidence intervals and sample comparison data preparation.

cfc5c8b

typo

0ec2bc0

updated NAMESPACE and Rd

b11e0f9

prelim bug fixes

60af0ef

prelim bug fixes

20daa12

prelim bug fixes

3c77d51

change of default functionality to merge yrep for empirical pit computation.

60f3f38

typo

7e25468

typo

e545f40

typo

67dbd86

typo

911d73a

replace NA in rep_id.

9205ac4

only attempt to draw 'yrep' if 'y' not submitted.

943274d

Correctly compute N from the given values.

fa92c72

Fix the scaling of the confidence intervals.

ae1cf0b

Data missing frpm ggplot

f34c2d4

correcting ggplot usage.

70511d0

remove debug printting.

de47a21

more ggplot corrections

2c53f31

fix x axis replication with sample comparison.

08467bb

remove diagonal line from ecdf plot.

9d39db3

Change rep_id to factors for color scaling.

c40362b

color scaling fixes.

d4f7517

color by rep label

7e92da8

Added ppc_ecdf_intervals_difference.

c77881d

fixing NAMESPACE

e1cd094

Moved ppc_ecdf_intervals and ppc_ecdf_intervals_difference to PPC-intervals.

ed832d0

Changed ppc_ecdf_intervals(_difference) to show confidence intervals as ribbon.

d910940

TeemuSailynoja added 19 commits June 30, 2021 16:45

added handling of missing gamma.

c70124d

ppc_ecdf_intervals_difference with fixed step ribbons.

bd8b90b

ppc_ecdf_intervals_difference more ribbon fixing.

259fbc3

Improve legends and colours of single ecdf plots.

5b71312

colour change to aesthetics

dbedd6f

fill conf intervals

94f2988

fill also for difference plot

e958d78

try joining color and fill in legend.

fefaecb

try joining color and fill in legend.

fa9741b

try joining color and fill in legend.

5fde0d7

typo in validate_pit

670817e

Merge branch 'ecdf-plots-issue-267' of github.com:TeemuSailynoja/bayesplot into ecdf-plots-issue-267

e74dc83

Documentation and plot labels.

1b0bdc8

legent labels fix

d2c7b67

Documentation and examples.

6222007

Merge branch 'master' of https://github.com/stan-dev/bayesplot into ecdf-plots-issue-267

fdb17ae

cleaned ppc-distributions.R and fixed documentation from ppc-intervals.R

2aa0737

typo in ppc-intervals.R, files roxygenised.

10dd207

ranks in empirical_pit start from 1 instead of 0.

d0e3d04

avehtari reviewed Oct 18, 2021

Fixed misaligned aesthetics in ppc_ecdf_intervals_difference

f4f87d3

TeemuSailynoja closed this Jan 21, 2022
