ECDF difference plot with confidence band #452

sethaxen · 2021-06-24T21:24:08Z

Sometimes one has an ECDF or random sample and wants to compare it to the CDF of some known distribution. One could plot the ECDF of the sample and the CDF of the distribution and compare them, but they will always deviate, and it's difficult to tell if the difference is meaningful.

Two things help here, separately or together: plotting the ECDF difference and plotting a confidence band.

If the sample is drawn exactly from the distribution d, then the marginal (i.e. pointwise) distribution of the ECDF at any fixed point x has a known form: ecdf_marginal(d, N, x) = Binomial(N, cdf(d, x)) / N, where N is the size of the sample. One can use this to plot a confidence band that gives a sense of how much deviation from d should be expected given the sample size N. An example:

using Distributions, StatsBase, StatsPlots, Random

rng = MersenneTwister(87)
plot()

N = 100
α = 1 - 0.99  # two-sided pointwise level, i.e. a 99% band
ecdf_marginal(d, N, x) = Binomial(N, cdf(d, x)) / N
dexp = Uniform(0, 1)

# plot uniform ecdf
euni = ecdf(rand(rng, dexp, N))
xuni = euni.sorted_values
plot!(xuni, euni(xuni) .- cdf.(dexp, xuni); color=:black, seriestype=:steppost, label="Uniform(0, 1)")

# plot Beta(1, 1.05) ecdf
ebeta = ecdf(rand(rng, Beta(1, 1.05), N))
xbeta = ebeta.sorted_values
plot!(xbeta, ebeta(xbeta) .- cdf.(dexp, xbeta); color=:red, seriestype=:steppost, label="Beta(1, 1.05)")
plot!(legend = :outertop)

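# plot pointwise confidence envelope around zero
xall = unique!(sort!(vcat(xuni, xbeta)))
lb = quantile.(ecdf_marginal.(dexp, N, xall), α / 2)
ub = quantile.(ecdf_marginal.(dexp, N, xall), 1 - α / 2)
expected = cdf.(dexp, xall)
# the label converts 1 - α to a percentage (the original printed "0.99%")
plot!(xall, lb .- expected; fillrange = ub .- expected, color=:lightgray, seriestype=:steppost, alpha=0.25, label="$(round(Int, 100 * (1 - α)))% confidence band")
# zero line: where the ECDF difference would sit if the sample matched dexp exactly
plot!(xall, zero(expected); color=:gray, alpha=0.5, primary=false)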
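
For reference, the pointwise form used in ecdf_marginal follows from a counting argument: at a fixed x, each of the N independent draws lands at or below x with probability F(x) = cdf(d, x). The symbols \hat F_N (the ECDF) and F are notation introduced here, not names from the code.

```latex
N \hat F_N(x) = \sum_{i=1}^{N} \mathbf{1}\{X_i \le x\} \sim \mathrm{Binomial}\bigl(N, F(x)\bigr),
\qquad
\mathbb{E}\bigl[\hat F_N(x)\bigr] = F(x),
\qquad
\operatorname{Var}\bigl[\hat F_N(x)\bigr] = \frac{F(x)\,\bigl(1 - F(x)\bigr)}{N}.
```

So the band narrows roughly like 1/sqrt(N) and is widest where F(x) is near 1/2.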

I propose that this be callable with an interface like

ecdfplot(xuni; refdist = Uniform(0, 1), difference=true, band_prob=0.95)
ecdfplot!(xbeta; refdist = Uniform(0, 1), difference=true)
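
To make the proposal a bit more concrete, here is a minimal sketch of the data such a call would need to compute before anything is drawn. The helper name ecdf_difference_band and its return fields are hypothetical, chosen only for illustration; the keyword names refdist and band_prob mirror the proposed interface above.

```julia
using Distributions, StatsBase

# Hypothetical helper (not an existing StatsPlots function): computes the
# series that an ECDF difference plot with a pointwise band would draw.
function ecdf_difference_band(x; refdist, band_prob=0.95)
    e = ecdf(x)
    xs = e.sorted_values
    N = length(xs)
    α = 1 - band_prob
    p = cdf.(refdist, xs)        # reference CDF at each sorted sample point
    diff = e.(xs) .- p           # ECDF minus reference CDF
    # Pointwise Binomial(N, p) quantiles, rescaled to the ECDF scale and
    # shifted by the reference CDF so the band is centered near zero
    lb = quantile.(Binomial.(N, p), α / 2) ./ N .- p
    ub = quantile.(Binomial.(N, p), 1 - α / 2) ./ N .- p
    return (; x=xs, diff, lb, ub)
end
```

A recipe could then draw diff as a steppost series and pass lb and ub as the line and its fillrange, as in the example above.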

It's worth noting that because the confidence interval is pointwise, the envelope should not be expected to contain the entire ECDF 95% of the time; it will in fact contain it less often than that. It might be worth having an option to generate simultaneous confidence bands, which we could probably do by adapting the simulation-based method in (ref). These are more easily interpretable, as they should contain the entire ECDF curve 95% of the time.
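
The gap between pointwise and simultaneous coverage can be estimated with a small simulation. The sketch below is an illustration only, not the method in (ref); the function name simultaneous_coverage and the nsim keyword are made up for this example. It checks the band at the sorted sample points, which is where the example above evaluates it.

```julia
using Distributions, Random

# Illustrative helper, not part of any package: fraction of simulated samples
# whose ECDF stays inside the pointwise band at every sample point.
function simultaneous_coverage(rng, d, N; band_prob=0.95, nsim=2_000)
    α = 1 - band_prob
    hits = 0
    for _ in 1:nsim
        xs = sort!(rand(rng, d, N))
        p = cdf.(d, xs)
        lb = quantile.(Binomial.(N, p), α / 2)       # band bounds on the count scale
        ub = quantile.(Binomial.(N, p), 1 - α / 2)
        counts = 1:N                                 # N * ECDF at the sorted points (no ties)
        hits += all(lb .<= counts .<= ub)
    end
    return hits / nsim
end

coverage = simultaneous_coverage(MersenneTwister(1), Uniform(0, 1), 100)
```

The returned fraction should come out well below band_prob. One crude way to approximate a simultaneous band, under the same assumptions, would be to raise band_prob until this estimate reaches the target level.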

sethaxen · 2021-06-28T09:35:40Z

Related issue for bayesplot: stan-dev/bayesplot#267

sethaxen changed the title ~~Add ecdf difference plot with confidence band~~ ECDF difference plot with confidence band Jun 24, 2021

sethaxen added the enhancement label Jul 8, 2021
