Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECDF difference plot with confidence band #452

Open
sethaxen opened this issue Jun 24, 2021 · 1 comment
Open

ECDF difference plot with confidence band #452

sethaxen opened this issue Jun 24, 2021 · 1 comment

Comments

@sethaxen
Copy link
Member

Sometimes one has an ECDF or random sample and wants to compare it to the CDF of some known distribution. One could plot the ECDF of the sample and the CDF of the distribution and compare them, but they will always deviate, and it's difficult to tell if the difference is meaningful.

There are two ways to handle this: plotting the ECDF difference, and/or plotting a confidence band.

If x is exactly sampled from the distribution d, then the marginal (i.e. pointwise) distribution of the ECDF at each observed x value has a known form ecdf_marginal(d, N, x) = Binomial(N, cdf(d, x)) / N, where N is the size of the sample. One can use this to plot a confidence band that gives a sense for what deviation from d should be expected given the sample size N. An example:

using Distributions, StatsBase, StatsPlots, Random

rng = MersenneTwister(87)
plot()

N = 100
L = 100
α = 1 - 0.99
ecdf_marginal(d, N, x) = Binomial(N, cdf(d, x)) / N
dexp = Uniform(0, 1)

# plot uniform ecdf
euni = ecdf(rand(rng, dexp, N))
xuni = euni.sorted_values
plot!(xuni, euni(xuni) .- cdf.(dexp, xuni); color=:black, seriestype=:steppost, label="Uniform(0, 1)")

# plot Beta(1, 1.05) ecdf
ebeta = ecdf(rand(rng, Beta(1, 1.05), N))
xbeta = ebeta.sorted_values
plot!(xbeta, ebeta(xbeta) .- cdf.(dexp, xbeta); color=:red, seriestype=:steppost, label="Beta(1, 1.05)")
plot!(legend = :outertop)

# plot envelope
xall = unique!(sort!(vcat(xuni, xbeta)))
lb = quantile.(ecdf_marginal.(dexp, N, xall), α / 2)
ub = quantile.(ecdf_marginal.(dexp, N, xall), 1 - α / 2)
expected = cdf.(dexp, xall)
plot!(xall, lb .- expected; fillrange = ub .- expected, color=:lightgray, seriestype=:steppost, alpha=0.25, label="$(1-α)% confidence band")
plot!(xall, zero(expected); color=:gray, alpha=0.5, primary=false)

tmp

I propose that this be callable with an interface like

ecdfplot(xuni; refdist = Uniform(0, 1), difference=true, band_prob=0.95)
ecdfplot!(xbeta; refdist = Uniform(0, 1), difference=true)

It's worth noting that because the confidence interval is pointwise, the envelope should not be expected to contain the entire ECDF 95% of the time; it should in fact be less than that. It might be worth it to have an option to generate simultaneous confidence bands, which we could probably do by adapting the simulation-based method in (ref). These are more easily interpretable, as they should contain the entire ECDF curve 95% of the time.

@sethaxen sethaxen changed the title Add ecdf difference plot with confidence band ECDF difference plot with confidence band Jun 24, 2021
@sethaxen
Copy link
Member Author

Related issue for bayesplot: stan-dev/bayesplot#267

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant