
[WIP] Simulation tests for coverage + p-values #73

Open · wants to merge 1 commit into master
Conversation

johnmyleswhite (Member)
This is a work-in-progress project to add compute-intensive simulation tests to HypothesisTests.jl that assess the coverage probabilities of nominal 95% CIs and the uniformity of p-values under the null.

So far, the results are as follows:

  • Currently passing tests
    • Kolmogorov-Smirnov tests
    • Kruskal-Wallis tests
    • Mann-Whitney tests
    • t-tests
    • z-tests (although increasing the number of simulations reveals expected small failures)
  • Currently failing tests
    • Anderson-Darling tests (known to have problems)
    • Binomial tests (known to be too conservative)
  • Still untested
    • Circular tests
    • Fisher exact tests
    • Power-divergence tests
    • Wilcoxon tests
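
The "uniform distribution of p-values under the null" half of this testing can be sketched generically. This is a minimal Python illustration, not the PR's Julia code; the function names (`two_sided_z_pvalue`, `ks_uniform_stat`) are hypothetical stand-ins for the package's own test routines. It simulates p-values from a z-test under the null and measures their distance from Uniform(0,1) with a one-sample Kolmogorov-Smirnov statistic.

```python
import math
import random

def two_sided_z_pvalue(sample, mu=0.0):
    # Two-sided z-test p-value for a N(mu, 1) sample with known unit variance;
    # under the null it should be Uniform(0, 1).
    n = len(sample)
    z = (sum(sample) / n - mu) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

def ks_uniform_stat(pvals):
    # Max distance between the empirical CDF of pvals and the Uniform(0,1) CDF.
    pvals = sorted(pvals)
    n = len(pvals)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(pvals))

rng = random.Random(7)
pvals = [two_sided_z_pvalue([rng.gauss(0.0, 1.0) for _ in range(20)])
         for _ in range(2000)]
d = ks_uniform_stat(pvals)
# For 2000 simulated null p-values, d should typically sit well below the
# ~1% KS critical value 1.63 / sqrt(2000) ≈ 0.036.
print(d)
```

A broken p-value procedure (e.g. one using a bad asymptotic approximation) would show up here as a KS distance far above the critical value.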

Thoughts on the design of the simulations and action items based on the results would be very appreciated.
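
For concreteness, the coverage half of the design can be sketched as follows. This is a language-agnostic illustration written in Python rather than the PR's Julia code, and `t_ci` / `empirical_coverage` are hypothetical names, not HypothesisTests.jl API: repeatedly draw samples under a known truth, form a nominal 95% CI, and count how often the interval covers the true parameter.

```python
import math
import random

def t_ci(sample, z=1.96):
    # Normal-approximation 95% CI for the mean (illustrative; a real test
    # suite would call the package's own CI routines instead).
    n = len(sample)
    m = sum(sample) / n
    var = sum((x - m) ** 2 for x in sample) / (n - 1)
    half = z * math.sqrt(var / n)
    return m - half, m + half

def empirical_coverage(n_sims=10_000, n=30, mu=0.0, seed=1):
    # Fraction of simulated nominal-95% intervals that cover the true mean.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        lo, hi = t_ci(sample)
        hits += lo <= mu <= hi
    return hits / n_sims

# For a calibrated procedure this should land near 0.95, up to Monte Carlo
# noise (binomial SE ≈ sqrt(0.95 * 0.05 / 10_000) ≈ 0.002).
print(empirical_coverage())
```

The main design knobs are the number of simulations (which sets the Monte Carlo noise floor) and the tolerance used when comparing empirical to nominal coverage.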

@johnmyleswhite (Member Author)

One question that arises is what properties we really want to guarantee for our CIs. In the literature, there is a divide between conservative procedures (whose empirical coverage exceeds the nominal coverage), anti-conservative procedures (which are essentially broken, since they fail to provide at least the nominal coverage), and calibrated procedures (whose empirical coverage matches the nominal coverage). We could check that CIs are not anti-conservative by requiring that empirical coverage stay at or above nominal coverage, up to Monte Carlo noise. That would get the Bernoulli stuff passing, although I think we would still produce non-uniform p-values, and I don't think it would fix the Anderson-Darling test, which is known to not work well in practice.
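
The "not anti-conservative" criterion described above could be implemented as a one-sided tolerance on the simulated coverage. This is a hedged Python sketch (the function name `passes_anti_conservative_check` and the three-standard-error tolerance are assumptions, not anything from the PR): accept a procedure if its empirical coverage is at least the nominal level minus a few binomial standard errors of the Monte Carlo estimate.

```python
import math

def passes_anti_conservative_check(hits, n_sims, nominal=0.95, n_se=3):
    # hits: number of simulated intervals that covered the true parameter.
    coverage = hits / n_sims
    # Binomial standard error of the coverage estimate at the nominal level.
    se = math.sqrt(nominal * (1 - nominal) / n_sims)
    # One-sided check: conservative and calibrated procedures pass,
    # anti-conservative ones fail.
    return coverage >= nominal - n_se * se

# A calibrated procedure at 10,000 sims (coverage ≈ 0.948) passes:
print(passes_anti_conservative_check(9480, 10_000))   # True
# A clearly anti-conservative one (coverage ≈ 0.90) fails:
print(passes_anti_conservative_check(9000, 10_000))   # False
```

Under this criterion a conservative procedure like the binomial test passes by construction, which matches the trade-off described above: it rescues the Bernoulli cases without saying anything about p-value uniformity.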
