add multigroup permutations #198
base: master
Conversation
So this seems problematic: the P-value is heavily affected by the order of the groups in the dataset. Here I run the test on a dataset that has 3 groups (means are -1, 0, or 1), but I vary the order of the groups. The resulting P-value is very different depending on the order.

```julia
julia> using Combinatorics, Distributions, HypothesisTests

julia> data = [rand(Normal(μ, 1), 20) for μ in (-1, 0, 1)];

julia> for x in permutations(data)
           P = pvalue(ApproximatePermutationTest(x, mean, 10^5))
           @show P
       end
P = 1.0e-5
P = 0.0
P = 0.76717
P = 0.76947
P = 0.0
P = 0.0
```

I suspect that the problem is in [...]. My question is: how do we make the test independent of the order of the groups? Right now, it treats the data similarly to a [...].

Perhaps this isn't a problem. In reality, if the 3 groups came about from an ordinal predictor AND the predictor levels were aligned to the means (1 => -1, 2 => 0, and 3 => 1), then we should order the groups according to the predictors (1, 2, and 3). Which does, as expected, result in a very low P-value:

```julia
julia> P = pvalue(ApproximatePermutationTest(data, mean, 10^5))
0.0
```

After some more reading, maybe in this specific implementation of a multi-group permutation test it is totally fine to require the predictor to be ordinal, with the groups supplied in that order.
I think the basic issue here is that you're using subtraction to combine the per-group statistics. Maybe a (more radical) revised design would treat the statistic as a function of the whole vector of groups, so that `ApproximatePermutationTest(x, y, f, n)` would be `ApproximatePermutationTest([x, y], xy -> mapreduce(f, -, xy), n)`.

Edit: the important thing here is that the reducer is part of the definition of the null hypothesis you're specifying, so people should have to specify it manually when there are more than two groups and the default H0 of "value is the same for all groups" doesn't map easily onto subtraction.
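To make the shape of that concrete, a sketch (this is the suggested redesign, not the released API, and the `var(mean.(xs))` reducer is just one illustrative order-invariant choice):

```julia
using Distributions, HypothesisTests, Statistics

data = [rand(Normal(μ, 1), 20) for μ in (-1, 0, 1)]

# Two groups: the existing behavior, spelled out with an explicit reducer.
# (Proposed interface; this signature does not exist yet.)
ApproximatePermutationTest([data[1], data[2]],
                           xy -> mapreduce(mean, -, xy), 10^5)

# Three or more groups: subtraction no longer expresses H0, so the caller
# must choose an order-invariant statistic, e.g. the spread of group means.
ApproximatePermutationTest(data, xs -> var(mean.(xs)), 10^5)
```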
Sure! That's easy to amend. But to be clear: other than that, there's nothing statistically wrong with this setup? I'd like to include an example of testing the difference in, say, means or variability between multiple groups, where the supplied reducer would for instance be [...]
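A couple of possibilities (a sketch; the names are mine, and both assume the reducer receives the full vector of groups):

```julia
using Statistics

# Both reducers are invariant to the order of the groups.
mean_spread(xs) = var(mean.(xs))   # sensitive to differences in location
std_spread(xs)  = var(std.(xs))    # sensitive to differences in variability
```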
OK, done, minus the tests... Let me know if this looks right.
Tests added and passed, but I didn't include tests for more than 2 groups because I'm not certain what standard tests I could compare to (other than self-generated data, which seems like cheating).
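One possible benchmark, if it helps: run a standard k-group test from this package on the same data and check that both tests reach the same decision (the P-values won't match exactly, but they should agree on a clear effect). A sketch, assuming the multi-group reducer interface discussed above:

```julia
using Distributions, HypothesisTests, Random, Statistics, Test

rng = MersenneTwister(42)
groups = [rand(rng, Normal(μ, 1), 50) for μ in (-1, 0, 1)]

# Kruskal–Wallis already ships with HypothesisTests and handles k groups.
@test pvalue(KruskalWallisTest(groups...)) < 0.05

# The multigroup permutation test should reject here too
# (commented out: uses the proposed, not yet released, interface):
# @test pvalue(ApproximatePermutationTest(groups, xs -> var(mean.(xs)), 10^5)) < 0.05
```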
Do not merge yet. This is for discussion and collaboration. But could be cool!