-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subtypes of HypothesisTest can have optional fields :tail and :alpha #100
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks useful. However, the new feature should be tested. Also, the pvalue
function should only be modified for tests which support tail
, else it's misleading. Finally, the docs must be updated too.
src/HypothesisTests.jl
Outdated
@@ -68,35 +68,54 @@ function check_alpha(alpha::Float64) | |||
end | |||
end | |||
|
|||
# Utility to get an optional field (e.g., :tail and :alpha) | |||
getfield(value::HypothesisTest, name::Symbol, default::Any) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not use the name getfield
, which is associated with the a.b
syntax and is currently not overloadable. Also, this approach isn't really Julian. Better define fallbacks for get_tail
and get_alpha
which return the default (or maybe better, print a warning), and define specific methods for the distributions which support it (and which are very few).
Also, remove the "get_" prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the "get_" prefix, I simply called the functions tail
and alpha
. The tail is :undefined
, which is the value returned by default_tail
if that method is not overloaded. For all tests that always use some default tail, the tail
method returns that tail, just like default_tail
does. For t-tests, which can be configured to use another tail, that other tail is returned (or the default, if no other tail is specified).
I feel that the method default_tail
is superseded by this implementation of the tail
method.
default_tail
can still be used (and behaves just like tail
) but it emits a deprecation warning.
src/HypothesisTests.jl
Outdated
|
||
alpha = get_alpha(test) | ||
tail = get_tail(test) | ||
conf_string = string((1 - alpha) * 100) * "%" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to do string((1 - alpha) * 100)
and to insert %
directly in the final string to avoid one allocation.
src/HypothesisTests.jl
Outdated
|
||
# population details | ||
has_ci = applicable(StatsBase.confint, test) | ||
label_len = has_ci ? length(conf_string) + 21 : 22 # 21 is length of ' confidence interval:', 22 that of 'parameter of interest:' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Break line at 92 chars, e.g. by moving the comment to a separate line.
But this actually sounds kind of overkill to me: it would be simpler to align all lines (of all blocks) to the longest possible one all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a function prettify_detail
inside Base.show
now that aligns each line consisting of label and value to the longest possible length.
src/HypothesisTests.jl
Outdated
get_tail(test::HypothesisTest) = getfield(test, :tail, default_tail(test)) | ||
get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05) | ||
|
||
# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrap line at 92 chars (check other places too).
src/HypothesisTests.jl
Outdated
get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05) | ||
|
||
# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s)) | ||
with_trailing_whitespace(s::String, len::Int) = s * join(repeat([" "], outer=max(len - length(s), 0)), "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just " "^max(len - length(s), 0))
.
src/augmented_dickey_fuller.jl
Outdated
@@ -215,4 +216,4 @@ function show_params(io::IO, x::ADFTest, ident) | |||
println(io, ident, "Critical values at 1%, 5%, and 10%: ", x.cv') | |||
end | |||
|
|||
pvalue(x::ADFTest) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=:left) | |||
pvalue(x::ADFTest; tail=default_tail(x)) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=tail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you don't use tail=:left
here, since you already know the result of default_tail
for that test. Same below. Also, 92 char limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted this change by removing the named tail parameter for all but the t-tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather keep it, but call tail
rather than default_tail
.
src/circular.jl
Outdated
function pvalue(x::RayleighTest) | ||
function pvalue(x::RayleighTest; tail=default_tail(x)) | ||
if tail != default_tail(x) # warn that tail is not used here | ||
warn("tail=$tail has no effect on the computation of the p value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better throw an error if the argument doesn't work.
src/t.jl
Outdated
check_alpha(alpha) | ||
|
||
if tail == :left | ||
(-Inf, StatsBase.confint(x, alpha*2)[2]) | ||
(-Inf, StatsBase.confint(x, alpha*2, tail=:both)[2]) # tail=:both required as recursive anchor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"recursive anchor"? I don't understand what happens here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The quantiles are only computed if tail==:both
. The recursion would be infinite if the argument tail=:both
would not be provided because tail
is set to the tail of x
now, which may be one-sided.
The term "recursive anchor" is chosen because tail==:both
will not allow infinite recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that the term is not obvious, at least not to me. Just say something about blocking infinite recursion.
…ult instead of calling default_tail
The default_tail function was renamed to tail and overloaded for the different defaults or field access (in the case of t-tests).
All tests pass but the deprecation warnings are too annoying to actually run the tests every time.
Thank you very much, @nalimilan, for your detailed review! I implemented all changes you requested and even learned about some other nice features of Julia during the process. Docs and Unit tests are updated. See my comments for more information and please let me know, if there is anything else to change. |
Please note that the Travis build failed for I hope this is no problem for this pull request. The REQUIRE file states that only julia 0.5 is meant to be supported. |
.travis.yml
Outdated
@@ -5,7 +5,7 @@ os: | |||
- osx | |||
julia: | |||
- 0.5 | |||
- nightly | |||
# - nightly # The REQUIRE file states that only julia 0.5 is supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please leave this as it was, it's not an issue if the CI fails for this version (we're not robots).
doc/api/confint.rst
Outdated
|
||
Compute a confidence interval C with coverage 1-``alpha``. | ||
Compute a confidence interval C with coverage 1-``alpha`` | ||
(``alpha=0.05`` is the default for all tests). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.05 is the default for all tests unless overridden when creating the object.
doc/parametric/test_t.rst
Outdated
@@ -54,5 +66,8 @@ T-test | |||
.. math:: | |||
\nu_{\chi'} \approx \frac{\left(\sum_{i=1}^n k_i s_i^2\right)^2} | |||
{\sum_{i=1}^n \frac{(k_i s_i^2)^2}{\nu_i}} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing spaces here and elsewhere.
src/HypothesisTests.jl
Outdated
|
||
# utilities for pretty-printing | ||
conf_string = string(floor((1 - alpha(test)) * 100, 6)) # limit to 6 decimals in % | ||
prettify_detail(label::String, value::Any, len::Int) = # len is max length of label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say "format" rather than "prettify".
src/HypothesisTests.jl
Outdated
if has_ci | ||
println(io, " 95% confidence interval: $(StatsBase.confint(test))") | ||
println(io, prettify_detail(conf_string*"% confidence interval:", StatsBase.confint(test), 32)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for StatsBase.
, right?
doc/api/confint.rst
Outdated
|
||
If ``tail`` is ``:both`` (default), then a two-sided confidence | ||
If ``tail`` is ``:both`` (default for most tests), then a two-sided confidence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"most" isn't precise enough: give a list of tests for which it is the case below. Same remark for pvalue
.
doc/parametric/test_t.rst
Outdated
|
||
Perform a one sample t-test of the null hypothesis that the data | ||
in vector ``v`` comes from a distribution with mean ``mu0`` against | ||
the alternative hypothesis that the distribution does not have mean | ||
``mu0``. | ||
|
||
``tail`` and ``alpha`` specify the defaults for the application of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"for the application of" is an unusual vocabulary. Maybe "when calling"?
src/HypothesisTests.jl
Outdated
p = pvalue(test) | ||
outcome = if p > 0.05 "fail to reject" else "reject" end | ||
tail = default_tail(test) | ||
tail = HypothesisTests.tail(test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HypothesisTests.
isn't needed. Same below in comment.
src/HypothesisTests.jl
Outdated
tail = default_tail(test) | ||
tail = HypothesisTests.tail(test) | ||
p = pvalue(test) # obeys value of HypothesisTests.tail(test) if applicable | ||
outcome = if p > alpha(test) "fail to reject" else "reject" end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more standard to use p > alpha(test) ? ... : ...
. Below, since the conditions are relatively long, would be better to repeat tailvalue =
inside each branch (i.e. closer to what it was before).
|
||
default_tail(test::TTest) = :both | ||
pvalue(x::TTest; tail=x.tail) = pvalue(TDist(x.df), x.t; tail=tail) | ||
tail(x::TTest) = x.tail # defaults set by constructors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than defining this here, can you just use these definitions for all HypothesisTest
types, and add the needed fields to all tests? If tail
doesn't make sense for some types, you can then override the method (to a hardcoded value) instead of adding the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nalimilan,
Adding a tail
field to all types, which is not used by several of them, is not a quite clean solution. I would rather add the field to those types that actually can make use of it. These are quite many, though: Below, you find a list of types for which the pvalue
function has a tail
parameter, and those types for which pvalue
has a fixed tail.
I will add a tail
field to those types, for which the pvalue
function takes a tail parameter.
The alpha value is relevant to all tests. Therefore, I will add an alpha
field to all types. I will also add a simple function obtaining the test outcome:
reject_h0(test::HypothesisTest) = pvalue(test) <= test.alpha
I am looking forward to your comments!
pvalue
with fixed tail (no tail parameter; 9 types):
- OneSampleADTest
- KSampleADTest
- ADFTest
- BoxPierceTest
- LjungBoxTest
- BreuschGodfreyTest
- RayleighTest
- JarqueBeraTest
- KruskalWallisTest
pvalue
with tail parameter (15 types):
- BinomialTest
- SignTest
- FisherTLinearAssociation
- JammalamadakaCircularCorrelation
- FisherExactTest
- ExactKSTest
- ApproximateOneSampleKSTest
- ApproximateTwoSampleKSTest
- ExactMannWhitneyUTest
- ApproximateMannWhitneyUTest
- PowerDivergenceTest
- ExactSignedRankTest
- ApproximateSignedRankTest
- ZTest
- TTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a tail field to all types, which is not used by several of them, is not a quite clean solution. I would rather add the field to those types that actually can make use of it. These are quite many, though: Below, you find a list of types for which the pvalue function has a tail parameter, and those types for which pvalue has a fixed tail.
I will add a tail field to those types, for which the pvalue function takes a tail parameter.
Yes, of course, no need to add the field to tests which don't need it, overloading the method will be enough for these.
The alpha value is relevant to all tests. Therefore, I will add an alpha field to all types.
Right.
I will also add a simple function obtaining the test outcome:
reject_h0(test::HypothesisTest) = pvalue(test) <= test.alpha
I'd rather not add this function, or at least not in this PR, as that's a different issue which deserves some discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another branch, I already started to add the alpha
field to all types, and the tail
field those types for which the pvalue
method has a tail parameter. I also develop corresponding unit tests and documentation.
Apparently, these changes are quite extensive. We should discuss what the present PR should be going for.
Initially, this PR was meant to (only) provide the infrastructure for subtypes of HypothesisTest to have tail
and alpha
fields. Initially, I was not planning to actually add these fields to all types. Starting to do so confirmed my impression that this should be a different work package.
I see that all tests should have alpha
and tail
fields (if applicable), but I suggest to delay these changes to a follow-up PR. I'd like the present PR to focus on the infrastructure, as initially intended. Of course, I will be applying the changes required to make this PR a complete extension to the HypothesisTests package. With what I already did in adding the fields to all types, the next PR is already in preparation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to add support for all tests at the same time. Else the package will be in an inconsistent state. What's the interest of merging the PR in a partial state?
…eld of every test.
|
||
# confidence interval by inversion | ||
function StatsBase.confint(x::TTest, alpha::Float64=0.05; tail=:both) | ||
function StatsBase.confint(x::TTest, alpha::Float64=x.alpha; tail::Symbol=x.tail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use alpha(x)
and tail(x)
since that gives an identical signature for all tests.
|
||
Compute the p-value for a given significance test. | ||
|
||
If ``tail`` is ``:both`` (default), then the p-value for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to mention that the default tail depends on the specific test, and give a list of tests with their default.
p = pvalue(test) # obeys value of tail(test) if applicable | ||
outcome = p > alpha(test) ? "fail to reject" : "reject" | ||
if testtail == :both | ||
plabel ="two-sided p-value:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space.
end | ||
println(io, "Test summary:") | ||
println(io, format_detail("outcome with "*conf_string*"% confidence:", outcome*" h_0", 36)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add spaces around *
for consistency and readability. Here you can actually use string interpolation instead.
if testtail == :both | ||
plabel ="two-sided p-value:" | ||
elseif testtail == :left || testtail == :right | ||
plabel = "one-sided p-value ($(string(testtail)) tail):" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string
isn't needed AFAIK.
This enables the specification of tail (left/right/both) and alpha value in the constructors of HypothesisTest subtypes:
Since the additional arguments of the constructors are named arguments, nobody using this package has to change his constructor calls. Defaults are used so that the package works just as before when the optional arguments are not provided.
Currently, the new optional functionality of this pull request is only implemented for the t-tests. Other subtypes of HypothesisTest can easily follow the adaptations of
src/t.jl
.I am looking forward to your comments!
Mirko
Note that this pull request was already present as pull request #99. Since there was a problem with the Travis build routine, I reopen that pull request here.