Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subtypes of HypothesisTest can have optional fields :tail and :alpha #100

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Conversation

mirkobunse
Copy link

This enables the specification of tail (left/right/both) and alpha value in the constructors of HypothesisTest subtypes:

using HypothesisTests
EqualVarianceTTest(x, y, alpha=0.0025, tail=:right)

Since the additional arguments of the constructors are named arguments, nobody using this package has to change his constructor calls. Defaults are used so that the package works just as before when the optional arguments are not provided.

Currently, the new optional functionality of this pull request is only implemented for the t-tests. Other subtypes of HypothesisTest can easily follow the adaptations of src/t.jl.

I am looking forward to your comments!
Mirko


Note that this pull request was already present as pull request #99. Since there was a problem with the Travis build routine, I reopen that pull request here.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks useful. However, the new feature should be tested. Also, the pvalue function should only be modified for tests which support tail, else it's misleading. Finally, the docs must be updated too.

@@ -68,35 +68,54 @@ function check_alpha(alpha::Float64)
end
end

# Utility to get an optional field (e.g., :tail and :alpha)
getfield(value::HypothesisTest, name::Symbol, default::Any) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not use the name getfield, which is associated with the a.b syntax and is currently not overloadable. Also, this approach isn't really Julian. Better define fallbacks for get_tail and get_alpha which return the default (or maybe better, print a warning), and define specific methods for the distributions which support it (and which are very few).

Also, remove the "get_" prefix.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the "get_" prefix, I simply called the functions tail and alpha. The tail is :undefined, which is the value returned by default_tail if that method is not overloaded. For all tests that always use some default tail, the tail method returns that tail, just like default_tail does. For t-tests, which can be configured to use another tail, that other tail is returned (or the default, if no other tail is specified).

I feel that the method default_tail is superseded by this implementation of the tail method.
default_tail can still be used (and behaves just like tail) but it emits a deprecation warning.


alpha = get_alpha(test)
tail = get_tail(test)
conf_string = string((1 - alpha) * 100) * "%"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to do string((1 - alpha) * 100) and to insert % directly in the final string to avoid one allocation.


# population details
has_ci = applicable(StatsBase.confint, test)
label_len = has_ci ? length(conf_string) + 21 : 22 # 21 is length of ' confidence interval:', 22 that of 'parameter of interest:'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break line at 92 chars, e.g. by moving the comment to a separate line.

But this actually sounds kind of overkill to me: it would be simpler to align all lines (of all blocks) to the longest possible one all the time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a function prettify_detail inside Base.show now that aligns each line consisting of label and value to the longest possible length.

get_tail(test::HypothesisTest) = getfield(test, :tail, default_tail(test))
get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05)

# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap line at 92 chars (check other places too).

get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05)

# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s))
with_trailing_whitespace(s::String, len::Int) = s * join(repeat([" "], outer=max(len - length(s), 0)), "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just " "^max(len - length(s), 0)).

@@ -215,4 +216,4 @@ function show_params(io::IO, x::ADFTest, ident)
println(io, ident, "Critical values at 1%, 5%, and 10%: ", x.cv')
end

pvalue(x::ADFTest) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=:left)
pvalue(x::ADFTest; tail=default_tail(x)) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=tail)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you don't use tail=:left here, since you already know the result of default_tail for that test. Same below. Also, 92 char limit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this change by removing the named tail parameter for all but the t-tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather keep it, but call tail rather than default_tail.

src/circular.jl Outdated
function pvalue(x::RayleighTest)
function pvalue(x::RayleighTest; tail=default_tail(x))
if tail != default_tail(x) # warn that tail is not used here
warn("tail=$tail has no effect on the computation of the p value")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better throw an error if the argument doesn't work.

src/t.jl Outdated
check_alpha(alpha)

if tail == :left
(-Inf, StatsBase.confint(x, alpha*2)[2])
(-Inf, StatsBase.confint(x, alpha*2, tail=:both)[2]) # tail=:both required as recursive anchor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"recursive anchor"? I don't understand what happens here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quantiles are only computed if tail==:both. The recursion would be infinite if the argument tail=:both would not be provided because tail is set to the tail of x now, which may be one-sided.

The term "recursive anchor" is chosen because tail==:both will not allow infinite recursion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that the term is not obvious, at least not to me. Just say something about blocking infinite recursion.

@mirkobunse
Copy link
Author

mirkobunse commented Jun 17, 2017

Thank you very much, @nalimilan, for your detailed review! I implemented all changes you requested and even learned about some other nice features of Julia during the process. Docs and Unit tests are updated.

See my comments for more information and please let me know, if there is anything else to change.

@mirkobunse
Copy link
Author

Please note that the Travis build failed for julia:nightly because the package Rmath (and thus also StatsFuns and Distributions, on which this package depends) could not be precompiled.

I hope this is no problem for this pull request. The REQUIRE file states that only julia 0.5 is meant to be supported.

.travis.yml Outdated
@@ -5,7 +5,7 @@ os:
- osx
julia:
- 0.5
- nightly
# - nightly # The REQUIRE file states that only julia 0.5 is supported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave this as it was, it's not an issue if the CI fails for this version (we're not robots).


Compute a confidence interval C with coverage 1-``alpha``.
Compute a confidence interval C with coverage 1-``alpha``
(``alpha=0.05`` is the default for all tests).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.05 is the default for all tests unless overridden when creating the object.

@@ -54,5 +66,8 @@ T-test
.. math::
\nu_{\chi'} \approx \frac{\left(\sum_{i=1}^n k_i s_i^2\right)^2}
{\sum_{i=1}^n \frac{(k_i s_i^2)^2}{\nu_i}}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove trailing spaces here and elsewhere.


# utilities for pretty-printing
conf_string = string(floor((1 - alpha(test)) * 100, 6)) # limit to 6 decimals in %
prettify_detail(label::String, value::Any, len::Int) = # len is max length of label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "format" rather than "prettify".

if has_ci
println(io, " 95% confidence interval: $(StatsBase.confint(test))")
println(io, prettify_detail(conf_string*"% confidence interval:", StatsBase.confint(test), 32))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for StatsBase., right?


If ``tail`` is ``:both`` (default), then a two-sided confidence
If ``tail`` is ``:both`` (default for most tests), then a two-sided confidence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"most" isn't precise enough: give a list of tests for which it is the case below. Same remark for pvalue.


Perform a one sample t-test of the null hypothesis that the data
in vector ``v`` comes from a distribution with mean ``mu0`` against
the alternative hypothesis that the distribution does not have mean
``mu0``.

``tail`` and ``alpha`` specify the defaults for the application of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"for the application of" is an unusual vocabulary. Maybe "when calling"?

p = pvalue(test)
outcome = if p > 0.05 "fail to reject" else "reject" end
tail = default_tail(test)
tail = HypothesisTests.tail(test)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HypothesisTests. isn't needed. Same below in comment.

tail = default_tail(test)
tail = HypothesisTests.tail(test)
p = pvalue(test) # obeys value of HypothesisTests.tail(test) if applicable
outcome = if p > alpha(test) "fail to reject" else "reject" end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more standard to use p > alpha(test) ? ... : .... Below, since the conditions are relatively long, would be better to repeat tailvalue = inside each branch (i.e. closer to what it was before).


default_tail(test::TTest) = :both
pvalue(x::TTest; tail=x.tail) = pvalue(TDist(x.df), x.t; tail=tail)
tail(x::TTest) = x.tail # defaults set by constructors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than defining this here, can you just use these definitions for all HypothesisTest types, and add the needed fields to all tests? If tail doesn't make sense for some types, you can then override the method (to a hardcoded value) instead of adding the field.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nalimilan,

Adding a tail field to all types, which is not used by several of them, is not a quite clean solution. I would rather add the field to those types that actually can make use of it. These are quite many, though: Below, you find a list of types for which the pvalue function has a tail parameter, and those types for which pvalue has a fixed tail.

I will add a tail field to those types, for which the pvalue function takes a tail parameter.

The alpha value is relevant to all tests. Therefore, I will add an alpha field to all types. I will also add a simple function obtaining the test outcome:

reject_h0(test::HypothesisTest) = pvalue(test) <= test.alpha

I am looking forward to your comments!


pvalue with fixed tail (no tail parameter; 9 types):

  • OneSampleADTest
  • KSampleADTest
  • ADFTest
  • BoxPierceTest
  • LjungBoxTest
  • BreuschGodfreyTest
  • RayleighTest
  • JarqueBeraTest
  • KruskalWallisTest

pvalue with tail parameter (15 types):

  • BinomialTest
  • SignTest
  • FisherTLinearAssociation
  • JammalamadakaCircularCorrelation
  • FisherExactTest
  • ExactKSTest
  • ApproximateOneSampleKSTest
  • ApproximateTwoSampleKSTest
  • ExactMannWhitneyUTest
  • ApproximateMannWhitneyUTest
  • PowerDivergenceTest
  • ExactSignedRankTest
  • ApproximateSignedRankTest
  • ZTest
  • TTest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a tail field to all types, which is not used by several of them, is not a quite clean solution. I would rather add the field to those types that actually can make use of it. These are quite many, though: Below, you find a list of types for which the pvalue function has a tail parameter, and those types for which pvalue has a fixed tail.

I will add a tail field to those types, for which the pvalue function takes a tail parameter.

Yes, of course, no need to add the field to tests which don't need it, overloading the method will be enough for these.

The alpha value is relevant to all tests. Therefore, I will add an alpha field to all types.

Right.

I will also add a simple function obtaining the test outcome:

reject_h0(test::HypothesisTest) = pvalue(test) <= test.alpha

I'd rather not add this function, or at least not in this PR, as that's a different issue which deserves some discussion.

Copy link
Author

@mirkobunse mirkobunse Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another branch, I already started to add the alpha field to all types, and the tail field those types for which the pvalue method has a tail parameter. I also develop corresponding unit tests and documentation.

Apparently, these changes are quite extensive. We should discuss what the present PR should be going for.

Initially, this PR was meant to (only) provide the infrastructure for subtypes of HypothesisTest to have tail and alpha fields. Initially, I was not planning to actually add these fields to all types. Starting to do so confirmed my impression that this should be a different work package.

I see that all tests should have alpha and tail fields (if applicable), but I suggest to delay these changes to a follow-up PR. I'd like the present PR to focus on the infrastructure, as initially intended. Of course, I will be applying the changes required to make this PR a complete extension to the HypothesisTests package. With what I already did in adding the fields to all types, the next PR is already in preparation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to add support for all tests at the same time. Else the package will be in an inconsistent state. What's the interest of merging the PR in a partial state?


# confidence interval by inversion
function StatsBase.confint(x::TTest, alpha::Float64=0.05; tail=:both)
function StatsBase.confint(x::TTest, alpha::Float64=x.alpha; tail::Symbol=x.tail)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use alpha(x) and tail(x) since that gives an identical signature for all tests.


Compute the p-value for a given significance test.

If ``tail`` is ``:both`` (default), then the p-value for the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to mention that the default tail depends on the specific test, and give a list of tests with their default.

p = pvalue(test) # obeys value of tail(test) if applicable
outcome = p > alpha(test) ? "fail to reject" : "reject"
if testtail == :both
plabel ="two-sided p-value:"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space.

end
println(io, "Test summary:")
println(io, format_detail("outcome with "*conf_string*"% confidence:", outcome*" h_0", 36))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces around * for consistency and readability. Here you can actually use string interpolation instead.

if testtail == :both
plabel ="two-sided p-value:"
elseif testtail == :left || testtail == :right
plabel = "one-sided p-value ($(string(testtail)) tail):"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string isn't needed AFAIK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants