Skip to content

Subtypes of HypothesisTest can have optional fields :tail and :alpha #100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
ef60cb7
Subtypes of HypothesisTest have an optional tail field
mirkobunse Jun 2, 2017
5f82c9c
Subtypes of HypothesisTest have an optional alpha field
mirkobunse Jun 2, 2017
14f15d1
Indent all labels of details in Base.show by the same length
mirkobunse Jun 2, 2017
c3363a1
get_tail and get_alpha reuse the same function getting a field value …
mirkobunse Jun 2, 2017
596444f
Fixed test run by providing optional tail parameter to all pvalue fun…
mirkobunse Jun 2, 2017
15282b5
Confidence interval in Base.show respects alpha parameter
mirkobunse Jun 6, 2017
8007b1e
Optional tail parameter of pvalue is initialized with hard-coded defa…
mirkobunse Jun 14, 2017
e63c378
Pretty-printing with fixed detail line length
mirkobunse Jun 14, 2017
c882e25
Fallback-function approach to tail and alpha.
mirkobunse Jun 14, 2017
c1ac852
Redirect deprecated default_tail to tail
mirkobunse Jun 16, 2017
36a9e76
Fixed Base.show for the case of tail being no keyword argument of the…
mirkobunse Jun 16, 2017
e6bc7a6
Added comments describing how to test deprecation forwarding.
mirkobunse Jun 16, 2017
bda80a7
Define IO Buffer once to ease temporary redirection to STDOUT
mirkobunse Jun 16, 2017
480e152
Added unit tests for tail parameter
mirkobunse Jun 16, 2017
96eb8cf
Updated doc of t-tests
mirkobunse Jun 16, 2017
7f66aa8
Merge remote-tracking branch 'origin/master' into tail_alpha
mirkobunse Jun 17, 2017
86626ec
Uncommented julia:nightly because not even specified by REQUIRE file
mirkobunse Jun 17, 2017
1938ffb
Removed named tail parameter from the remaining pvalue functions
mirkobunse Jun 18, 2017
5d38a50
Realised requested changes. Only missing change: tail and alpha as fi…
mirkobunse Jun 30, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 30 additions & 11 deletions src/HypothesisTests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -68,35 +68,54 @@ function check_alpha(alpha::Float64)
end
end

# Utility to get an optional field (e.g., :tail and :alpha)
getfield(value::HypothesisTest, name::Symbol, default::Any) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not use the name getfield, which is associated with the a.b syntax and is currently not overloadable. Also, this approach isn't really Julian. Better define fallbacks for get_tail and get_alpha which return the default (or maybe better, print a warning), and define specific methods for the distributions which support it (and which are very few).

Also, remove the "get_" prefix.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the "get_" prefix, I simply called the functions tail and alpha. The tail is :undefined, which is the value returned by default_tail if that method is not overloaded. For all tests that always use some default tail, the tail method returns that tail, just like default_tail does. For t-tests, which can be configured to use another tail, that other tail is returned (or the default, if no other tail is specified).

I feel that the method default_tail is superseded by this implementation of the tail method.
default_tail can still be used (and behaves just like tail) but it emits a deprecation warning.

if name in fieldnames(value)
Base.getfield(value, name)
else
default
end

get_tail(test::HypothesisTest) = getfield(test, :tail, default_tail(test))
get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05)

# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap line at 92 chars (check other places too).

with_trailing_whitespace(s::String, len::Int) = s * join(repeat([" "], outer=max(len - length(s), 0)), "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just " "^max(len - length(s), 0)).


# Pretty-print
function Base.show{T<:HypothesisTest}(io::IO, test::T)
println(io, testname(test))
println(io, repeat("-", length(testname(test))))

alpha = get_alpha(test)
tail = get_tail(test)
conf_string = string((1 - alpha) * 100) * "%"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to do string((1 - alpha) * 100) and to insert % directly in the final string to avoid one allocation.


# population details
has_ci = applicable(StatsBase.confint, test)
label_len = has_ci ? length(conf_string) + 21 : 22 # 21 is length of ' confidence interval:', 22 that of 'parameter of interest:'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break line at 92 chars, e.g. by moving the comment to a separate line.

But this actually sounds kind of overkill to me: it would be simpler to align all lines (of all blocks) to the longest possible one all the time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a function prettify_detail inside Base.show now that aligns each line consisting of label and value to the longest possible length.

(param_name, param_under_h0, param_estimate) = population_param_of_interest(test)
println(io, "Population details:")
println(io, " parameter of interest: $param_name")
println(io, " value under h_0: $param_under_h0")
println(io, " point estimate: $param_estimate")
println(io, " $(with_trailing_whitespace("parameter of interest:", label_len)) $param_name")
println(io, " $(with_trailing_whitespace("value under h_0", label_len)) $param_under_h0")
println(io, " $(with_trailing_whitespace("point estimate", label_len)) $param_estimate")
if has_ci
println(io, " 95% confidence interval: $(StatsBase.confint(test))")
println(io, " $conf_string confidence interval: $(StatsBase.confint(test, alpha, tail=tail))")
end
println(io)

# test summary
p = pvalue(test)
outcome = if p > 0.05 "fail to reject" else "reject" end
tail = default_tail(test)
p = pvalue(test, tail=tail)
outcome = if p > alpha "fail to reject" else "reject" end
println(io, "Test summary:")
println(io, " outcome with 95% confidence: $outcome h_0")
label_len = length(conf_string) + 25 # 25 is length of 'outcome with confidence:'
println(io, " outcome with $conf_string confidence: $outcome h_0")
if tail == :both
println(io, " two-sided p-value: $p")
println(io, " $(with_trailing_whitespace("two-sided p-value:", label_len)) $p")
elseif tail == :left || tail == :right
println(io, " one-sided p-value: $p")
println(io, " $(with_trailing_whitespace("one-sided p-value:", label_len)) $p")
else
println(io, " p-value: $p")
println(io, " $(with_trailing_whitespace("p-value:", label_len)) $p")
end
println(io)

Expand Down
3 changes: 2 additions & 1 deletion src/augmented_dickey_fuller.jl
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,7 @@ end
testname(::ADFTest) = "Augmented Dickey-Fuller unit root test"
population_param_of_interest(x::ADFTest) =
("coefficient on lagged non-differenced variable", 0, x.coef)
default_tail(test::ADFTest) = :left

function show_params(io::IO, x::ADFTest, ident)
println(io, ident, "sample size in regression: ", x.n)
Expand All @@ -215,4 +216,4 @@ function show_params(io::IO, x::ADFTest, ident)
println(io, ident, "Critical values at 1%, 5%, and 10%: ", x.cv')
end

pvalue(x::ADFTest) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=:left)
pvalue(x::ADFTest; tail=default_tail(x)) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=tail)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you don't use tail=:left here, since you already know the result of default_tail for that test. Same below. Also, 92 char limit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this change by removing the named tail parameter for all but the t-tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather keep it, but call tail rather than default_tail.

4 changes: 2 additions & 2 deletions src/box_test.jl
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ function show_params(io::IO, x::BoxPierceTest, ident)
println(io, ident, "Q statistic: ", x.Q)
end

pvalue(x::BoxPierceTest) = pvalue(Chisq(x.lag-x.dof), x.Q; tail=:right)
pvalue(x::BoxPierceTest; tail=default_tail(x)) = pvalue(Chisq(x.lag-x.dof), x.Q; tail=tail)

#Ljung-Box test

Expand Down Expand Up @@ -120,4 +120,4 @@ function show_params(io::IO, x::LjungBoxTest, ident)
println(io, ident, "Q statistic: ", x.Q)
end

pvalue(x::LjungBoxTest) = pvalue(Chisq(x.lag-x.dof), x.Q; tail=:right)
pvalue(x::LjungBoxTest; tail=default_tail(x)) = pvalue(Chisq(x.lag-x.dof), x.Q; tail=tail)
2 changes: 1 addition & 1 deletion src/breusch_godfrey.jl
Original file line number Diff line number Diff line change
Expand Up @@ -76,4 +76,4 @@ function show_params(io::IO, x::BreuschGodfreyTest, ident)
println(io, ident, "T*R^2 statistic: ", x.BG)
end

pvalue(x::BreuschGodfreyTest) = pvalue(Chisq(x.lag), x.BG; tail=:right)
pvalue(x::BreuschGodfreyTest; tail=default_tail(x)) = pvalue(Chisq(x.lag), x.BG; tail=tail)
5 changes: 4 additions & 1 deletion src/circular.jl
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,10 @@ function show_params(io::IO, x::RayleighTest, ident="")
println(io, ident, "test statistic: $(x.Rbar^2 * x.n)")
end

function pvalue(x::RayleighTest)
function pvalue(x::RayleighTest; tail=default_tail(x))
if tail != default_tail(x) # warn that tail is not used here
warn("tail=$tail has no effect on the computation of the p value")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better throw an error if the argument doesn't work.

end
Z = x.Rbar^2 * x.n
x.n > 1e6 ? exp(-Z) :
exp(-Z)*(1+(2*Z-Z^2)/(4*x.n)-(24*Z - 132*Z^2 + 76*Z^3 - 9*Z^4)/(288*x.n^2))
Expand Down
2 changes: 1 addition & 1 deletion src/jarque_bera.jl
Original file line number Diff line number Diff line change
Expand Up @@ -84,4 +84,4 @@ function show_params(io::IO, x::JarqueBeraTest, ident)
println(io, ident, "JB statistic: ", x.JB)
end

pvalue(x::JarqueBeraTest) = pvalue(Chisq(2), x.JB; tail=:right)
pvalue(x::JarqueBeraTest; tail=default_tail(x)) = pvalue(Chisq(2), x.JB; tail=tail)
2 changes: 1 addition & 1 deletion src/kruskal_wallis.jl
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ function show_params(io::IO, x::KruskalWallisTest, ident)
println(io, ident, "adjustment for ties: ", x.tie_adjustment)
end

pvalue(x::KruskalWallisTest) = pvalue(Chisq(x.df), x.H; tail=:right)
pvalue(x::KruskalWallisTest; tail=default_tail(x)) = pvalue(Chisq(x.df), x.H; tail=tail)


## helper
Expand Down
34 changes: 22 additions & 12 deletions src/t.jl
Original file line number Diff line number Diff line change
Expand Up @@ -33,13 +33,13 @@ pvalue(x::TTest; tail=:both) = pvalue(TDist(x.df), x.t; tail=tail)
default_tail(test::TTest) = :both

# confidence interval by inversion
function StatsBase.confint(x::TTest, alpha::Float64=0.05; tail=:both)
function StatsBase.confint(x::TTest, alpha::Float64=get_alpha(x); tail=get_tail(x))
check_alpha(alpha)

if tail == :left
(-Inf, StatsBase.confint(x, alpha*2)[2])
(-Inf, StatsBase.confint(x, alpha*2, tail=:both)[2]) # tail=:both required as recursive anchor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"recursive anchor"? I don't understand what happens here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quantiles are only computed if tail==:both. The recursion would be infinite if the argument tail=:both would not be provided because tail is set to the tail of x now, which may be one-sided.

The term "recursive anchor" is chosen because tail==:both will not allow infinite recursion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that the term is not obvious, at least not to me. Just say something about blocking infinite recursion.

elseif tail == :right
(StatsBase.confint(x, alpha*2)[1], Inf)
(StatsBase.confint(x, alpha*2, tail=:both)[1], Inf)
elseif tail == :both
q = quantile(TDist(x.df), 1-alpha/2)
(x.xbar-q*x.stderr, x.xbar+q*x.stderr)
Expand All @@ -58,6 +58,8 @@ immutable OneSampleTTest <: TTest
stderr::Real # empirical standard error
t::Real # t-statistic
μ0::Real # mean under h_0
tail::Symbol # :left, :right, or :both
alpha::Real # alpha value
end

testname(::OneSampleTTest) = "One sample t-test"
Expand All @@ -70,19 +72,21 @@ function show_params(io::IO, x::OneSampleTTest, ident="")
println(io, ident, "empirical standard error: $(x.stderr)")
end

function OneSampleTTest(xbar::Real, stddev::Real, n::Int, μ0::Real=0)
function OneSampleTTest(xbar::Real, stddev::Real, n::Int, μ0::Real=0; tail::Symbol=:both, alpha::Real=0.05)
stderr = stddev/sqrt(n)
t = (xbar-μ0)/stderr
df = n-1
OneSampleTTest(n, xbar, df, stderr, t, μ0)
OneSampleTTest(n, xbar, df, stderr, t, μ0, tail, alpha)
end

OneSampleTTest{T<:Real}(v::AbstractVector{T}, μ0::Real=0) = OneSampleTTest(mean(v), std(v), length(v), μ0)
OneSampleTTest{T<:Real}(v::AbstractVector{T}, μ0::Real=0; tail::Symbol=:both, alpha::Real=0.05) =
OneSampleTTest(mean(v), std(v), length(v), μ0, tail=tail, alpha=alpha)

function OneSampleTTest{T<:Real, S<:Real}(x::AbstractVector{T}, y::AbstractVector{S}, μ0::Real=0)
function OneSampleTTest{T<:Real, S<:Real}(x::AbstractVector{T}, y::AbstractVector{S}, μ0::Real=0;
tail::Symbol=:both, alpha::Real=0.05)
check_same_length(x, y)

OneSampleTTest(x - y, μ0)
OneSampleTTest(x - y, μ0, tail=tail, alpha=alpha)
end


Expand All @@ -96,6 +100,8 @@ immutable EqualVarianceTTest <: TwoSampleTTest
stderr::Real # empirical standard error
t::Real # t-statistic
μ0::Real # mean difference under h_0
tail::Symbol # :left, :right, or :both
alpha::Real # alpha value
end

function show_params(io::IO, x::TwoSampleTTest, ident="")
Expand All @@ -108,14 +114,15 @@ end
testname(::EqualVarianceTTest) = "Two sample t-test (equal variance)"
population_param_of_interest(x::TwoSampleTTest) = ("Mean difference", x.μ0, x.xbar) # parameter of interest: name, value under h0, point estimate

function EqualVarianceTTest{T<:Real,S<:Real}(x::AbstractVector{T}, y::AbstractVector{S}, μ0::Real=0)
function EqualVarianceTTest{T<:Real,S<:Real}(x::AbstractVector{T}, y::AbstractVector{S}, μ0::Real=0;
tail::Symbol=:both, alpha::Real=0.05)
nx, ny = length(x), length(y)
xbar = mean(x) - mean(y)
stddev = sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
stderr = stddev * sqrt(1/nx + 1/ny)
t = (xbar - μ0) / stderr
df = nx + ny - 2
EqualVarianceTTest(nx, ny, xbar, df, stderr, t, μ0)
EqualVarianceTTest(nx, ny, xbar, df, stderr, t, μ0, tail, alpha)
end


Expand All @@ -129,16 +136,19 @@ immutable UnequalVarianceTTest <: TwoSampleTTest
stderr::Real # empirical standard error
t::Real # t-statistic
μ0::Real # mean under h_0
tail::Symbol # :left, :right, or :both
alpha::Real # alpha value
end

testname(::UnequalVarianceTTest) = "Two sample t-test (unequal variance)"

function UnequalVarianceTTest{T<:Real,S<:Real}(x::AbstractVector{T}, y::AbstractVector{S}, μ0::Real=0)
function UnequalVarianceTTest{T<:Real,S<:Real}(x::AbstractVector{T}, y::AbstractVector{S}, μ0::Real=0;
tail::Symbol=:both, alpha::Real=0.05)
nx, ny = length(x), length(y)
xbar = mean(x)-mean(y)
varx, vary = var(x), var(y)
stderr = sqrt(varx/nx + vary/ny)
t = (xbar-μ0)/stderr
df = (varx / nx + vary / ny)^2 / ((varx / nx)^2 / (nx - 1) + (vary / ny)^2 / (ny - 1))
UnequalVarianceTTest(nx, ny, xbar, df, stderr, t, μ0)
UnequalVarianceTTest(nx, ny, xbar, df, stderr, t, μ0, tail, alpha)
end