diff --git a/Project.toml b/Project.toml index cdd3d94..0e4eda7 100644 --- a/Project.toml +++ b/Project.toml @@ -1,7 +1,7 @@ name = "RegressionTables" uuid = "d519eb52-b820-54da-95a6-98e1306fdade" authors = ["Johannes Boehm "] -version = "0.6.2" +version = "0.7.0" [deps] Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f" diff --git a/docs/src/api.md b/docs/src/api.md index d595e37..ed69d48 100644 --- a/docs/src/api.md +++ b/docs/src/api.md @@ -71,7 +71,6 @@ RegressionTables.reorder_nms_list RegressionTables.drop_names! RegressionTables.add_blank RegressionTables.missing_vars -RegressionTables.other_stats RegressionTables.add_element! ``` @@ -83,4 +82,25 @@ This section describes how different types are displayed. Throughout this packag ```@docs Base.repr +``` + +## New RegressionModel Types + +This package is designed to be generally compatible with the [RegressionModel abstraction](https://juliastats.org/StatsBase.jl/latest/statmodels/). It has special conditions defined around four commonly used packages ([FixedEffectModels.jl](https://github.com/matthieugomez/FixedEffectModels.jl), [GLM.jl](https://github.com/JuliaStats/GLM.jl), [GLFixedEffectModels.jl](https://github.com/jmboehm/GLFixedEffectModels.jl) and [MixedModels.jl](https://github.com/JuliaStats/MixedModels.jl)). It is possible to add new models to this list, either by creating an extension for this package or by creating the necessary items in an independent package. + +For any new `RegressionModel`, there may be a need to define the following functions for the package to work correctly. Many of these will work without any issues if following the StatsModels API, and many of the others are useful for customizing how the regression result is displayed. It is also possible to redefine how [`RegressionTables.AbstractRegressionStatistic`](@ref) are displayed. 
+ +```@docs +RegressionTables._formula +RegressionTables._responsename +RegressionTables._coefnames +RegressionTables._coef +RegressionTables._stderror +RegressionTables._dof_residual +RegressionTables._pvalue +RegressionTables.other_stats +RegressionTables.default_regression_statistics(::RegressionModel) +RegressionTables.can_standardize +RegressionTables.standardize_coef_values +RegressionTables.RegressionType ``` \ No newline at end of file diff --git a/docs/src/customization.md b/docs/src/customization.md index 5d694c7..a24b286 100644 --- a/docs/src/customization.md +++ b/docs/src/customization.md @@ -66,6 +66,7 @@ RegressionTables.default_regression_statistics RegressionTables.default_print_randomeffects RegressionTables.default_print_clusters RegressionTables.default_use_relabeled_values +RegressionTables.default_confint_level ``` ## Other Defaults diff --git a/docs/src/examples.md b/docs/src/examples.md index ca3176e..538023c 100644 --- a/docs/src/examples.md +++ b/docs/src/examples.md @@ -176,6 +176,8 @@ Within-R2 0.642 0.598 0.391 ### ConfInt (Confidence Interval) +Confidence level defaults to the 95th percentile: + ```jldoctest regtable(rr1,rr2,rr3,rr4; below_statistic = ConfInt) @@ -209,6 +211,41 @@ Within-R2 0.642 0.598 ------------------------------------------------------------------------------------------------ ``` +Set the Confidence Interval level either by setting [`RegressionTables.default_confint_level`](@ref) or by adjusting the `confint_level` keyword argument + +```jldoctest +regtable(rr1,rr2,rr3,rr4; below_statistic = ConfInt, confint_level=0.9, align=:c) + +# output + + +------------------------------------------------------------------------------------------------- + SepalLength SepalWidth + --------------------------------------------------- ---------------- + (1) (2) (3) (4) +------------------------------------------------------------------------------------------------- +(Intercept) 6.526*** + (5.734, 7.319) +SepalWidth -0.223 
0.432*** 0.516*** + (-0.480, 0.033) (0.297, 0.567) (0.345, 0.688) +PetalLength 0.776*** 0.723*** -0.188* + (0.669, 0.882) (0.510, 0.937) (-0.326, -0.049) +PetalWidth -0.625 0.626*** + (-1.211, -0.038) (0.421, 0.830) +PetalLength & PetalWidth 0.066 + (-0.045, 0.177) +SepalLength 0.378*** + (0.269, 0.486) +------------------------------------------------------------------------------------------------- +Species Fixed Effects Yes Yes Yes +isSmall Fixed Effects Yes +------------------------------------------------------------------------------------------------- +N 150 150 150 150 +R2 0.014 0.863 0.868 0.635 +Within-R2 0.642 0.598 0.391 +------------------------------------------------------------------------------------------------- +``` + ## Standard Errors on same line as coefficient ```jldoctest @@ -1240,6 +1277,39 @@ Pseudo R2 0.006 0.811 0.347 0.297 ------------------------------------------------------------------ ``` +It is also possible to standardize some coefficients and not others + +```jldoctest +lm1 = lm(@formula(SepalLength ~ SepalWidth), df); +regtable(lm1, lm1, rr7, rr7; standardize_coef=[false, true, false, true]) + +# output + + +--------------------------------------------------------- + SepalLength isSmall + ------------------- --------------------- + (1) (2) (3) (4) +--------------------------------------------------------- +(Intercept) 6.526*** 7.881*** 10.189*** 21.894*** + (0.479) (0.578) (2.607) (5.601) +SepalWidth -0.223 -0.118 + (0.155) (0.082) +SepalLength -3.519*** -6.260*** + (0.697) (1.240) +PetalLength 3.580*** 13.578*** + (0.708) (2.686) +PetalWidth -3.637** -5.957** + (1.127) (1.846) +--------------------------------------------------------- +Estimator OLS OLS Binomial Binomial +--------------------------------------------------------- +N 150 150 150 150 +R2 0.014 0.014 +Pseudo R2 0.006 0.006 0.297 0.297 +--------------------------------------------------------- +``` + ## Show Clustered Standard Errors Displays whether or not the 
standard errors are clustered and in what ways. diff --git a/docs/src/regression_statistics.md b/docs/src/regression_statistics.md index fee6f37..4c3d1ff 100644 --- a/docs/src/regression_statistics.md +++ b/docs/src/regression_statistics.md @@ -28,7 +28,6 @@ Filter = t -> typeof(t) === DataType && t <: RegressionTables.AbstractUnderStati ```@docs RegressionTables.CoefValue -RegressionTables.RegressionType RegressionTables.HasControls RegressionTables.RegressionNumbers RegressionTables.FixedEffectValue diff --git a/ext/RegressionTablesGLMExt.jl b/ext/RegressionTablesGLMExt.jl index e8b1b60..97d4eda 100644 --- a/ext/RegressionTablesGLMExt.jl +++ b/ext/RegressionTablesGLMExt.jl @@ -7,8 +7,12 @@ RegressionTables.default_regression_statistics(rr::StatsModels.TableRegressionMo RegressionTables.RegressionType(x::StatsModels.TableRegressionModel{T}) where {T<:GLM.AbstractGLM} = RegressionType(x.model) RegressionTables.RegressionType(x::StatsModels.TableRegressionModel{T}) where {T<:LinearModel} = RegressionType(x.model) -RegressionTables.standardize_coef_values(x::StatsModels.TableRegressionModel, coefvalues, coefstderrors) = - RegressionTables.standardize_coef_values(std(modelmatrix(x), dims=1)[1, :], std(response(x)), coefvalues, coefstderrors) + +# k is which coefficient or standard error to standardize +RegressionTables.standardize_coef_values(x::StatsModels.TableRegressionModel, val, k) = + RegressionTables.standardize_coef_values(std(modelmatrix(x)[:, k]), std(response(x)), val) + +RegressionTables.can_standardize(x::StatsModels.TableRegressionModel) = true RegressionTables.RegressionType(x::LinearModel) = RegressionType(Normal()) RegressionTables.RegressionType(x::GLM.LmResp) = RegressionType(Normal()) diff --git a/ext/RegressionTablesMixedModelsExt.jl b/ext/RegressionTablesMixedModelsExt.jl index c3a44f2..d340436 100644 --- a/ext/RegressionTablesMixedModelsExt.jl +++ b/ext/RegressionTablesMixedModelsExt.jl @@ -26,8 +26,11 @@ function 
RegressionTables._coefnames(x::MixedModel) out end -RegressionTables.standardize_coef_values(x::MixedModel, coefvalues, coefstderrors) = - RegressionTables.standardize_coef_values(std(modelmatrix(x), dims=1)[1, :], std(response(x)), coefvalues, coefstderrors) +# k is which coefficient or standard error to standardize +RegressionTables.standardize_coef_values(x::MixedModel, val, k) = + RegressionTables.standardize_coef_values(std(modelmatrix(x)[:, k]), std(response(x)), val) + +RegressionTables.can_standardize(x::MixedModel) = true function RegressionTables.other_stats(x::MixedModel, s::Symbol) if s == :randomeffects diff --git a/src/RegressionStatistics.jl b/src/RegressionStatistics.jl index 223f366..e001156 100644 --- a/src/RegressionStatistics.jl +++ b/src/RegressionStatistics.jl @@ -462,44 +462,57 @@ abstract type AbstractUnderStatistic <: AbstractRegressionData end struct TStat <: AbstractUnderStatistic val::Float64 end - TStat(se, coef, dof=0) + TStat(rr::RegressionModel, k::Int; vargs...) The t-statistic of a coefficient. """ struct TStat <: AbstractUnderStatistic val::Float64 end -TStat(se, coef, dof=0) = TStat(coef / se) +TStat(rr::RegressionModel, k::Int; vargs...) = TStat(_coef(rr)[k] / _stderror(rr)[k]) """ struct StdError <: AbstractUnderStatistic val::Float64 end - StdError(se, coef, dof=0) + StdError(rr::RegressionModel, k::Int; standardize=false, vargs...) The standard error of a coefficient. """ struct StdError <: AbstractUnderStatistic val::Float64 end -StdError(se, coef, dof=0) = StdError(se) +function StdError(rr::RegressionModel, k::Int; standardize=false, vargs...) + if standardize + StdError(standardize_coef_values(rr, _stderror(rr)[k], k)) + else + StdError(_stderror(rr)[k]) + end +end """ struct ConfInt <: AbstractUnderStatistic val::Tuple{Float64, Float64} end - ConfInt(se, coef, dof; level=default_confint_level()) + ConfInt(rr::RegressionModel, k::Int; level=0.95, standardize=false, vargs...) The confidence interval of a coefficient. 
The default confidence level is 95% (can be changed by setting -`RegressionTable.default_confint_level() = 0.90` or similar). +`RegressionTables.default_confint_level(render::AbstractRenderType, rr) = 0.90` or similar). """ struct ConfInt <: AbstractUnderStatistic val::Tuple{Float64, Float64} end -default_confint_level() = 0.95 -function ConfInt(se, coef, dof; level=default_confint_level()) + +function ConfInt(rr::RegressionModel, k::Int; level=0.95, standardize=false, vargs...) @assert 0 < level < 1 "Confidence level must be between 0 and 1" + se = _stderror(rr)[k] + coef = _coef(rr)[k] + dof = _dof_residual(rr) + if standardize + se = standardize_coef_values(rr, se, k) + coef = standardize_coef_values(rr, coef, k) + end scale = quantile(TDist(dof), 1 - (1-level) / 2) ConfInt((coef - scale * se, coef + scale * se)) end @@ -519,6 +532,14 @@ struct CoefValue <: AbstractRegressionData val::Float64 pvalue::Float64 end +function CoefValue(rr::RegressionModel, k::Int; standardize=false, vargs...) + val = _coef(rr)[k] + p = _pvalue(rr)[k] + if standardize + val = standardize_coef_values(rr, val, k) + end + CoefValue(val, p) +end value(x::CoefValue) = x.val value_pvalue(x::CoefValue) = x.pvalue value_pvalue(x::Missing) = missing diff --git a/src/regressionResults.jl b/src/regressionResults.jl index 3215a20..d77bfdd 100644 --- a/src/regressionResults.jl +++ b/src/regressionResults.jl @@ -4,7 +4,26 @@ These are the necessary functions to create a table from a regression result. If the regression result does not provide a function by default, then within an extension, it is possible to define the necessary function. =# + +""" + _formula(x::RegressionModel) + +Generally a passthrough for the `formula` function from the `StatsModels` package. +Note that the `formula` function returns the `FormulaSchema`. + +This function is only used internally in the [`RegressionTables._responsename`](@ref) +and [`RegressionTables._coefnames`](@ref) functions. 
Therefore, if the `RegressionModel` +implements those two functions without using `formula`, this function is not necessary. +""" _formula(x::RegressionModel) = formula(x) + +""" + _responsename(x::RegressionModel) + +Returns the name of the dependent variable in the regression model. +By default, this returns an `AbstractCoefName` object, but it can be +a `String` or `Symbol` as well. +""" function _responsename(x::RegressionModel) x = get_coefname(_formula(x).lhs) if isa(x, AbstractVector) @@ -12,6 +31,14 @@ function _responsename(x::RegressionModel) end x end + +""" + _coefnames(x::RegressionModel) + +Returns a vector of the names of the coefficients in the regression model. +By default, this returns a vector of `AbstractCoefName` objects, but it can be +a vector of `String` or `Symbol` as well. +""" function _coefnames(x::RegressionModel) out = get_coefname(_formula(x).rhs) if !isa(out, AbstractVector) @@ -19,25 +46,85 @@ function _coefnames(x::RegressionModel) end out end + +""" + _coef(x::RegressionModel) + +Returns a vector of the coefficients in the regression model. +By default, this is just a passthrough for the `coef` function from the `StatsModels` package. +""" _coef(x::RegressionModel) = coef(x) + +""" + _stderror(x::RegressionModel) + +Returns a vector of the standard errors of the coefficients in the regression model. +By default, this is just a passthrough for the `stderror` function from the `StatsModels` package. +""" _stderror(x::RegressionModel) = stderror(x) + +""" + _dof_residual(x::RegressionModel) + +Returns the degrees of freedom of the residuals in the regression model. +By default, this is just a passthrough for the `dof_residual` function from the `StatsModels` package. +""" _dof_residual(x::RegressionModel) = dof_residual(x) +""" + _pvalue(x::RegressionModel) + +Returns a vector of the p-values of the coefficients in the regression model. 
+""" function _pvalue(x::RegressionModel) tt = _coef(x) ./ _stderror(x) ccdf.(Ref(FDist(1, _dof_residual(x))), abs2.(tt)) end -function standardize_coef_values(rr::T, coefvalues, coefstderrors) where {T <: RegressionModel} +""" + can_standardize(x::RegressionModel) + +Returns a boolean indicating whether the coefficients can be standardized. +Standardized coefficients are coefficients that are scaled by the standard deviation of the +variables. This is useful for comparing the relative importance of the variables in the model. + +This is only possible if the `RegressionModel` includes the model matrix or the +standard deviation of the dependent variable. If the `RegressionModel` does not include +either of these, then this function should return `false`. + +See also [`RegressionTables.standardize_coef_values`](@ref). +""" +function can_standardize(x::T) where {T<:RegressionModel} @warn "standardize_coef is not possible for $T" - coefvalues, coefstderrors + false end -function standardize_coef_values(std_X::Vector, std_Y, coefvalues::Vector, coefstderrors::Vector) - std_X = replace(std_X, 0 => 1) # constant has 0 std, so the interpretation is how many Y std away from 0 is the intercept - coefvalues = coefvalues .* std_X ./ std_Y - coefstderrors = coefstderrors .* std_X ./ std_Y - coefvalues, coefstderrors +""" + standardize_coef_values(std_X, std_Y, val) + +Standardizes the coefficients by the standard deviation of the variables. +This is useful for comparing the relative importance of the variables in the model. + +This function is only used if the [`RegressionTables.can_standardize`](@ref) function returns `true`. + +### Arguments +- `std_X::Real`: The standard deviation of the independent variable. +- `std_Y::Real`: The standard deviation of the dependent variable. +- `val::Real`: The value to be standardized (either the coefficient or the standard error). + +!!! 
note + If the standard deviation of the independent variable is 0, then the interpretation of the + coefficient is how many standard deviations of the dependent variable away from 0 is the intercept. + In this case, the function returns `val / std_Y`. + + Otherwise, the function returns `val * std_X / std_Y`. +""" +function standardize_coef_values(std_X, std_Y, val) + if std_X == 0 # constant has 0 std, so the interpretation is how many Y std away from 0 is the intercept + val / std_Y + else + val * std_X / std_Y + end end transformer(s::Nothing, repl_dict::AbstractDict) = s @@ -59,6 +146,13 @@ make_reg_stats(rr, stat) = stat make_reg_stats(rr, stat::Pair{<:Any, <:AbstractString}) = make_reg_stats(rr, first(stat)) => last(stat) default_regression_statistics(x::AbstractRenderType, rr::RegressionModel) = default_regression_statistics(rr) +""" + default_regression_statistics(rr::RegressionModel) + +Returns a vector of [`AbstractRegressionStatistic`](@ref) objects. This is used to display the +statistics in the table. This is customizable for each `RegressionModel` type. The default +is to return a vector of `Nobs` and `R2`. +""" default_regression_statistics(rr::RegressionModel) = [Nobs, R2] diff --git a/src/regtable.jl b/src/regtable.jl index 4e84758..1de9542 100644 --- a/src/regtable.jl +++ b/src/regtable.jl @@ -300,13 +300,22 @@ default_print_clusters(render::AbstractRenderType, rrs) = false """ default_regression_statistics(render::AbstractRenderType, rrs) -Defaults to a union of the default_regression_statistics for each regression. +Defaults to a union of the `default_regression_statistics` for each regression. For example, an "OLS" regression (with no fixed effects) will default to including `[Nobs, R2]`, and a Probit regression will include `[Nobs, PseudoR2]`, so the default will be `[Nobs, R2, PseudoR2]`. 
""" default_regression_statistics(render::AbstractRenderType, rrs::Tuple) = unique(union(default_regression_statistics.(render, rrs)...)) +""" + default_confint_level(render::AbstractRenderType, rrs) + +Defaults to `0.95`, which means the 95% confidence interval is printed below the coefficient. +""" +default_confint_level(render::AbstractRenderType, rrs) = default_confint_level() +default_confint_level() = 0.95 # to maintain better backwards compatibility with v0.6.x + + """ default_use_relabeled_values(render::AbstractRenderType, rrs) = true @@ -347,6 +356,7 @@ Produces a publication-quality regression table, similar to Stata's `esttab` and * `render::AbstractRenderType` is a `AbstractRenderType` type that governs how the table should be rendered. Standard supported types are ASCII (via `AsciiTable()`) and LaTeX (via `LatexTable()`). Defaults to `AsciiTable()`. * `file` is a `String` that governs whether the table should be saved to a file. Defaults to `nothing`. * `transform_labels` is a `Dict` or one of the `Symbol`s `:ampersand`, `:underscore`, `:underscore2space`, `:latex` +* `confint_level` is a `Float64` that governs the confidence level for the confidence interval. Defaults to `0.95`. ### Details A typical use is to pass a number of `FixedEffectModel`s to the function, along with how it should be rendered (with `render` argument): @@ -397,6 +407,7 @@ function regtable( estim_decoration::Union{Nothing, Function}=nothing, regressors=nothing, use_relabeled_values=default_use_relabeled_values(render, rrs), + confint_level=default_confint_level(render, rrs), kwargs... 
) where {T<:AbstractRenderType} @assert align ∈ (:l, :r, :c) "align must be one of :l, :r, :c" @@ -407,6 +418,12 @@ function regtable( if isa(standardize_coef, Bool) standardize_coef = fill(standardize_coef, length(rrs)) end + if !isa(confint_level, AbstractVector) + confint_level = fill(confint_level, length(rrs)) + end + for (i, rr) in enumerate(rrs) + standardize_coef[i] = standardize_coef[i] && can_standardize(rr) + end if regressors !== nothing @warn("regressors is deprecated. Use keep instead.") base_names = union(coefnames.(rrs)...) @@ -524,19 +541,13 @@ function regtable( coefbelow = Matrix{Any}(missing, length(nms), length(rrs)) for (i, rr) in enumerate(rrs) cur_nms = replace_name.(_coefnames(rr), Ref(labels), Ref(transform_labels)) - cur_coef = _coef(rr) - cur_coefpvalues = _pvalue(rr) - cur_stderror = _stderror(rr) - if standardize_coef[i] - cur_coef, cur_stderror = standardize_coef_values(rr, cur_coef, cur_stderror) - end for (j, nm) in enumerate(nms) k = findfirst(cur_nms .== nm) k === nothing && continue - coefvalues[j, i] = CoefValue(cur_coef[k], cur_coefpvalues[k]) + coefvalues[j, i] = CoefValue(rr, k; standardize=standardize_coef[i]) if below_statistic !== nothing - coefbelow[j, i] = below_statistic(cur_stderror[k], cur_coef[k], _dof_residual(rr)) + coefbelow[j, i] = below_statistic(rr, k; standardize=standardize_coef[i], level=confint_level[i]) end end end