infer lm_test #547

bansell · 2024-10-29T23:53:19Z

title: "infer lm_test"
output: html_document
date: "2024-10-29"

Hi Maintainers,

I am developing a graduate teaching course including basic linear modeling, and would like to keep all of the material in tidy format with dplyr-style piped chains.

Tricky lm syntax in broom

broom is a great package, but tidying lm results in-line requires some one-off, counterintuitive code, e.g.:

library(ggplot2)
library(dplyr)
library(broom)
library(infer)

gss %>% do(tidy(lm(hours ~ age + college, .)))

Specifically, the do(tidy(lm(response ~ predictor, .))) pattern will be a cognitive hurdle for learners who have only had exposure to ggplot and dplyr.
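For comparison, here is a minimal sketch of the same fit written without do(). It assumes the magrittr pipe's `.` placeholder is used to route the data into lm()'s `data` argument; the name `lm_summary` is only illustrative and is not part of the original post.

```r
library(dplyr)   # provides the %>% pipe
library(broom)   # tidy()
library(infer)   # gss example data

# Pass the data to lm() through the `.` placeholder, then tidy the fitted model.
lm_summary <- gss %>%
  lm(hours ~ age + college, data = .) %>%
  tidy()

lm_summary
```

With the native pipe in R 4.2 or later, the equivalent placeholder is `_`, as in `gss |> lm(hours ~ age + college, data = _) |> tidy()`. Either form still asks learners to understand a placeholder, so it may reduce the hurdle rather than remove it.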

Tidy lm summary for infer?

It would be great to implement this functionality in a chain-able tidy format, and infer would seem to be a good place for this.

As I understand it the focus of the infer package is to allow permutation/bootstrap-based tests, although the package also includes a wrapper for t.test() (and mention of a future aov() wrapper?), which use theoretical null distributions.
https://infer.tidymodels.org/articles/t_test.html

Adding wrappers / chaining functions for aov() and lm() would be extremely useful for teaching statistics via tidy R. Is this within the scope of infer?

https://infer.tidymodels.org/reference/fit.infer.html?q=multiva#ref-examples

In the tutorials we get as far as fitting a model using infer functions:

gss %>%
  specify(hours ~ age + college) %>%
  fit()

Would it be possible to have get_p_value() working in-line e.g.:

gss %>%
  specify(hours ~ age + college) %>%
  fit() %>%
  get_p_value()

or, tidy_summary() which would reproduce the complete broom::tidy() output above?

lm_test() alternative

I read that group_by() is not yet implemented in infer. I've drafted an analogous function, lm_test(), which can handle grouped input:


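lm_test <- function(input_data, formula) {
  # Fit lm() once per group (or once overall if ungrouped) and tidy the coefficients
  input_data |>
    dplyr::do(broom::tidy(lm(formula, data = .))) |>
    # Rename "(Intercept)" to "intercept"
    dplyr::mutate(term = ifelse(term == "(Intercept)", "intercept", term)) |>
    # Sort terms by ascending p-value
    dplyr::arrange(p.value)
}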

gss |> lm_test(hours ~ age + college)

gss |> group_by(class) |> lm_test(hours ~ age + college)

Please let us know if it's feasible to include this in infer.
Thanks!


simonpcouch · 2024-11-07T20:30:09Z

Regarding broom's functionality, I do think explicit iteration over tidy() is a reasonable interface and I'm not sure that a wrapper from infer is in scope for the package.
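As an illustration only, explicit iteration over tidy() for grouped data might look like the sketch below. It assumes dplyr's group_modify() as the iteration tool and groups by `sex` purely as an example; neither choice comes from this thread, and `by_sex` is a hypothetical name.

```r
library(dplyr)
library(broom)
library(infer)  # gss example data

# Fit the same model once per group and stack the tidied coefficient tables.
# `.x` is the data for the current group; the grouping column is kept in the result.
by_sex <- gss %>%
  group_by(sex) %>%
  group_modify(~ tidy(lm(hours ~ age + college, data = .x)))

by_sex
```

The result has one row per model term within each group, so it can be filtered or arranged with ordinary dplyr verbs.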

Regarding p-values for regression coefficients using infer, you can indeed use get_p_value() for observed fits with the package already, but you will also need a distribution of "null fits" generated using randomization:

library(infer)
  
null_fits <- gss %>%
  specify(college ~ age + hours) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  fit()

observed_fit <- gss %>%
  specify(college ~ age + hours) %>%
  fit()

get_p_value(null_fits, observed_fit, direction = "both")
#> # A tibble: 3 × 2
#>   term      p_value
#>   <chr>       <dbl>
#> 1 age          0.54
#> 2 hours        0.3 
#> 3 intercept    0.34

Created on 2024-11-07 with reprex v2.1.1

Thanks for the issue!

github-actions · 2024-11-22T01:08:22Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

simonpcouch closed this as completed Nov 7, 2024

github-actions bot locked and limited conversation to collaborators Nov 22, 2024
