
infer lm_test #547

Closed
bansell opened this issue Oct 29, 2024 · 2 comments

Comments

@bansell

bansell commented Oct 29, 2024


title: "infer lm_test"
output: html_document
date: "2024-10-29"

Hi Maintainers,

I am developing a graduate teaching course including basic linear modeling, and would like to keep all of the material in tidy format with dplyr-style piped chains.

Tricky lm syntax in broom

Broom is a great package, but tidying lm() results in-line requires some one-off, counter-intuitive code, e.g.:

library(ggplot2)
library(dplyr)
library(broom)
library(infer)

gss %>% do(tidy(lm(hours ~ age + college, .)))

Specifically, the do(tidy(lm(response ~ predictor, .))) idiom will be a cognitive hurdle for learners whose only prior exposure is to ggplot2 and dplyr.
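For comparison, the ungrouped case can be written without do() as a straight pipe. This is a minimal sketch, assuming R >= 4.2, where the native pipe's `_` placeholder can be passed to a named argument; it uses only lm() and broom::tidy(), not any infer function:

```r
library(broom)
library(infer)  # provides the gss example data

# `_` stands for the left-hand side of |> and must be supplied by name,
# so lm() receives gss as its `data` argument.
coefs <- gss |>
  lm(hours ~ age + college, data = _) |>
  tidy()

coefs
```

This still does not cover grouped input, which is where do() (or a replacement for it) comes in.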

Tidy lm summary for infer?

It would be great to implement this functionality in a chainable, tidy format, and infer would seem to be a good place for it.

As I understand it, the focus of the infer package is permutation- and bootstrap-based testing, although the package also includes a wrapper for t.test() (and mentions a future aov() wrapper?), both of which use theoretical null distributions.
https://infer.tidymodels.org/articles/t_test.html

Adding wrappers / chaining functions for aov() and lm() would be extremely useful for teaching statistics via tidy R. Is this within the scope of infer?

https://infer.tidymodels.org/reference/fit.infer.html?q=multiva#ref-examples

In the tutorials we get as far as fitting a model using infer functions:

gss %>%
  specify(hours ~ age + college) %>%
  fit() 

Would it be possible to have get_p_value() work in-line, e.g.:

gss %>%
  specify(hours ~ age + college) %>%
  fit() |> 
  get_p_value()

Or a tidy_summary() function that would reproduce the complete broom::tidy() output above?

lm_test() alternative

I read that group_by() is not yet supported in infer, so I've drafted an analogous lm_test() function that can handle grouped input:


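lm_test <- function(input_data, formula) {
  # Fit lm() per group (or once, if ungrouped) and tidy the coefficient table
  input_data |>
    dplyr::do(broom::tidy(lm(formula, data = .))) |>
    # Relabel the intercept to match infer's "intercept" term name
    dplyr::mutate(term = ifelse(term == "(Intercept)", "intercept", term)) |>
    # Most significant terms first
    dplyr::arrange(p.value)
}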


gss |> lm_test(hours ~ age + college)
gss |> group_by(class) |> lm_test(hours ~ age + college)
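Since do() is superseded in dplyr 1.0 and later, the same helper could also be sketched with dplyr::group_modify(). This is an illustrative alternative, not part of infer; on an ungrouped data frame group_modify() applies the function once to the whole table, and on a grouped one it applies it per group and binds the group keys back on:

```r
library(dplyr)
library(broom)
library(infer)  # provides the gss example data

# Same interface as lm_test() above, built on group_modify() instead of do().
lm_test <- function(input_data, formula) {
  input_data |>
    # .x is the current group's rows (or the whole table if ungrouped)
    group_modify(~ tidy(lm(formula, data = .x))) |>
    # Relabel the intercept to match infer's "intercept" term name
    mutate(term = if_else(term == "(Intercept)", "intercept", term)) |>
    # Sort by p-value within each group (no effect when ungrouped)
    arrange(p.value, .by_group = TRUE)
}

overall <- gss |> lm_test(hours ~ age + college)
by_class <- gss |> group_by(class) |> lm_test(hours ~ age + college)
```

One design difference: `.by_group = TRUE` keeps each class's terms together, whereas the do()-based version sorts all rows by p-value regardless of group.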

Please let us know if it's feasible to include this in infer.
Thanks!

@simonpcouch
Collaborator

Regarding broom's functionality, I do think explicit iteration over tidy() is a reasonable interface and I'm not sure that a wrapper from infer is in scope for the package.

Regarding p-values for regression coefficients using infer, you can indeed use get_p_value() for observed fits with the package already, but you will also need a distribution of "null fits" generated using randomization:

library(infer)
  
null_fits <- gss %>%
  specify(college ~ age + hours) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  fit()

observed_fit <- gss %>%
  specify(college ~ age + hours) %>%
  fit()

get_p_value(null_fits, observed_fit, direction = "both")
#> # A tibble: 3 × 2
#>   term      p_value
#>   <chr>       <dbl>
#> 1 age          0.54
#> 2 hours        0.3 
#> 3 intercept    0.34

Created on 2024-11-07 with reprex v2.1.1
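To make the mechanism concrete, here is a hand-rolled sketch of a two-sided permutation p-value computed from the same kind of null and observed fits: for each term, twice the smaller of the two tail proportions, capped at 1. This is an illustrative reimplementation under that assumption, not infer's internal code, so get_p_value() may differ in details. With only 100 permutations the p-values are coarse (in this sketch, multiples of 0.02), and they will vary from run to run unless a seed is set:

```r
library(dplyr)
library(infer)

set.seed(1)  # permutation results vary run to run

# Null distribution: refit the model on 100 datasets in which the response
# has been shuffled, breaking any association with the predictors.
null_fits <- gss |>
  specify(college ~ age + hours) |>
  hypothesize(null = "independence") |>
  generate(reps = 100, type = "permute") |>
  fit()

observed_fit <- gss |>
  specify(college ~ age + hours) |>
  fit()

# Pair each null estimate with the observed estimate for the same term,
# then take twice the smaller tail proportion, capped at 1.
manual_p <- null_fits |>
  ungroup() |>
  inner_join(observed_fit, by = "term", suffix = c("_null", "_obs")) |>
  group_by(term) |>
  summarise(
    p_value = min(1, 2 * min(mean(estimate_null <= estimate_obs),
                             mean(estimate_null >= estimate_obs)))
  )

manual_p
```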

Thanks for the issue!


This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 22, 2024