Don't need map() for summarizing models #33

Open
Aariq opened this issue Apr 7, 2022 · 2 comments

Aariq commented Apr 7, 2022

This works fine:

mtcars %>% 
  group_by(cyl) %>% 
  summarize(r.sq = summary(lm(mpg ~ wt))$r.squared)

The problem here is not having to learn new paradigms just to do it. It's that you can't easily save intermediate steps, because summarize() wants the right-hand side to be a vector, not a model object.

For example, the following code errors:

mtcars %>% 
  group_by(cyl) %>% 
  summarize(m = lm(mpg ~ wt))

And to get it to work, you have to start dealing with list-columns, which is a whole thing:

#this works, but makes a data frame with a list-column
mtcars %>% 
  group_by(cyl) %>% 
  summarize(m = list(lm(mpg ~ wt)))
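
As a rough sketch of what "dealing with list-columns" looks like in practice, the stored models can be kept as an intermediate result and a scalar pulled out of each one afterwards. The names `models` and `model_fits` are made up for illustration, and `vapply()` is just one way to do the extraction:

```r
library(dplyr)

# Keep the fitted models as an intermediate result: one lm object per cyl group
models <- mtcars %>%
  group_by(cyl) %>%
  summarize(m = list(lm(mpg ~ wt)))

# Pull one number out of each stored model; vapply() returns a plain numeric vector
model_fits <- models %>%
  mutate(r.sq = vapply(m, function(fit) summary(fit)$r.squared, numeric(1)))
```

The models stay available in `models$m` for later inspection, at the cost of having to index into a list (for example `models$m[[1]]`) to get at any single fit.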

sda030 commented Aug 6, 2022

To me, this is beautiful, simple and logical. Split the dataset by cyl. For each part, run the following functions: run the model, extract the model fit, and finally bind it all together, while preserving the name of the split variable in the output. Oh, and all the output one could need.

library(dplyr)
library(broom)
mtcars %>% 
    group_by(cyl) %>%
    group_map(.f=~lm(mpg ~ wt, data=.x) %>% glance()) %>%
    bind_rows(.id = "cyl")
#> # A tibble: 3 × 13
#>   cyl   r.squared adj.r…¹ sigma stati…² p.value    df logLik   AIC   BIC devia…³
#>   <chr>     <dbl>   <dbl> <dbl>   <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#> 1 1         0.509   0.454  3.33    9.32  0.0137     1 -27.7   61.5  62.7   99.9 
#> 2 2         0.465   0.357  1.17    4.34  0.0918     1  -9.83  25.7  25.5    6.79
#> 3 3         0.423   0.375  2.02    8.80  0.0118     1 -28.7   63.3  65.2   49.2 
#> # … with 2 more variables: df.residual <int>, nobs <int>, and abbreviated
#> #   variable names ¹​adj.r.squared, ²​statistic, ³​deviance

Created on 2022-08-06 by the reprex package (v2.0.1)
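
One caveat about the output above: the `cyl` column holds 1, 2, 3 rather than 4, 6, 8. `group_map()` returns an unnamed list, so `bind_rows(.id = "cyl")` falls back to list positions. A possible variant that keeps the real group values, assuming a dplyr version that provides `group_modify()`, might look like this (the name `by_cyl` is only for illustration):

```r
library(dplyr)
library(broom)

# group_modify() applies the function to each group and re-attaches the
# grouping column, so cyl keeps its actual values (4, 6, 8)
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ glance(lm(mpg ~ wt, data = .x))) %>%
  ungroup()
```

Here `.x` is the subset of rows for one group, the same as in the `group_map()` call, and each call has to return a data frame, which `glance()` does.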

Owner

matloff commented Mar 11, 2023

A point I make in the essay is that intermediate steps are GOOD for beginning coders, the group my essay focuses on.
