Hadley's dplyr
provides a grammar to talk about data manipulation and another his package, tidyr
provides a mindset to think about data. These two tools really makes it a lot easier to perform data manipulation today. This package ezsummary
packed up some commonly used dplyr
and tidyr
steps to generate data summarization to help you save some typing time. It also comes with some table decoration tolls that basically allows you to pipe the results directly into a table generating function like knitr::kable()
to render out.
For example, if you only use dplyr
and tidyr
to generate a statistical summary table by group. You need to go through the following steps.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
mtcars %>%
select(cyl, mpg, wt, hp) %>%
group_by(cyl) %>%
summarize_each(funs(mean, sd)) %>%
gather(variable, value, -cyl) %>%
mutate(value = round(value, 3)) %>%
separate(variable, into = c("variable", "analysis")) %>%
spread(analysis, value) %>%
mutate(variable = factor(variable, levels = c("mpg", "wt", "hp"))) %>%
arrange(variable, cyl) %>%
kable()
cyl | variable | mean | sd |
---|---|---|---|
4 | mpg | 26.664 | 4.510 |
6 | mpg | 19.743 | 1.454 |
8 | mpg | 15.100 | 2.560 |
4 | wt | 2.286 | 0.570 |
6 | wt | 3.117 | 0.356 |
8 | wt | 3.999 | 0.759 |
4 | hp | 82.636 | 20.935 |
6 | hp | 122.286 | 24.260 |
8 | hp | 209.214 | 50.977 |
For people who are familar with "tidyverse", I'm sure the above codes are very straightforward. However, it's a bit annoying to type it again and again. With ezsummary
, you don't need to think too much about it. You can just type:
library(ezsummary)
mtcars %>%
select(cyl, mpg, wt, hp) %>%
group_by(cyl) %>%
ezsummary() %>%
kable()
cyl | variable | mean | sd |
---|---|---|---|
4 | mpg | 26.664 | 4.510 |
6 | mpg | 19.743 | 1.454 |
8 | mpg | 15.100 | 2.560 |
4 | wt | 2.286 | 0.570 |
6 | wt | 3.117 | 0.356 |
8 | wt | 3.999 | 0.759 |
4 | hp | 82.636 | 20.935 |
6 | hp | 122.286 | 24.260 |
8 | hp | 209.214 | 50.977 |
install.packages("ezsummary")
Or
install.packages("devtools")
devtools::install_github("haozhu233/ezsummary")
Here, I will show another quick demo of how to use this package here. For detailed package documentation, please check the package vignette.
library(dplyr)
library(ezsummary)
mtcars %>%
# q: quantitative/continuous variables; c: categorical variables
var_types("qcqqqqqcccc") %>%
group_by(am) %>%
ezsummary(flavor = "wide", unit_markup = "[. (.)]",
digits = 1, p_type = "percent") %>%
kable(col.names = c("variable", "Manual", "Automatic"))
variable | Manual | Automatic |
---|---|---|
mpg | 17.1 (3.8) | 24.4 (6.2) |
cyl_4 | 3 (15.8%) | 8 (61.5%) |
cyl_6 | 4 (21.1%) | 3 (23.1%) |
cyl_8 | 12 (63.2%) | 2 (15.4%) |
disp | 290.4 (110.2) | 143.5 (87.2) |
hp | 160.3 (53.9) | 126.8 (84.1) |
drat | 3.3 (0.4) | 4 (0.4) |
wt | 3.8 (0.8) | 2.4 (0.6) |
qsec | 18.2 (1.8) | 17.4 (1.8) |
vs_0 | 12 (63.2%) | 6 (46.2%) |
vs_1 | 7 (36.8%) | 7 (53.8%) |
gear_3 | 15 (78.9%) | 0 (0) |
gear_4 | 4 (21.1%) | 8 (61.5%) |
gear_5 | 0 (0) | 5 (38.5%) |
carb_1 | 3 (15.8%) | 4 (30.8%) |
carb_2 | 6 (31.6%) | 4 (30.8%) |
carb_3 | 3 (15.8%) | 0 (0) |
carb_4 | 7 (36.8%) | 3 (23.1%) |
carb_6 | 0 (0) | 1 (7.7%) |
carb_8 | 0 (0) | 1 (7.7%) |
If you ever find any issues, please feel free to report it in the issues tracking part on github. https://github.com/haozhu233/simple.summary/issues.
Thanks for using this package!