forked from UBC-MDS/sptidy
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# sptidy
<!-- badges: start -->
<!-- badges: end -->
An R package that produces tidy output for evaluating tidymodels models.
## Introduction
sptidy implements `tidy` and `augment` functions for tidymodels' linear regression and for k-means clustering, to ease model selection and assessment tasks. The package is a simplified reimplementation of the existing `tidy` and `augment` functions in the broom package. sptidy's family of tidy functions returns a data frame that summarizes important model information, while the augment functions expand the original data frame with additional model-specific information for each observation. The package is meant to complement [Sktidy](https://github.com/UBC-MDS/sktidy), a Python package created to tidy up scikit-learn output.
## Features
This package currently supports the following functions:
- `tidy_kmeans()`: Returns inertia, cluster location, and number of associated points for each cluster in a tidy format.
- `tidy_lr()`: Returns coefficients and corresponding feature names in a tidy format.
- `augment_lr()`: Returns predictions and residuals for each point in the training data set in a tidy format.
- `augment_kmeans()`: Returns the assigned cluster and the distance from the cluster center for each point in the data the k-means algorithm was fitted on, in a tidy format.
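To make the k-means outputs described above concrete, here is a minimal sketch in base R of what a cluster-level summary and an observation-level augmentation could look like. It is not sptidy's implementation and does not call sptidy. The object names (`km_fit`, `km_tidy_sketch`, `km_augment_sketch`), the column names, and the choice of the `iris` data are assumptions made for illustration.

```r
# Sketch only: base R, NOT sptidy's implementation or API.
# All object and column names below are hypothetical.
set.seed(1)
X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
km_fit <- stats::kmeans(X, centers = 3, nstart = 10)

# "tidy"-style summary: one row per cluster, with the cluster location,
# the number of points assigned to it, and its within-cluster sum of squares.
km_tidy_sketch <- data.frame(
  cluster = seq_len(nrow(km_fit$centers)),
  km_fit$centers,
  n_points = km_fit$size,
  inertia = km_fit$withinss,
  row.names = NULL
)

# "augment"-style output: one row per observation, with the assigned cluster
# and the Euclidean distance to that cluster's center.
assigned_centers <- km_fit$centers[km_fit$cluster, , drop = FALSE]
km_augment_sketch <- data.frame(
  X,
  cluster = km_fit$cluster,
  dist_to_center = unname(sqrt(rowSums((X - assigned_centers)^2))),
  row.names = NULL
)

print(km_tidy_sketch)
print(head(km_augment_sketch))
```

The summary has one row per cluster while the augmented frame keeps one row per observation, which is the distinction between the `tidy_*` and `augment_*` functions listed above.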
## How sptidy fits into the R ecosystem
[Tidymodels](https://github.com/tidymodels) is a “meta-package” for modeling and statistical analysis that shares the underlying design philosophy, grammar, and data structures of the tidyverse. One of the packages it includes is [broom](https://broom.tidymodels.org/), which takes the messy output of built-in R functions, such as `lm`, `nls`, or `t.test`, and turns it into tidy data frames. Tidy data here means results returned in a `data.frame` where each variable has its own column, each observation has its own row, and each value has its own cell. In `sptidy`, we implement the functions `tidy()` and `augment()` for two models: the linear regression model created with [`linear_reg()`](https://www.rdocumentation.org/packages/parsnip/versions/0.0.0.9001/topics/linear_reg) from the `parsnip` package, which is included in `tidymodels`, and the k-means model fitted with [`kmeans()`](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/kmeans) from R's `stats` package.
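As a rough illustration of the tidy layout described above, the sketch below turns a base R `lm` fit into a coefficient table and an augmented data frame. It deliberately uses only base R so that it runs without extra packages. It is not sptidy's or broom's code, and the object names, column names, and the `mtcars` example are assumptions chosen for illustration.

```r
# Sketch only: base R, NOT sptidy's or broom's implementation.
# All object and column names below are hypothetical.
lm_fit <- stats::lm(mpg ~ wt + hp, data = mtcars)

# "tidy"-style output: one row per model term, one column per variable.
lm_tidy_sketch <- data.frame(
  feature = names(stats::coef(lm_fit)),
  coefficient = unname(stats::coef(lm_fit))
)

# "augment"-style output: the original data plus one prediction and one
# residual per observation.
lm_augment_sketch <- cbind(
  mtcars,
  prediction = unname(stats::fitted(lm_fit)),
  residual = unname(stats::residuals(lm_fit))
)

print(lm_tidy_sketch)
print(head(lm_augment_sketch[, c("mpg", "prediction", "residual")]))
```

Because both results are plain data frames, they can be filtered, joined, or plotted with the same tools used for any other tidy data.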
## Installation
You can install the released version of sptidy from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("sptidy")
```
And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("JacobMcFarlane/sptidy")
```
## Example
This is a basic example which shows you how to solve a common problem:
```{r example}
#library(sptidy)
## basic example code
```
What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:
```{r cars}
summary(cars)
```
You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this. You could also use GitHub Actions to re-render `README.Rmd` every time you push. An example workflow can be found here: <https://github.com/r-lib/actions/tree/master/examples>.
You can also embed plots, for example:
```{r pressure, echo = FALSE}
plot(pressure)
```
In that case, don't forget to commit and push the resulting figure files, so they display on GitHub and CRAN.