-
Notifications
You must be signed in to change notification settings - Fork 31
/
README.Rmd
129 lines (90 loc) · 5.97 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
---
output: github_document
---
# tidypredict <img src="man/figures/logo.png" align="right" width = "120px"/>
[![R-CMD-check](https://github.com/tidymodels/tidypredict/workflows/R-CMD-check/badge.svg)](https://github.com/tidymodels/tidypredict/actions)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidypredict)](https://CRAN.r-project.org/package=tidypredict)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/tidypredict/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/tidypredict?branch=main)
```{r pre, include = FALSE}
if (!rlang::is_installed("randomForest")) {
knitr::opts_chunk$set(
eval = FALSE
)
}
```
```{r setup, include=FALSE}
library(dplyr)
library(tidypredict)
library(randomForest)
```
The main goal of `tidypredict` is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL. In other words, it is able to parse a model such as this one:
```{r}
model <- lm(mpg ~ wt + cyl, data = mtcars)
```
`tidypredict` can return a SQL statement that is ready to run inside the database. Because it uses `dplyr`'s database interface, it works with several databases back-ends, such as MS SQL:
```{r}
tidypredict_sql(model, dbplyr::simulate_mssql())
```
## Installation
Install `tidypredict` from CRAN using:
```{r, eval = FALSE}
install.packages("tidypredict")
```
Or install the **development version** using `devtools` as follows:
```{r, eval = FALSE}
install.packages("remotes")
remotes::install_github("tidymodels/tidypredict")
```
## Functions
`tidypredict` has only a few functions, and it is not expected that number to grow much. The main focus at this time is to add more models to support.
| Function | Description
|-----------------------------|--------------------------------------------------------------------------------|
|`tidypredict_fit()` | Returns an R formula that calculates the prediction |
|`tidypredict_sql()` | Returns a SQL query based on the formula from `tidypredict_fit()` |
|`tidypredict_to_column()` | Adds a new column using the formula from `tidypredict_fit()` |
|`tidypredict_test()` | Tests `tidyverse` predictions against the model's native `predict()` function |
|`tidypredict_interval()` | Same as `tidypredict_fit()` but for intervals (only works with `lm` and `glm`) |
|`tidypredict_sql_interval()` | Same as `tidypredict_sql()` but for intervals (only works with `lm` and `glm`) |
|`parse_model()` | Creates a list spec based on the R model |
|`as_parsed_model()` | Prepares an object to be recognized as a parsed model |
## How it works
<img src="man/figures/howitworks.png">
Instead of translating directly to a SQL statement, `tidypredict` creates an R formula. That formula can then be used inside `dplyr`. The overall workflow would be as illustrated in the image above, and described here:
1. Fit the model using a base R model, or one from the packages listed in [Supported Models](#supported-models)
1. `tidypredict` reads model, and creates a list object with the necessary components to run predictions
1. `tidypredict` builds an R formula based on the list object
1. `dplyr` evaluates the formula created by `tidypredict`
1. `dplyr` translates the formula into a SQL statement, or any other interfaces.
1. The database executes the SQL statement(s) created by `dplyr`
### Parsed model spec
`tidypredict` writes and reads a spec based on a model. Instead of simply writing the R formula directly, splitting the spec from the formula adds the following capabilities:
1. No more saving models as `.rds` - Specifically for cases when the model needs to be used for predictions in a Shiny app.
1. Beyond R models - Technically, anything that can write a proper spec, can be read into `tidypredict`. It also means, that the parsed model spec can become a good alternative to using *PMML.*
## Supported models
The following models are supported by `tidypredict`:
- Linear Regression - `lm()`
- Generalized Linear model - `glm()`
- Random Forest models - `randomForest::randomForest()`
- Random Forest models, via `ranger` - `ranger::ranger()`
- MARS models - `earth::earth()`
- XGBoost models - `xgboost::xgb.Booster.complete()`
- Cubist models - `Cubist::cubist()`
- Tree models, via `partykit` - `partykit::ctree()`
### `parsnip`
`tidypredict` supports models fitted via the `parsnip` interface. The ones confirmed currently work in `tidypredict` are:
- `lm()` - `parsnip`: `linear_reg()` with *"lm"* as the engine.
- `randomForest::randomForest()` - `parsnip`: `rand_forest()` with *"randomForest"* as the engine.
- `ranger::ranger()` - `parsnip`: `rand_forest()` with *"ranger"* as the engine.
- `earth::earth()` - `parsnip`: `mars()` with *"earth"* as the engine.
### `broom`
The `tidy()` function from broom works with linear models parsed via `tidypredict`
```{r}
pm <- parse_model(lm(wt ~ ., mtcars))
tidy(pm)
```
## Contributing
This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/tidypredict/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).