Release Candidate 1.1.4 #220

Merged 6 commits on Mar 20, 2024
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: embed
Title: Extra Recipes for Encoding Predictors
-Version: 1.1.3.9000
+Version: 1.1.4.9000
Authors@R: c(
person("Emil", "Hvitfeldt", , "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-0679-1945")),
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
# embed (development version)

# embed 1.1.4

## Improvements

* `step_umap()` has gained `initial` and `target_weight` arguments. (#213)

* Calling `?tidy.step_*()` now sends you to the documentation for `step_*()` where the outcome is documented. (#216)
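
For anyone who wants to try the new `step_umap()` arguments from #213, here is a minimal sketch. It assumes `embed` 1.1.4 or later and `recipes` are installed. The `iris` data and the values `initial = "pca"` and `target_weight = 0.8` are illustrative assumptions, not values taken from this PR.

```r
library(recipes)
library(embed)

# step_umap() draws its own seeds via sample(), so fix the RNG first.
set.seed(11)

# Supervised UMAP on the four numeric iris predictors.
umap_rec <- recipe(Species ~ ., data = iris) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_umap(
    all_numeric_predictors(),
    outcome = vars(Species),
    num_comp = 2,
    initial = "pca",      # assumed choice; controls how the embedding is initialized
    target_weight = 0.8   # assumed choice; weights the outcome against the predictors
  )

umap_fit <- prep(umap_rec, training = iris)

# The baked data has the UMAP components in place of the original predictors.
baked <- bake(umap_fit, new_data = NULL)
head(baked)

# tidy() on the second step lists the columns that were used.
tidy(umap_fit, number = 2)
```

Per the second NEWS item, `?tidy.step_umap` should now open the `step_umap()` help page, which is where the output of `tidy()` is described.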
6 changes: 3 additions & 3 deletions R/lencode_bayes.R
@@ -83,13 +83,13 @@
#' Modeling," arXiv:1611.09477
#'
#' "Hierarchical Partial Pooling for Repeated Binary Trials"
-#' \url{https://tinyurl.com/stan-pooling}
+#' \url{https://CRAN.R-project.org/package=rstanarm/vignettes/pooling.html}
#'
#' "Prior Distributions for `rstanarm` Models"
-#' \url{https://tinyurl.com/stan-priors}
+#' \url{http://mc-stan.org/rstanarm/reference/priors.html}
#'
#' "Estimating Generalized (Non-)Linear Models with Group-Specific Terms with
-#' `rstanarm`" \url{https://tinyurl.com/stan-glm-grouped}
+#' `rstanarm`" \url{http://mc-stan.org/rstanarm/articles/glmer.html}
#'
#' @examplesIf rlang::is_installed("modeldata")
#' library(recipes)
108 changes: 55 additions & 53 deletions README.md
@@ -25,55 +25,57 @@ dependencies, [`rstanarm`](https://CRAN.r-project.org/package=rstanarm),

Some steps handle categorical predictors:

-- `step_lencode_glm()`, `step_lencode_bayes()`, and
-`step_lencode_mixed()` estimate the effect of each of the factor
-levels on the outcome and these estimates are used as the new
-encoding. The estimates are estimated by a generalized linear model.
-This step can be executed without pooling (via `glm`) or with partial
-pooling (`stan_glm` or `lmer`). Currently implemented for numeric and
-two-class outcomes.
-
-- `step_embed()` uses `keras::layer_embedding` to translate the original
-*C* factor levels into a set of *D* new variables (\< *C*). The model
-fitting routine optimizes which factor levels are mapped to each of
-the new variables as well as the corresponding regression coefficients
-(i.e., neural network weights) that will be used as the new encodings.
-
-- `step_woe()` creates new variables based on weight of evidence
-encodings.
-
-- `step_feature_hash()` can create indicator variables using feature
-hashing.
+- `step_lencode_glm()`, `step_lencode_bayes()`, and
+`step_lencode_mixed()` estimate the effect of each of the factor
+levels on the outcome and these estimates are used as the new
+encoding. The estimates are estimated by a generalized linear model.
+This step can be executed without pooling (via `glm`) or with
+partial pooling (`stan_glm` or `lmer`). Currently implemented for
+numeric and two-class outcomes.
+
+- `step_embed()` uses `keras::layer_embedding` to translate the
+original *C* factor levels into a set of *D* new variables (\< *C*).
+The model fitting routine optimizes which factor levels are mapped
+to each of the new variables as well as the corresponding regression
+coefficients (i.e., neural network weights) that will be used as the
+new encodings.
+
+- `step_woe()` creates new variables based on weight of evidence
+encodings.
+
+- `step_feature_hash()` can create indicator variables using feature
+hashing.

For numeric predictors:

-- `step_umap()` uses a nonlinear transformation similar to t-SNE but can
-be used to project the transformation on new data. Both supervised and
-unsupervised methods can be used.
+- `step_umap()` uses a nonlinear transformation similar to t-SNE but
+can be used to project the transformation on new data. Both
+supervised and unsupervised methods can be used.

-- `step_discretize_xgb()` and `step_discretize_cart()` can make binned
-versions of numeric predictors using supervised tree-based models.
+- `step_discretize_xgb()` and `step_discretize_cart()` can make binned
+versions of numeric predictors using supervised tree-based models.

-- `step_pca_sparse()` and `step_pca_sparse_bayes()` conduct feature
-extraction with sparsity of the component loadings.
+- `step_pca_sparse()` and `step_pca_sparse_bayes()` conduct feature
+extraction with sparsity of the component loadings.

Some references for these methods are:

-- Francois C and Allaire JJ (2018) [*Deep Learning with
-R*](https://www.manning.com/books/deep-learning-with-r), Manning
-- Guo, C and Berkhahn F (2016) “[Entity Embeddings of Categorical
-Variables](https://arxiv.org/abs/1604.06737)”
-- Micci-Barreca D (2001) “[A preprocessing scheme for high-cardinality
-categorical attributes in classification and prediction
-problems](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=A+preprocessing+scheme+for+high-cardinality+categorical+attributes+in+classification+and+prediction+problems&btnG=),”
-ACM SIGKDD Explorations Newsletter, 3(1), 27-32.
-- Zumel N and Mount J (2017) “[`vtreat`: a `data.frame` Processor for
-Predictive Modeling](https://arxiv.org/abs/1611.09477)”
-- McInnes L and Healy J (2018) [UMAP: Uniform Manifold Approximation and
-Projection for Dimension Reduction](https://arxiv.org/abs/1802.03426)
-- Good, I. J. (1985), “[Weight of evidence: A brief
-survey](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Weight+of+evidence%3A+A+brief+survey&btnG=)”,
-Bayesian Statistics, 2, pp.249-270.
+- Francois C and Allaire JJ (2018) [*Deep Learning with
+R*](https://www.manning.com/books/deep-learning-with-r), Manning
+- Guo, C and Berkhahn F (2016) “[Entity Embeddings of Categorical
+Variables](https://arxiv.org/abs/1604.06737)”
+- Micci-Barreca D (2001) “[A preprocessing scheme for high-cardinality
+categorical attributes in classification and prediction
+problems](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=A+preprocessing+scheme+for+high-cardinality+categorical+attributes+in+classification+and+prediction+problems&btnG=),”
+ACM SIGKDD Explorations Newsletter, 3(1), 27-32.
+- Zumel N and Mount J (2017) “[`vtreat`: a `data.frame` Processor for
+Predictive Modeling](https://arxiv.org/abs/1611.09477)”
+- McInnes L and Healy J (2018) [UMAP: Uniform Manifold Approximation
+and Projection for Dimension
+Reduction](https://arxiv.org/abs/1802.03426)
+- Good, I. J. (1985), “[Weight of evidence: A brief
+survey](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Weight+of+evidence%3A+A+brief+survey&btnG=)”,
+Bayesian Statistics, 2, pp.249-270.

## Getting Started

@@ -113,18 +115,18 @@ This project is released with a [Contributor Code of
Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.

-- For questions and discussions about tidymodels packages, modeling, and
-machine learning, please [post on RStudio
-Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
+- For questions and discussions about tidymodels packages, modeling,
+and machine learning, please [post on RStudio
+Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).

-- If you think you have encountered a bug, please [submit an
-issue](https://github.com/tidymodels/embed/issues).
+- If you think you have encountered a bug, please [submit an
+issue](https://github.com/tidymodels/embed/issues).

-- Either way, learn how to create and share a
-[reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
-(a minimal, reproducible example), to clearly communicate about your
-code.
+- Either way, learn how to create and share a
+[reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
+(a minimal, reproducible example), to clearly communicate about your
+code.

-- Check out further details on [contributing guidelines for tidymodels
-packages](https://www.tidymodels.org/contribute/) and [how to get
-help](https://www.tidymodels.org/help/).
+- Check out further details on [contributing guidelines for tidymodels
+packages](https://www.tidymodels.org/contribute/) and [how to get
+help](https://www.tidymodels.org/help/).
6 changes: 3 additions & 3 deletions man/step_lencode_bayes.Rd

Generated file; diff not rendered.

132 changes: 66 additions & 66 deletions revdep/README.md
@@ -1,101 +1,101 @@
# Platform

-|field |value |
-|:--------|:------------------------------------------------------------|
-|version |R version 4.3.1 (2023-06-16) |
-|os |macOS Ventura 13.6 |
-|system |aarch64, darwin20 |
-|ui |X11 |
-|language |(EN) |
-|collate |en_US.UTF-8 |
-|ctype |en_US.UTF-8 |
-|tz |America/Los_Angeles |
-|date |2023-10-17 |
-|pandoc |3.1.3 @ /Users/emilhvitfeldt/miniforge3/bin/ (via rmarkdown) |
+|field |value |
+|:--------|:---------------------------------------------|
+|version |R version 4.3.2 (2023-10-31) |
+|os |macOS Sonoma 14.3.1 |
+|system |aarch64, darwin20 |
+|ui |X11 |
+|language |(EN) |
+|collate |en_US.UTF-8 |
+|ctype |en_US.UTF-8 |
+|tz |America/Los_Angeles |
+|date |2024-03-19 |
+|pandoc |2.17.1.1 @ /opt/homebrew/bin/ (via rmarkdown) |

# Dependencies

|package |old |new |Δ |
|:------------|:----------|:----------|:--|
-|embed |1.1.2 |1.1.2.9000 |* |
-|backports |1.4.1 |1.4.1 | |
-|base64enc |0.1-3 |0.1-3 | |
-|BH |1.81.0-1 |1.81.0-1 | |
-|cli |3.6.1 |3.6.1 | |
+|embed |1.1.3 |1.1.3.9000 |* |
+|backports |1.4.1 |NA |* |
+|base64enc |0.1-3 |NA |* |
+|BH |1.84.0-0 |1.84.0-0 | |
+|cli |3.6.2 |3.6.2 | |
|clock |0.7.0 |0.7.0 | |
-|config |0.3.2 |0.3.2 | |
-|cpp11 |0.4.6 |0.4.6 | |
-|data.table |1.14.8 |1.14.8 | |
+|config |0.3.2 |NA |* |
+|cpp11 |0.4.7 |0.4.7 | |
+|data.table |1.15.2 |1.15.2 | |
|diagram |1.6.5 |1.6.5 | |
-|digest |0.6.33 |0.6.33 | |
-|dplyr |1.1.3 |1.1.3 | |
-|dqrng |0.3.1 |0.3.1 | |
+|digest |0.6.35 |0.6.35 | |
+|dplyr |1.1.4 |1.1.4 | |
+|dqrng |0.3.2 |0.3.2 | |
|ellipsis |0.3.2 |0.3.2 | |
-|fansi |1.0.5 |1.0.5 | |
-|FNN |1.1.3.2 |1.1.3.2 | |
+|fansi |1.0.6 |1.0.6 | |
+|FNN |1.1.4 |1.1.4 | |
|furrr |0.3.1 |0.3.1 | |
-|future |1.33.0 |1.33.0 | |
-|future.apply |1.11.0 |1.11.0 | |
+|future |1.33.1 |1.33.1 | |
+|future.apply |1.11.1 |1.11.1 | |
|generics |0.1.3 |0.1.3 | |
-|globals |0.16.2 |0.16.2 | |
-|glue |1.6.2 |1.6.2 | |
+|globals |0.16.3 |0.16.3 | |
+|glue |1.7.0 |1.7.0 | |
|gower |1.0.1 |1.0.1 | |
-|hardhat |1.3.0 |1.3.0 | |
-|here |1.0.1 |1.0.1 | |
+|hardhat |1.3.1 |1.3.1 | |
+|here |1.0.1 |NA |* |
|ipred |0.9-14 |0.9-14 | |
|irlba |2.3.5.1 |2.3.5.1 | |
-|jsonlite |1.8.7 |1.8.7 | |
-|keras |2.13.0 |2.13.0 | |
-|lava |1.7.2.1 |1.7.2.1 | |
-|lifecycle |1.0.3 |1.0.3 | |
-|listenv |0.9.0 |0.9.0 | |
+|jsonlite |1.8.8 |NA |* |
+|keras |2.13.0 |NA |* |
+|lava |1.8.0 |1.8.0 | |
+|lifecycle |1.0.4 |1.0.4 | |
+|listenv |0.9.1 |0.9.1 | |
|lubridate |1.9.3 |1.9.3 | |
|magrittr |2.0.3 |2.0.3 | |
|numDeriv |2016.8-1.1 |2016.8-1.1 | |
-|parallelly |1.36.0 |1.36.0 | |
+|parallelly |1.37.1 |1.37.1 | |
|pillar |1.9.0 |1.9.0 | |
|pkgconfig |2.0.3 |2.0.3 | |
-|png |0.1-8 |0.1-8 | |
-|processx |3.8.2 |3.8.2 | |
+|png |0.1-8 |NA |* |
+|processx |3.8.4 |NA |* |
|prodlim |2023.08.28 |2023.08.28 | |
|progressr |0.14.0 |0.14.0 | |
-|ps |1.7.5 |1.7.5 | |
+|ps |1.7.6 |NA |* |
|purrr |1.0.2 |1.0.2 | |
|R6 |2.5.1 |2.5.1 | |
-|rappdirs |0.3.3 |0.3.3 | |
-|Rcpp |1.0.11 |1.0.11 | |
-|RcppAnnoy |0.0.21 |0.0.21 | |
+|rappdirs |0.3.3 |NA |* |
+|Rcpp |1.0.12 |1.0.12 | |
+|RcppAnnoy |0.0.22 |0.0.22 | |
|RcppProgress |0.4.2 |0.4.2 | |
-|RcppTOML |0.2.2 |0.2.2 | |
-|recipes |1.0.8 |1.0.8 | |
-|reticulate |1.34.0 |1.34.0 | |
-|rlang |1.1.1 |1.1.1 | |
-|rprojroot |2.0.3 |2.0.3 | |
+|RcppTOML |0.2.2 |NA |* |
+|recipes |1.0.10 |1.0.10 | |
+|reticulate |1.35.0 |NA |* |
+|rlang |1.1.3 |1.1.3 | |
+|rprojroot |2.0.4 |NA |* |
|rsample |1.2.0 |1.2.0 | |
-|rstudioapi |0.15.0 |0.15.0 | |
-|shape |1.4.6 |1.4.6 | |
+|rstudioapi |0.15.0 |NA |* |
+|shape |1.4.6.1 |1.4.6.1 | |
|sitmo |2.0.2 |2.0.2 | |
|slider |0.3.1 |0.3.1 | |
|SQUAREM |2021.1 |2021.1 | |
-|stringi |1.7.12 |1.7.12 | |
-|stringr |1.5.0 |1.5.0 | |
-|tensorflow |2.14.0 |2.14.0 | |
-|tfautograph |0.3.2 |0.3.2 | |
-|tfruns |1.5.1 |1.5.1 | |
+|stringi |1.8.3 |1.8.3 | |
+|stringr |1.5.1 |1.5.1 | |
+|tensorflow |2.15.0 |NA |* |
+|tfautograph |0.3.2 |NA |* |
+|tfruns |1.5.2 |NA |* |
|tibble |3.2.1 |3.2.1 | |
-|tidyr |1.3.0 |1.3.0 | |
-|tidyselect |1.2.0 |1.2.0 | |
-|timechange |0.2.0 |0.2.0 | |
-|timeDate |4022.108 |4022.108 | |
+|tidyr |1.3.1 |1.3.1 | |
+|tidyselect |1.2.1 |1.2.1 | |
+|timechange |0.3.0 |0.3.0 | |
+|timeDate |4032.109 |4032.109 | |
|tzdb |0.4.0 |0.4.0 | |
-|utf8 |1.2.3 |1.2.3 | |
+|utf8 |1.2.4 |1.2.4 | |
|uwot |0.1.16 |0.1.16 | |
-|vctrs |0.6.4 |0.6.4 | |
-|warp |0.2.0 |0.2.0 | |
-|whisker |0.4.1 |0.4.1 | |
-|withr |2.5.1 |2.5.1 | |
-|yaml |2.3.7 |2.3.7 | |
-|zeallot |0.1.0 |0.1.0 | |
+|vctrs |0.6.5 |0.6.5 | |
+|warp |0.2.1 |0.2.1 | |
+|whisker |0.4.1 |NA |* |
+|withr |3.0.0 |3.0.0 | |
+|yaml |2.3.8 |NA |* |
+|zeallot |0.1.0 |NA |* |

# Revdeps
