0-2-0 Release candidate (#911)
* skip for tune registration issues

* rearrange

* version bump

* link doesn't exist yet
topepo authored Feb 19, 2022
1 parent e18175c commit 610da60
Showing 5 changed files with 36 additions and 17 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: recipes
Title: Preprocessing and Feature Engineering Steps for Modeling
Version: 0.1.17.9001
Version: 0.2.0
Authors@R: c(
person("Max", "Kuhn", , "[email protected]", role = c("aut", "cre")),
person("Hadley", "Wickham", , "[email protected]", role = "aut"),
33 changes: 18 additions & 15 deletions NEWS.md
@@ -1,31 +1,29 @@
# recipes (development version)
# recipes 0.2.0

# New Steps

* `step_nnmf_sparse()` uses a different implementation of non-negative matrix factorization that is much faster and enables regularized estimation. (#790)

* `step_dummy_extract()` creates multiple variables from a character variable by extracting elements using regular expressions and counting those elements.

* `step_filter_missing()` can filter columns based on proportion of missingness (#270).

* `step_percentile()` replaces the value of a variable with its percentile from the training set. (#765)
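
A minimal sketch of how three of the new steps might be combined. The `toy` data frame and its column names are hypothetical and made up for illustration; the threshold of 0.5 is likewise an arbitrary choice, not a recommended default.

```r
library(recipes)

# Hypothetical toy data: `x` is mostly missing, `langs` is a
# multiple-choice answer stored as a comma-separated string.
toy <- data.frame(
  y = c(1.2, 3.4, 2.2, 5.1, 4.8, 3.3),
  x = c(NA, NA, NA, NA, 2, 3),
  z = c(10, 20, 30, 40, 50, 60),
  langs = c("R,Python", "R", "Python,Julia", "R,Julia", "R", "Python"),
  stringsAsFactors = FALSE
)

rec <- recipe(y ~ ., data = toy) %>%
  # Drop numeric predictors whose proportion of missing values exceeds 0.5
  step_filter_missing(all_numeric_predictors(), threshold = 0.5) %>%
  # Replace remaining numeric predictors with their training-set percentile
  step_percentile(all_numeric_predictors()) %>%
  # Split `langs` on commas and create one count column per extracted element
  step_dummy_extract(langs, sep = ",")

prepped <- prep(rec, training = toy)
baked <- bake(prepped, new_data = NULL)
names(baked)
```

Here `x` has four of six values missing, so it should be filtered out before the percentile step runs, which also exercises the empty-selection behavior for later selectors that would have matched it.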

## Improvements and Other Changes

* All recipe steps now officially support empty selections to be more aligned with dplyr and other packages that use tidyselect (#603, #531). For example, if a previous step removed all of the columns needed for a later step, the recipe does not fail when it is estimated (with the exception of `step_mutate()`). The documentation in `?selections` has been updated with advice for writing selectors when filtering steps are used. (#813)

* Fixed bug in `step_harmonic()` printing and changed defaults to `role = "predictor"` and `keep_original_cols = FALSE` (#822).

* Added a new step called `step_filter_missing()`, which can filter columns based on proportion of missingness (#270).

* Improved the efficiency of computations for the Box-Cox transformation (#820).

* When a feature extraction step (e.g., `step_pca()`, `step_ica()`, etc.) has zero components specified, the `tidy()` method now lists the selected columns in the `terms` column.

* Added a new step called `step_nnmf_sparse()` which uses a different implementation of non-negative matrix factorization that is much faster and enables regularized estimation. (#790)

* Deprecation has started for `step_nnmf()` in favor of `step_nnmf_sparse()`. (#790)

* Steps now have a dedicated subsection detailing what happens when `tidy()` is applied. (#876)

* Added a new step called `step_dummy_extract()` which creates multiple variables from a character variable by extracting elements using regular expressions and counting those elements.

## Breaking Changes

* `step_ica()` now indirectly uses the `fastICA` package since that package has increased their R version requirement. Recipe objects from previous versions will error when applied to new data. (#823)

* `step_kpca*()` now directly use the `kernlab` package. Recipe objects from previous versions will error when applied to new data.

* `step_ica()` now runs `fastICA()` using a specific set of random numbers so that initialization is reproducible.

* `tidy.recipe()` now returns a zero-row tibble instead of an error when applied to an empty recipe. (#867)
@@ -34,12 +32,17 @@

* `detect_step()` is no longer restricted to steps created in recipes (#869).

* Added a new step called `step_percentile()`, that replaces the value of a variable with its percentile from the training set. (#765)

* New `extract_parameter_set_dials()` and `extract_parameter_dials()` methods to extract parameter sets and single parameters from `recipe` objects.

* `step_other()` now allows setting `threshold = 0`, which results in no othering. (#904)
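
Two of the behaviors listed above can be checked with a short sketch. The data frame `df` and its columns are hypothetical; the exact columns of the empty `tidy()` result are not asserted here.

```r
library(recipes)

# tidy() on a recipe with no steps: per the entry above, this should
# return a zero-row tibble rather than raising an error.
empty_rec <- recipe(mpg ~ ., data = mtcars)
empty_tidy <- tidy(empty_rec)
nrow(empty_tidy)

# step_other() with threshold = 0: no levels should be pooled.
# `df` is a made-up example with one rare level ("c").
df <- data.frame(
  y = 1:6,
  g = factor(c("a", "a", "a", "b", "b", "c"))
)

kept <- recipe(y ~ g, data = df) %>%
  step_other(g, threshold = 0) %>%
  prep() %>%
  bake(new_data = NULL)

levels(kept$g)
```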

## Breaking Changes

* `step_ica()` now indirectly uses the `fastICA` package since that package has increased their R version requirement. Recipe objects from previous versions will error when applied to new data. (#823)

* `step_kpca*()` now directly use the `kernlab` package. Recipe objects from previous versions will error when applied to new data.


## Developer

* The print methods have been internally changed to use `print_step()` instead of `printer()`. This is done for a smoother transition to using `cli` in the next version. (#871)
10 changes: 10 additions & 0 deletions tests/testthat/helpers.R
@@ -0,0 +1,10 @@

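# Returns TRUE when the tune-dependent tests should be skipped:
# tune is either not installed or is version 0.1.6 or older.
tune_check <- function() {
  if (rlang::is_installed("tune")) {
    res <- utils::packageVersion("tune") <= "0.1.6"
  } else {
    res <- TRUE
  }
  res
}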
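
The version gate in this helper relies on R comparing a `package_version` object against a character string. A base-R sketch of the same idea, generalized to any package, is below; `should_skip()` and its arguments are hypothetical names for illustration and are not part of recipes or testthat.

```r
# Hypothetical generalization of tune_check(): skip when `pkg` is missing
# or its installed version is at or below `max_affected`.
should_skip <- function(pkg, max_affected) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  # package_version objects can be compared directly with version strings
  utils::packageVersion(pkg) <= max_affected
}

# Development versions sort after the release they extend, so a
# 0.1.6.9000 build would not be skipped by a "<= 0.1.6" gate.
package_version("0.1.6") <= "0.1.6"
package_version("0.1.6.9000") <= "0.1.6"
```

Inside a test, the result would be passed to `skip_if()`, in the same way the tests in this commit call `skip_if(tune_check())`.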

6 changes: 6 additions & 0 deletions tests/testthat/test-extract.R
@@ -1,5 +1,6 @@

test_that('extract parameter set from recipe with no steps', {
  skip_if(tune_check())
  bare_rec <- recipe(mpg ~ ., data = mtcars)

  bare_info <- extract_parameter_set_dials(bare_rec)
@@ -8,6 +9,7 @@ test_that('extract parameter set from recipe with no steps', {
})

test_that('extract parameter set from recipe with no tunable parameters', {
  skip_if(tune_check())
  rm_rec <-
    recipe(mpg ~ ., data = mtcars) %>%
    step_rm(hp)
@@ -18,6 +20,7 @@ test_that('extract parameter set from recipe with no tunable parameters', {
})

test_that('extract parameter set from recipe with tunable parameters', {
  skip_if(tune_check())
  spline_rec <-
    recipe(mpg ~ ., data = mtcars) %>%
    step_impute_knn(all_numeric_predictors(), neighbors = hardhat::tune("imputation")) %>%
@@ -49,6 +52,7 @@ test_that('extract parameter set from recipe with tunable parameters', {
# -------------------------------------------------------------------------

test_that('extract single parameter from recipe with no steps', {
  skip_if(tune_check())
  bare_rec <- recipe(mpg ~ ., data = mtcars)

  expect_error(
@@ -57,6 +61,7 @@ test_that('extract single parameter from recipe with no steps', {
})

test_that('extract single parameter from recipe with no tunable parameters', {
  skip_if(tune_check())
  rm_rec <-
    recipe(mpg ~ ., data = mtcars) %>%
    step_rm(hp)
@@ -67,6 +72,7 @@ test_that('extract single parameter from recipe with no tunable parameters', {
})

test_that('extract single parameter from recipe with tunable parameters', {
  skip_if(tune_check())
  spline_rec <-
    recipe(mpg ~ ., data = mtcars) %>%
    step_impute_knn(all_numeric_predictors(), neighbors = hardhat::tune("imputation")) %>%
2 changes: 1 addition & 1 deletion vignettes/Dummies.Rmd
@@ -245,7 +245,7 @@ There are a bunch of steps related to going in-between factors and dummy variabl
* [`step_zv`](https://recipes.tidymodels.org/reference/step_zv.html) can remove dummy variables that never show a 1 in the column (i.e., are zero-variance).
* [`step_bin2factor`](https://recipes.tidymodels.org/reference/step_bin2factor.html) takes a binary indicator and makes a factor variable. This can be useful when using naive Bayes models.
* `step_embed`, `step_lencode_glm`, `step_lencode_bayes` and others in the [`embed`](https://github.com/tidymodels/embed) package can use one or more (non-binary) values to encode factor predictors into a numeric form.
* [`step_dummy_extract`](https://recipes.tidymodels.org/reference/step_dummy_extract.html) can create binary indicators from strings and is especially useful for multiple choice columns.
* `step_dummy_extract` can create binary indicators from strings and is especially useful for multiple choice columns.

[`step_dummy`](https://recipes.tidymodels.org/reference/step_dummy.html) also works with _ordered factors_. As seen above, the default encoding is to create a series of polynomial variables. There are also a few steps for ordered factors:
