Releases: tidymodels/recipes

recipes 1.0.2

16 Oct 19:40
79baabe
  • A new set of basis functions was added: step_spline_b(), step_spline_convex(), step_spline_monotone(), step_spline_natural(), step_spline_nonnegative(), and
    step_poly_bernstein().

  • step_date(), step_dummy(), step_dummy_extract(), step_holiday(), step_ordinalscore(), and step_regex() now return integer results when appropriate. (#766)

  • The default for the strict argument in step_integer() has been changed from FALSE to TRUE. The function will thus return integers, rather than whole-number numerics, by default. (#766)

  • The default for the value argument in step_intercept() has been changed from 1 to 1L. (#766)
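The integer-related changes above can be checked with a small sketch. The toy data frame and its column names are made up for illustration, and strict = TRUE is passed explicitly so the behavior does not depend on which default your installed version uses.

```r
library(recipes)

# Hypothetical toy data: one factor predictor and a numeric outcome.
dat <- data.frame(
  x = factor(c("a", "b", "a", "c")),
  y = c(1.5, 2.0, 3.1, 4.2)
)

# strict = TRUE asks step_integer() for true integers rather than
# whole-number doubles; it is set explicitly here for clarity.
rec <- recipe(y ~ x, data = dat) %>%
  step_integer(x, strict = TRUE)

baked <- bake(prep(rec), new_data = NULL)

# The encoded column should now be stored as an integer vector.
is.integer(baked$x)
```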

recipes 1.0.1

11 Jul 21:10
a658e1e
  • Fixed a bug where step_holiday() didn't work if the variable didn't have any missing values. (#1019)

recipes 1.0.0

01 Jul 17:03
41bd8bf

Improvements and Other Changes

  • Added support for case weights in the following steps:

    • step_center()
    • step_classdist()
    • step_corr()
    • step_dummy_extract()
    • step_filter_missing()
    • step_impute_linear()
    • step_impute_mean()
    • step_impute_median()
    • step_impute_mode()
    • step_normalize()
    • step_nzv()
    • step_other()
    • step_percentile()
    • step_pca()
    • step_sample()
    • step_scale()
  • A number of developer-focused functions for dealing with case weights were added: are_weights_used(), get_case_weights(), averages(), medians(), variances(), correlations(), covariances(), and pca_wts().

  • recipes now checks that all columns in the data supplied to recipe() are also present in the new_data supplied to bake(). An exception is made for columns with roles of either "outcome" or "case_weights", which are typically not required at bake() time. The new update_role_requirements() function can be used to adjust whether or not columns of a particular role are required at bake() time if you need to opt out of this check (#1011).

  • The summary() method for recipe objects now contains an extra column to indicate which columns are required when bake() is used.
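As a rough sketch of the bake()-time column check and of opting out with update_role_requirements(): the data frame, the var column, and the "id" role label below are illustrative assumptions, not part of the release notes.

```r
library(recipes)

# Hypothetical data: `var` is an identifier that the model does not need.
df <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6), var = c("a", "b", "c"))

rec <- recipe(y ~ ., data = df) %>%
  update_role(var, new_role = "id") %>%  # "id" is an arbitrary role label
  step_center(x)

# New data that lacks the `var` column (and the outcome).
new_data <- df["x"]

# By default, columns with non-standard roles are required at bake() time,
# so this call is expected to fail; try() captures the error.
strict_result <- try(
  bake(prep(rec, training = df), new_data = new_data),
  silent = TRUE
)

# Opt out of the check for the "id" role, then bake again.
relaxed <- update_role_requirements(rec, "id", bake = FALSE)
baked <- bake(prep(relaxed, training = df), new_data = new_data)
baked
```

Calling summary() on the recipe is one way to see which columns are flagged as required before baking.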

New Steps

  • step_time() has been added that extracts time features such as hour, minute, or second. (#968)
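A minimal sketch of step_time() on a date-time column follows. The ts column and its values are invented for the example, and the output column names noted in the comment are what the step is expected to produce rather than something the release notes state.

```r
library(recipes)

# Hypothetical date-time data in a fixed time zone.
dat <- data.frame(
  ts = as.POSIXct(
    c("2022-07-01 17:03:25", "2022-07-11 21:10:05"),
    tz = "UTC"
  )
)

# Extract only the hour and minute features from `ts`.
rec <- recipe(~ ts, data = dat) %>%
  step_time(ts, features = c("hour", "minute"))

# New columns are expected to be named after the source column,
# e.g. ts_hour and ts_minute.
baked <- bake(prep(rec), new_data = NULL)
baked
```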

Bug Fixes

  • Fixed a bug in the choice of functions that step_hyperbolic() uses. (#932)

  • step_dummy_multi_choice() now respects factor-levels of the selected variables when creating dummies. (#916)

  • step_dummy() now works correctly with recipes trained on version 0.1.17 or earlier. (#921)

  • Fixed a bug where setting fresh = TRUE in prep() wouldn't result in re-prepping the recipe. (#492)

  • A bug was fixed in step_holiday(), which used to error when it was applied to a variable with missing values. (#743)

  • A bug was fixed in step_normalize(), which used to error if only one variable was selected. (#963)

Improvements and Other Changes

  • Finally removed step_upsample() and step_downsample() from recipes, as they are now available in the themis package.

  • discretize() and step_discretize() now can return factor levels similar to cut(). (#674)

  • step_naomit() now actually has its default for skip changed to TRUE, as was stated in release 0.1.13. (#934)

  • step_dummy() has been made more robust to non-standard column names. (#879)

  • step_pls() now allows you to use multiple outcomes if they are numeric. (#651)

  • step_normalize() and step_scale() now ignore columns with zero variance, generate a warning, and suggest using step_zv(). (#920)

  • Printing for step_impute_knn() now shows the variables that were imputed instead of the variables used for imputing. (#837)

  • step_discretize() and discretize() will automatically remove missing values if keep_na = TRUE, removing the need to specify keep_na = TRUE and na.rm = TRUE. (#982)

  • prep() and bake() now check, and raise an error, if the output of bake.bake_*() isn't a tibble.

  • step_date() now has a locale argument that can be used to control how the month and dow features are returned. (#1000)

recipes 0.2.0

19 Feb 14:18
610da60

New Steps

  • step_nnmf_sparse() uses a different implementation of non-negative matrix factorization that is much faster and enables regularized estimation. (#790)

  • step_dummy_extract() creates multiple variables from a character variable by extracting elements using regular expressions and counting those elements.

  • step_filter_missing() can filter columns based on proportion of missingness (#270).

  • step_percentile() replaces the value of a variable with its percentile from the training set. (#765)
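To illustrate one of the new steps, here is a small sketch of step_filter_missing(). The data frame and the 0.25 threshold are assumptions chosen so that one column exceeds the allowed proportion of missing values.

```r
library(recipes)

# Hypothetical data: `a` is 50% missing, `b` is complete.
dat <- data.frame(
  a = c(1, NA, NA, 4),
  b = c(1, 2, 3, 4)
)

# Drop predictors whose proportion of missing values exceeds 0.25.
rec <- recipe(~ ., data = dat) %>%
  step_filter_missing(all_predictors(), threshold = 0.25)

baked <- bake(prep(rec), new_data = NULL)

# Only the complete column should remain.
names(baked)
```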

Improvements and Other Changes

  • All recipe steps now officially support empty selections to be more aligned with dplyr and other packages that use tidyselect (#603, #531). For example, if a previous step removed all of the columns needed for a later step, the recipe does not fail when it is estimated (with the exception of step_mutate()). The documentation in ?selections has been updated with advice for writing selectors when filtering steps are used. (#813)

  • Fixed bug in step_harmonic() printing and changed defaults to role = "predictor" and keep_original_cols = FALSE (#822).

  • Improved the efficiency of computations for the Box-Cox transformation (#820).

  • When a feature extraction step (e.g., step_pca(), step_ica(), etc.) has zero components specified, the tidy() method now lists the selected columns in the terms column.

  • Deprecation has started for step_nnmf() in favor of step_nnmf_sparse(). (#790)

  • Steps now have a dedicated subsection detailing what happens when tidy() is applied. (#876)

  • step_ica() now runs fastICA() using a specific set of random numbers so that initialization is reproducible.

  • tidy.recipe() now returns a zero-row tibble instead of an error when applied to an empty recipe. (#867)

  • step_zv() now has a group argument. The same filter is applied but looks for zero-variance within 1 or more columns that define groups. (#711)

  • detect_step() is no longer restricted to steps created in recipes (#869).

  • New extract_parameter_set_dials() and extract_parameter_dials() methods to extract parameter sets and single parameters from recipe objects.

  • step_other() now allows setting threshold = 0, which results in no othering. (#904)
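Two of the items above, empty selections and tidy() on an empty recipe, can be sketched as follows. The example uses the built-in mtcars data, which has no nominal columns, so the selector deliberately matches nothing.

```r
library(recipes)

# mtcars has only numeric columns, so all_nominal_predictors() selects
# nothing; the step is then expected to be a no-op rather than an error.
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_dummy(all_nominal_predictors())

baked <- bake(prep(rec), new_data = NULL)

# A recipe with no steps: tidy() should return a tibble with zero rows.
empty_tidy <- tidy(recipe(mpg ~ ., data = mtcars))

setequal(names(baked), names(mtcars))
nrow(empty_tidy)
```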

Breaking Changes

  • step_ica() now indirectly uses the fastICA package since that package has increased its R version requirement. Recipe objects from previous versions will error when applied to new data. (#823)

  • step_kpca*() now directly use the kernlab package. Recipe objects from previous versions will error when applied to new data.

Developer

  • The print methods have been internally changed to use print_step() instead of printer(). This is done for a smoother transition to using cli in the next version. (#871)

recipes 0.1.16

16 Apr 15:30

New Steps

  • Added a new step called step_indicate_na(), which will create and append additional binary columns to the data set to indicate which observations are missing (#623).

  • Added new step_select() (#199).

Bug Fixes

  • The threshold argument of step_pca() is now tunable() (#534).

  • Integer variables used in step_profile() are now kept as integers (and not doubles).

  • Preserve multiple roles in last_term_info so bake() can correctly respond to has_roles. (#632)

  • Fixed behavior of the retain flag in prep() (#652).

  • The tidy() method for step_nnmf() was rewritten since it was not great (#665), and step_nnmf() no longer fully loads underlying packages (#685).

Improvements and Other Changes

  • Two new selectors that combine role and data type were added: all_numeric_predictors() and all_nominal_predictors(). (#620)

  • Changed the names of all imputation steps, for example, from step_knnimpute() or step_medianimpute() (old) to step_impute_knn() or step_impute_median() (new) (#614).

  • Added keep_original_cols argument to several steps:

    • step_pca(), step_ica(), step_nnmf(), step_kpca_rbf(), step_kpca_poly(), step_pls(), step_isomap() which all default to FALSE (#635).
    • step_ratio(), step_holiday(), step_date() which all default to TRUE to maintain original behavior, as well as step_dummy() which defaults to FALSE (#645).
  • Added allow_rename argument to eval_select_recipes() (#646).

  • Performance improvements for step_bs() and step_ns(). The prep() step no longer evaluates the basis functions on the training set, and the bake() step only evaluates the basis functions once for each unique input value. (#574)

  • The neighbors parameter's default range for step_isomap() was changed to be 20-80.

  • The deprecation for step_upsample() and step_downsample() has been escalated from a soft deprecation to a regular deprecation; these functions are available in the themis package.

  • Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.
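The new combined selectors and the renamed imputation steps from this release can be sketched together. The data frame and its column names are hypothetical.

```r
library(recipes)

# Hypothetical data with one missing numeric value and one factor predictor.
dat <- data.frame(
  y  = c(1, 2, 3, 4),
  x1 = c(1, NA, 3, 5),
  g  = factor(c("a", "b", "a", "b"))
)

rec <- recipe(y ~ ., data = dat) %>%
  # New name; this step was previously called step_medianimpute().
  step_impute_median(all_numeric_predictors()) %>%
  # Selects nominal columns that also have the predictor role.
  step_dummy(all_nominal_predictors())

baked <- bake(prep(rec), new_data = NULL)
baked
```

Because the selectors combine role and type, the numeric outcome y is left untouched by the imputation step.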

recipes 0.1.15

11 Nov 19:32
  • The full tidyselect DSL is now allowed inside recipes step_*() functions. This includes the operators &, |, - and ! and the new where() function. Additionally, the restriction preventing user defined selectors from being used has been lifted (#572).

  • If steps that drop/add variables are skipped when baking the test set, the resulting column ordering of the baked test set will now be relative to the original recipe specification rather than relative to the baked training set. This is often more intuitive.

  • More infrastructure work to make parallel processing on Windows less buggy with PSOCK clusters.

  • fully_trained() now returns FALSE when an unprepped recipe is used.

recipes 0.1.14

18 Oct 18:14
  • prep() gained an option to print a summary of which columns were added and/or removed during execution.

  • To reduce confusion between bake() and juice(), the latter is superseded in favor of using bake(object, new_data = NULL). The new_data argument now has no default, so a NULL value must be explicitly used in order to emulate the results of juice(). juice() will remain in the package (and be used internally), but most communication and training will use bake(object, new_data = NULL). (#543)

  • Tim Zhou added a step to use linear models for imputation (#555)
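A short sketch of the bake(object, new_data = NULL) idiom that supersedes juice(), using the built-in mtcars data; the choice of predictors is arbitrary.

```r
library(recipes)

rec <- recipe(mpg ~ disp + hp, data = mtcars) %>%
  step_normalize(all_predictors())

prepped <- prep(rec)

# new_data has no default, so NULL must be passed explicitly to get the
# processed training set back.
train_baked <- bake(prepped, new_data = NULL)

# juice() is still available and should give the same result.
all.equal(train_baked, juice(prepped))
```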

recipes 0.1.10

18 Mar 22:54
3551960

Breaking Changes

  • Renamed yj_trans() to yj_transform() to avoid conflicts.

Other Changes

  • Added flexible naming options for new columns created by step_depth() and step_classdist() (#262).

  • Small changes for base R's stringsAsFactors change.

0.1.7 CRAN release

15 Sep 19:06
0.1.7-CRAN

doc update for new version

0.1.4 CRAN release

19 Nov 15:46
9003792
Merge pull request #259 from tidymodels/cran-dimRed-ex-fix

Fixes for CRAN notice about failing examples