Releases: tidymodels/recipes
recipes 1.0.2
- A new set of basis functions was added: step_spline_b(), step_spline_convex(), step_spline_monotone(), step_spline_natural(), step_spline_nonnegative(), and step_poly_bernstein().
- step_date(), step_dummy(), step_dummy_extract(), step_holiday(), step_ordinalscore(), and step_regex() now return integer results when appropriate. (#766)
- The default for the strict argument in step_integer() has been changed from FALSE to TRUE. The function will thus return integers, rather than whole-number numerics, by default. (#766)
- The default for the value argument in step_intercept() has been changed from 1 to 1L. (#766)
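The integer-related defaults above can be checked with a minimal sketch. It assumes recipes 1.0.2 or later is installed, and the small data frame and its column names are made up for illustration. Both step_integer() and step_intercept() are called with their defaults, so both new columns should come back as R integers.

```r
library(recipes)

# Hypothetical training data: one character column and one numeric column
train <- data.frame(
  color = c("red", "blue", "red", "green"),
  size  = c(1.5, 2.0, 3.5, 4.0),
  stringsAsFactors = FALSE
)

rec <- recipe(~ color + size, data = train) %>%
  # strict = TRUE is the default from 1.0.2 on, so mapped values are integers
  step_integer(color) %>%
  # value = 1L is the default from 1.0.2 on, so the intercept column is integer
  step_intercept() %>%
  prep()

baked <- bake(rec, new_data = NULL)
str(baked)
```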
recipes 1.0.1
- Fixed a bug where step_holiday() didn't work if the data didn't have any missing values. (#1019)
recipes 1.0.0
Improvements and Other Changes
- Added support for case weights in the following steps:
  - step_center()
  - step_classdist()
  - step_corr()
  - step_dummy_extract()
  - step_filter_missing()
  - step_impute_linear()
  - step_impute_mean()
  - step_impute_median()
  - step_impute_mode()
  - step_normalize()
  - step_nzv()
  - step_other()
  - step_percentile()
  - step_pca()
  - step_sample()
  - step_scale()
- A number of developer-focused functions for dealing with case weights have been added: are_weights_used(), get_case_weights(), averages(), medians(), variances(), correlations(), covariances(), and pca_wts().
- recipes now checks that all columns in the data supplied to recipe() are also present in the new_data supplied to bake(). An exception is made for columns with roles of either "outcome" or "case_weights", which are typically not required at bake() time. If you need to opt out of this check, the new update_role_requirements() function can be used to adjust whether or not columns of a particular role are required at bake() time (#1011).
- The summary() method for recipe objects now contains an extra column indicating which columns are required when bake() is used.
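Opting out of the new bake-time column check can be sketched as follows. This assumes recipes 1.0.0 or later, and the data frame and the id column var are hypothetical. The id column is required by default. After update_role_requirements() marks the "id" role as not needed at bake() time, new data without that column can still be baked.

```r
library(recipes)

# Hypothetical training data with an identifier column `var`
train <- data.frame(
  y   = c(1, 2, 3),
  x   = c(4, 5, 6),
  var = c("a", "b", "c")
)

rec <- recipe(y ~ ., data = train) %>%
  # treat `var` as an identifier rather than a predictor
  update_role(var, new_role = "id") %>%
  # columns with the "id" role are no longer required at bake() time
  update_role_requirements(role = "id", bake = FALSE) %>%
  prep(training = train)

# new data lacking the id column can now be baked without an error
new_data <- train[c("y", "x")]
baked <- bake(rec, new_data = new_data)
baked
```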
New Steps
- step_time() has been added, which extracts time features such as hour, minute, or second. (#968)
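A minimal sketch of step_time() follows. It assumes recipes 1.0.0 or later and a made-up POSIXct column named stamp. The generated columns are expected to follow the same <column>_<feature> naming that step_date() uses.

```r
library(recipes)

# Hypothetical date-time column stored in UTC
train <- data.frame(
  stamp = as.POSIXct(c("2022-01-01 08:15:30", "2022-01-01 17:45:10"), tz = "UTC")
)

rec <- recipe(~ stamp, data = train) %>%
  # extract hour, minute, and second as new numeric columns
  step_time(stamp, features = c("hour", "minute", "second")) %>%
  prep()

baked <- bake(rec, new_data = NULL)
baked
```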
Bug Fixes
- Fixed bug in which functions that step_hyperbolic() uses (#932).
- step_dummy_multi_choice() now respects the factor levels of the selected variables when creating dummy variables. (#916)
- step_dummy() now works correctly with recipes trained on version 0.1.17 or earlier. (#921)
- Fixed a bug where setting fresh = TRUE in prep() wouldn't result in re-prepping the recipe. (#492)
- A bug was fixed in step_holiday() which used to error when it was applied to variables with missing values. (#743)
- A bug was fixed in step_normalize() which used to error if only one variable was selected. (#963)
Improvements and Other Changes
- Finally removed step_upsample() and step_downsample() from recipes, as they are now available in the themis package.
- discretize() and step_discretize() can now return factor levels similar to cut(). (#674)
- step_naomit() now actually has its default for skip changed to TRUE, as was stated in release 0.1.13. (#934)
- step_dummy() has been made more robust to non-standard column names. (#879)
- step_pls() now allows you to use multiple outcomes if they are numeric. (#651)
- step_normalize() and step_scale() ignore columns with zero variance, generate a warning, and suggest using step_zv(). (#920)
- Printing for step_impute_knn() now shows the variables that were imputed instead of the variables used for imputing. (#837)
- step_discretize() and discretize() will automatically remove missing values if keep_na = TRUE, removing the need to specify keep_na = TRUE and na.rm = TRUE. (#982)
- prep() and bake() now check and error if the output of a bake.step_*() method isn't a tibble.
- step_date() now has a locale argument that can be used to control how the month and dow features are returned. (#1000)
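The practical effect of skip = TRUE in step_naomit() can be seen in a minimal sketch. It assumes recipes 1.0.0 or later, and the tiny data frame is hypothetical. Incomplete rows are removed from the training data when the recipe is prepped. The step is skipped when new data are baked, so those rows survive there.

```r
library(recipes)

# Hypothetical data with one missing predictor value
train <- data.frame(
  y = c(1, 2, 3),
  x = c(10, NA, 30)
)

rec <- recipe(y ~ x, data = train) %>%
  step_naomit(x) %>%   # skip = TRUE is the default
  prep()

# the step was applied to the retained training data...
n_train <- nrow(bake(rec, new_data = NULL))
# ...but is skipped when baking new data
n_new <- nrow(bake(rec, new_data = train))
c(n_train, n_new)
```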
recipes 0.2.0
New Steps
- step_nnmf_sparse() uses a different implementation of non-negative matrix factorization that is much faster and enables regularized estimation. (#790)
- step_dummy_extract() creates multiple variables from a character variable by extracting elements using regular expressions and counting those elements.
- step_filter_missing() can filter columns based on their proportion of missingness (#270).
- step_percentile() replaces the value of a variable with its percentile from the training set. (#765)
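A minimal sketch of step_filter_missing() follows. It assumes recipes 0.2.0 or later, and the column names are hypothetical. With threshold = 0.5, a predictor that is 75% missing should be dropped, and a complete predictor should be kept.

```r
library(recipes)

# Hypothetical data: one mostly-missing and one complete predictor
train <- data.frame(
  y              = c(1, 2, 3, 4),
  mostly_missing = c(NA, NA, NA, 1),
  complete       = c(5, 6, 7, 8)
)

rec <- recipe(y ~ ., data = train) %>%
  # drop predictors whose proportion of missing values exceeds 0.5
  step_filter_missing(all_predictors(), threshold = 0.5) %>%
  prep()

baked <- bake(rec, new_data = NULL)
names(baked)
```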
Improvements and Other Changes
- All recipe steps now officially support empty selections, to be more aligned with dplyr and other packages that use tidyselect (#603, #531). For example, if a previous step removed all of the columns needed for a later step, the recipe does not fail when it is estimated (with the exception of step_mutate()). The documentation in ?selections has been updated with advice for writing selectors when filtering steps are used. (#813)
- Fixed a bug in step_harmonic() printing and changed the defaults to role = "predictor" and keep_original_cols = FALSE (#822).
- Improved the efficiency of computations for the Box-Cox transformation (#820).
- When a feature extraction step (e.g., step_pca(), step_ica(), etc.) has zero components specified, the tidy() method now lists the selected columns in the terms column.
- Deprecation has started for step_nnmf() in favor of step_nnmf_sparse(). (#790)
- Steps now have a dedicated subsection detailing what happens when tidy() is applied. (#876)
- step_ica() now runs fastICA() using a specific set of random numbers so that initialization is reproducible.
- tidy.recipe() now returns a zero-row tibble instead of an error when applied to an empty recipe. (#867)
- step_zv() now has a group argument. The same filter is applied, but it looks for zero variance within the groups defined by one or more columns. (#711)
- detect_step() is no longer restricted to steps created in recipes (#869).
- New extract_parameter_set_dials() and extract_parameter_dials() methods extract parameter sets and single parameters from recipe objects.
- step_other() now allows setting threshold = 0, which results in no othering. (#904)
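Empty selections and tidy() on an empty recipe can both be illustrated with a minimal sketch. It assumes recipes 0.2.0 or later and uses the built-in mtcars data. All predictors are removed first, so the later normalization step selects nothing, and prep() is expected to succeed rather than fail.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  # remove every predictor...
  step_rm(all_predictors()) %>%
  # ...so this step has an empty selection, which no longer errors
  step_normalize(all_numeric_predictors()) %>%
  prep()

baked <- bake(rec, new_data = NULL)
names(baked)

# tidy() on a recipe with no steps returns a zero-row tibble
empty_tidy <- tidy(recipe(mpg ~ ., data = mtcars))
empty_tidy
```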
Breaking Changes
- step_ica() now indirectly uses the fastICA package, since that package has increased its R version requirement. Recipe objects from previous versions will error when applied to new data. (#823)
- The step_kpca*() steps now directly use the kernlab package. Recipe objects from previous versions will error when applied to new data.
Developer
- The print methods have been internally changed to use print_step() instead of printer(). This is done for a smoother transition to using cli in the next version. (#871)
recipes 0.1.16
New Steps
- Added a new step called step_indicate_na(), which creates and appends additional binary columns to the data set to indicate which observations are missing (#623).
- Added a new step_select() (#199).
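The indicator columns created by step_indicate_na() can be seen in a minimal sketch. It assumes a recipes version that includes the step (0.1.16 or later), and the data frame is hypothetical. With the default prefix, the new column for x is expected to be named na_ind_x, holding 1 where x was missing and 0 otherwise.

```r
library(recipes)

# Hypothetical data with one missing value in x
train <- data.frame(
  y = c(1, 2, 3),
  x = c(10, NA, 30)
)

rec <- recipe(y ~ x, data = train) %>%
  # append a binary missingness indicator for x
  step_indicate_na(x) %>%
  prep()

baked <- bake(rec, new_data = NULL)
baked
```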
Bug Fixes
- The threshold argument of step_pca() is now tunable() (#534).
- Integer variables used in step_profile() are now kept as integers (and not doubles).
- Preserve multiple roles in last_term_info so bake() can correctly respond to has_roles. (#632)
- Fixed behavior of the retain flag in prep() (#652).
- The tidy() method for step_nnmf() was rewritten since it was not great (#665), and step_nnmf() no longer fully loads underlying packages (#685).
Improvements and Other Changes
- Two new selectors that combine role and data type were added: all_numeric_predictors() and all_nominal_predictors(). (#620)
- Changed the names of all imputation steps, for example, from step_knnimpute() or step_medianimpute() (old) to step_impute_knn() or step_impute_median() (new) (#614).
- Added a keep_original_cols argument to several steps:
- Added an allow_rename argument to eval_select_recipes() (#646).
- Performance improvements for step_bs() and step_ns(). The prep() step no longer evaluates the basis functions on the training set, and the bake() step only evaluates the basis functions once for each unique input value (#574).
- The default range of the neighbors parameter for step_isomap() was changed to 20-80.
- The deprecation for step_upsample() and step_downsample() has been escalated from a soft deprecation to a regular deprecation; these functions are available in the themis package.
- Re-licensed the package from GPL-2 to MIT. See consent from copyright holders here.
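The combined role/type selectors and the renamed imputation steps can be used together, as in this minimal sketch. It assumes recipes 0.1.16 or later, and the data frame is hypothetical. The numeric outcome y is left untouched because all_numeric_predictors() only selects predictors.

```r
library(recipes)

# Hypothetical data: numeric outcome, numeric predictor with an NA, factor predictor
train <- data.frame(
  y     = c(1, 2, 3, 4),
  num   = c(2, NA, 6, 8),
  group = factor(c("a", "b", "a", "b"))
)

rec <- recipe(y ~ ., data = train) %>%
  # step_impute_median() is the new name of step_medianimpute()
  step_impute_median(all_numeric_predictors()) %>%
  # selects factor/character predictors only, never outcomes
  step_dummy(all_nominal_predictors()) %>%
  prep()

baked <- bake(rec, new_data = NULL)
baked
```

The missing num value is filled with the training median of the observed values (6), and group is replaced by a group_b dummy column.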
recipes 0.1.15
- The full tidyselect DSL is now allowed inside recipes step_*() functions. This includes the operators &, |, - and !, and the new where() function. Additionally, the restriction preventing user-defined selectors from being used has been lifted (#572).
- If steps that drop or add variables are skipped when baking the test set, the column ordering of the baked test set will now follow the original recipe specification rather than the baked training set. This is often more intuitive.
- More infrastructure work to make parallel processing on Windows with PSOCK clusters less buggy.
- fully_trained() now returns FALSE when an unprepped recipe is used.
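Mixing tidyselect operators with recipes role selectors looks like the following minimal sketch. It assumes recipes 0.1.15 or later and uses mtcars. tidyselect is attached explicitly here so that where() is certainly available; whether recipes itself re-exports where() is not assumed. All numeric columns except the outcome should be normalized.

```r
library(recipes)
library(tidyselect)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  # where() plus boolean operators, combined with a recipes role selector
  step_normalize(where(is.numeric) & !all_outcomes()) %>%
  prep()

baked <- bake(rec, new_data = NULL)
head(baked)
```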
recipes 0.1.14
- prep() gained an option to print a summary of which columns were added and/or removed during execution.
- To reduce confusion between bake() and juice(), the latter is superseded in favor of bake(object, new_data = NULL). The new_data argument now has no default, so a NULL value must be passed explicitly to emulate the results of juice(). juice() will remain in the package (and be used internally), but most communication and training will use bake(object, new_data = NULL). (#543)
- Tim Zhou added a step to use linear models for imputation (#555).
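The equivalence between the superseded juice() and bake(object, new_data = NULL) can be checked with a minimal sketch. It assumes recipes 0.1.14 or later and uses mtcars. Both calls return the processed training set retained by prep().

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_predictors()) %>%
  prep()

# preferred spelling: new_data must be given explicitly as NULL
via_bake <- bake(rec, new_data = NULL)
# superseded, but still available
via_juice <- juice(rec)
identical(via_bake, via_juice)
```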
recipes 0.1.10
Breaking Changes
- Renamed yj_trans() to yj_transform() to avoid conflicts.
Other Changes
- Added flexible naming options for new columns created by step_depth() and step_classdist() (#262).
- Small changes for base R's stringsAsFactors change.
0.1.7 CRAN release
0.1.7-CRAN doc update for new version
0.1.4 CRAN release
Merge pull request #259 from tidymodels/cran-dimRed-ex-fix Fixes for CRAN notice about failing examples