diff --git a/NEWS.md b/NEWS.md index 2e678b33..e088e668 100644 --- a/NEWS.md +++ b/NEWS.md @@ -4,6 +4,8 @@ * Started moving error messages to cli (#499, #502). +* Improved documentation for `initial_split()` and friends (@laurabrianna, #519). + ## Bug fixes * `vfold_cv()` now utilizes the `breaks` argument correctly for repeated cross-validation (@ZWael, #471). diff --git a/R/initial_split.R b/R/initial_split.R index 66157c76..c754e424 100644 --- a/R/initial_split.R +++ b/R/initial_split.R @@ -1,18 +1,20 @@ #' Simple Training/Test Set Splitting #' -#' `initial_split` creates a single binary split of the data into a training -#' set and testing set. `initial_time_split` does the same, but takes the +#' `initial_split()` creates a single binary split of the data into a training +#' set and testing set. `initial_time_split()` does the same, but takes the #' _first_ `prop` samples for training, instead of a random selection. -#' `group_initial_split` creates splits of the data based +#' `group_initial_split()` creates splits of the data based #' on some grouping variable, so that all data in a "group" is assigned to -#' the same split. -#' `training` and `testing` are used to extract the resulting data. +#' the same split. +#' +#' @details `training()` and `testing()` are used to extract the resulting data. +#' #' @template strata_details #' @inheritParams vfold_cv #' @inheritParams make_strata #' @param prop The proportion of data to be retained for modeling/analysis. #' @export -#' @return An `rsplit` object that can be used with the `training` and `testing` +#' @return An `rsplit` object that can be used with the `training()` and `testing()` #' functions to extract the data in each split. 
#' @examplesIf rlang::is_installed("modeldata") #' set.seed(1353) @@ -176,12 +178,12 @@ group_initial_split <- function(data, group, prop = 3 / 4, ..., strata = NULL, p attrib <- .get_split_args(res, allow_strata_false = TRUE) res <- res$splits[[1]] - + attrib$times <- NULL for (i in names(attrib)) { attr(res, i) <- attrib[[i]] } class(res) <- c("group_initial_split", "initial_split", class(res)) - + res } diff --git a/R/initial_validation_split.R b/R/initial_validation_split.R index e3ce84d6..0aec03fd 100644 --- a/R/initial_validation_split.R +++ b/R/initial_validation_split.R @@ -8,9 +8,10 @@ #' `group_initial_validation_split()` creates similar random splits of the data #' based on some grouping variable, so that all data in a "group" are assigned #' to the same partition. -#' `training()`, `validation()`, and `testing()` can be used to extract the +#' +#' @details [training()], [validation()], and [testing()] can be used to extract the #' resulting data sets. -#' Use [`validation_set()`] to create an `rset` object for use with functions from +#' Use [validation_set()] to create an `rset` object for use with functions from #' the tune package such as `tune::tune_grid()`. #' #' @template strata_details diff --git a/R/validation_set.R b/R/validation_set.R index 14d48eb3..db775e02 100644 --- a/R/validation_set.R +++ b/R/validation_set.R @@ -1,5 +1,7 @@ #' Create a Validation Split for Tuning #' +#' `validation_set()` creates a validation split for model tuning. +#' #' @param split An object of class `initial_validation_split`, such as resulting #' from [initial_validation_split()] or [group_initial_validation_split()]. #' @param x An `rsplit` object produced by `validation_set()`. 
diff --git a/man/initial_split.Rd b/man/initial_split.Rd index 740bf66f..07fc96b6 100644 --- a/man/initial_split.Rd +++ b/man/initial_split.Rd @@ -61,19 +61,20 @@ grouping observations with the same value to either the analysis or assessment set within a fold.} } \value{ -An \code{rsplit} object that can be used with the \code{training} and \code{testing} +An \code{rsplit} object that can be used with the \code{training()} and \code{testing()} functions to extract the data in each split. } \description{ -\code{initial_split} creates a single binary split of the data into a training -set and testing set. \code{initial_time_split} does the same, but takes the +\code{initial_split()} creates a single binary split of the data into a training +set and testing set. \code{initial_time_split()} does the same, but takes the \emph{first} \code{prop} samples for training, instead of a random selection. -\code{group_initial_split} creates splits of the data based +\code{group_initial_split()} creates splits of the data based on some grouping variable, so that all data in a "group" is assigned to the same split. -\code{training} and \code{testing} are used to extract the resulting data. } \details{ +\code{training()} and \code{testing()} are used to extract the resulting data. + With a \code{strata} argument, the random sampling is conducted \emph{within the stratification variable}. This can help ensure that the resamples have equivalent proportions as the original data set. For diff --git a/man/initial_validation_split.Rd b/man/initial_validation_split.Rd index 734a2a97..9017e36e 100644 --- a/man/initial_validation_split.Rd +++ b/man/initial_validation_split.Rd @@ -81,12 +81,13 @@ data set, with the first observations being put into the training set. \code{group_initial_validation_split()} creates similar random splits of the data based on some grouping variable, so that all data in a "group" are assigned to the same partition. 
-\code{training()}, \code{validation()}, and \code{testing()} can be used to extract the +} +\details{ +\code{\link[=training]{training()}}, \code{\link[=validation]{validation()}}, and \code{\link[=testing]{testing()}} can be used to extract the resulting data sets. Use \code{\link[=validation_set]{validation_set()}} to create an \code{rset} object for use with functions from the tune package such as \code{tune::tune_grid()}. -} -\details{ + With a \code{strata} argument, the random sampling is conducted \emph{within the stratification variable}. This can help ensure that the resamples have equivalent proportions as the original data set. For diff --git a/man/validation_set.Rd b/man/validation_set.Rd index de3d91bd..98de1f31 100644 --- a/man/validation_set.Rd +++ b/man/validation_set.Rd @@ -35,7 +35,7 @@ An tibble with classes \code{validation_set}, \code{rset}, \code{tbl_df}, \code{ column called \code{id} that has a character string with the resample identifier. } \description{ -Create a Validation Split for Tuning +\code{validation_set()} creates a validation split for model tuning. } \examples{ set.seed(1353)