[Feature]: Data Documentation Utility Helpers #1

Open
1 task
jimbrig opened this issue Jun 13, 2024 · 0 comments

jimbrig commented Jun 13, 2024

Create helper functions that take the tedium out of writing data documentation: roxygen2 comments, data dictionaries, codebooks, reports, visualizations, and metadata files.

  • document_data: given a data_obj (i.e. a data.frame or tibble), name, description, source, and column_descriptions, append the roxygen2 skeleton for the dataset to an R/data.R file:
#' Document Datasets
#'
#' @description
#' Helper function to auto-generate the necessary `roxygen2` documentation for
#' datasets included/exported with an R package.
#'
#' @param data_obj The data object to be documented. Must be a `data.frame` or
#'   [tibble::tibble()]; any other object is rejected with an error.
#' @param name Name of the dataset. If not provided, the name of the `data_obj`
#'   object will be used.
#' @param description Description of the dataset. If not provided, a
#'   placeholder will be used.
#' @param source The source of the dataset. If not provided, a
#'   placeholder will be used.
#' @param file Path to the file where the documentation will be written.
#'   Defaults to `R/data.R`. To document individual datasets in separate
#'   files, provide a different path. The file will be created if it does not
#'   exist.
#' @param column_descriptions A named list of column descriptions for the dataset.
#'   The names should match the column names of the dataset. If not provided, a
#'   placeholder will be used.
#' @param ... Additional arguments; currently unused.
#'
#' @return Invisibly returns the documentation string.
#'
#' @example examples/ex_document_datasets.R
#'
#' @export
document_data <- function(
  data_obj,
  name = deparse(substitute(data_obj)),
  description = "<Add a description here>",
  source = "<Add a source here>",
  file = "R/data.R",
  column_descriptions = NULL,
  ...
) {

  # Resolve the dataset name from the call before doing anything else
  force(name)

  # Validate data_obj (tibbles inherit from data.frame, so one check suffices)
  if (!is.data.frame(data_obj)) {
    rlang::abort("The provided object is not a data frame or tibble object.")
  }

  # Coerce the data to a data.frame
  dat <- as.data.frame(data_obj)

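  # Check if the column descriptions are provided
  if (!is.null(column_descriptions)) {
    if (!is.list(column_descriptions) || is.null(names(column_descriptions))) {
      rlang::abort("Column descriptions must be a named list.")
    }
    if (length(column_descriptions) != ncol(dat)) {
      rlang::abort("Number of column descriptions must match the number of columns in the dataset.")
    }
    if (!setequal(names(column_descriptions), names(dat))) {
      rlang::abort("Names of the column descriptions must match the column names of the dataset.")
    }
    # Reorder the descriptions to follow the column order of the dataset
    column_descriptions <- column_descriptions[names(dat)]
  } else {
    # Fall back to one placeholder per column, as a named list
    column_descriptions <- as.list(rep("<Add a description here>", ncol(dat)))
    names(column_descriptions) <- names(dat)
  }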

  # Build one `\item{column}{description}` line per column
  items <- paste0(
    "#'   \\item{", names(column_descriptions), "}{",
    unlist(column_descriptions), "}"
  )

  # Create the documentation string. A dataset is documented by its quoted
  # name and is not exported, so no `@export` tag is emitted.
  doc <- paste0(
    "#' @title ", name, "\n",
    "#' @description ", description, "\n",
    "#' @usage data(", name, ")\n",
    "#' @format A data frame with ", nrow(dat), " rows and ", ncol(dat), " columns:\n",
    "#' \\describe{\n",
    paste0(items, collapse = "\n"), "\n",
    "#' }\n",
    "#' @source ", source, "\n",
    "\"", name, "\"\n"
  )

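  # Write the documentation to the file, creating its directory if needed.
  # Passing `file = NULL` skips writing and only returns the string.
  if (!is.null(file)) {
    dir.create(dirname(file), recursive = TRUE, showWarnings = FALSE)
    cat(doc, file = file, append = TRUE)
  }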

  # Return the documentation string
  invisible(doc)

}
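
The issue also lists data dictionaries among the planned helpers. As a rough sketch of what one could look like, the base R function below builds a one-row-per-column summary of a data frame. The name `make_data_dictionary()` and its output columns (`column`, `type`, `n_missing`, `n_unique`, `description`) are assumptions made for illustration and are not defined anywhere in this issue.

```r
# Hypothetical helper (not part of the issue): build a simple data dictionary
# with one row per column of `data_obj`.
make_data_dictionary <- function(data_obj, column_descriptions = NULL) {
  stopifnot(is.data.frame(data_obj))

  cols <- names(data_obj)
  placeholder <- "<Add a description here>"

  # Look up the description for one column, falling back to the placeholder
  describe_column <- function(nm) {
    desc <- column_descriptions[[nm]]
    if (is.null(desc)) placeholder else as.character(desc)[1]
  }

  dict <- data.frame(
    column = cols,
    # Use the first class only, e.g. "ordered" for an ordered factor
    type = vapply(data_obj, function(x) class(x)[1], character(1)),
    n_missing = vapply(data_obj, function(x) sum(is.na(x)), integer(1)),
    n_unique = vapply(data_obj, function(x) length(unique(x)), integer(1)),
    description = vapply(cols, describe_column, character(1)),
    stringsAsFactors = FALSE
  )

  # Drop the row names inherited from the named vectors above
  rownames(dict) <- NULL
  dict
}

# Example with a small, made-up data frame
dict <- make_data_dictionary(
  data.frame(id = 1:3, score = c(0.5, NA, 0.9)),
  column_descriptions = list(id = "Row identifier", score = "Test score")
)
print(dict)
```

The same named-list convention used for `column_descriptions` in `document_data()` is reused here, so one set of descriptions could feed both the roxygen2 block and the dictionary.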
@jimbrig jimbrig added the feature New enhancements and features. label Jun 13, 2024
@jimbrig jimbrig self-assigned this Jun 13, 2024