The episode on data can be expanded #80

Open
PabRod opened this issue Nov 7, 2022 · 1 comment
@PabRod
Collaborator

PabRod commented Nov 7, 2022

Problem

The episode on data proved insufficient in our last teaching iteration.

Solution

@bvreede shared (privately) several suggestions on how to improve this. Incorporate them.

@PabRod PabRod added the ch-data label Nov 7, 2022
@bvreede
Collaborator

bvreede commented Jan 27, 2023

From my notes:

Data

sample_names <- c("Luke", "Darth Vader", "Leia", "Chewbacca", "Han Solo", "R2D2")
usethis::use_data(sample_names)

This is best done after the episode on documentation with roxygen. Then document the data set:

#' Example names
#'
#' An example data set containing six names from the Star Wars universe
#'
#' @format A vector of strings
#' @source Star Wars
"sample_names"
  • Do not add the @export tag; it will break your package.
  • The object you create will still be available to the user.
  • It does not appear in NAMESPACE; that is expected.
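
To make the storage step less of a black box, here is a minimal base-R sketch of roughly what ends up on disk: an .rda file in data/ holding the object under its own name. The temporary directory standing in for the package root is an assumption for illustration, and this is not the actual usethis implementation.

```r
# Sketch only: mimics the on-disk result of storing a data object in data/.
# The package root below is a temporary, hypothetical stand-in.
sample_names <- c("Luke", "Darth Vader", "Leia", "Chewbacca", "Han Solo", "R2D2")

pkg_dir <- file.path(tempdir(), "mysterycoffee")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# The object is saved under its own name, so the file name and the
# documented name ("sample_names") should match.
rda_path <- file.path(pkg_dir, "data", "sample_names.rda")
save(sample_names, file = rda_path)

# Load it back into a fresh environment to confirm the round trip.
restored <- new.env()
load(rda_path, envir = restored)
identical(restored$sample_names, sample_names)
```

This is also why the string at the end of the roxygen block has to be the object name: that name is what the documentation is attached to.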

Save raw data in inst/extdata.
To use the data, build the file path with system.file():
system.file("extdata", "names.csv", package = "mysterycoffee")

Then load it with:

filepath <- system.file("extdata", "names.csv", package = "mysterycoffee")
names <- read.csv(filepath)
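
One thing worth showing learners: system.file() returns an empty string when the file is not found, so a typo in the file name only surfaces later, in read.csv(). The sketch below demonstrates this with the base package stats (chosen only so the example runs anywhere; the file name no-such-file.csv is made up) and shows the mustWork argument as one way to fail early.

```r
# A base package is used here so the example runs without mysterycoffee installed.
existing <- system.file("DESCRIPTION", package = "stats")
missing  <- system.file("extdata", "no-such-file.csv", package = "stats")

nzchar(existing)   # a real path was found
identical(missing, "")   # a missing file gives "", not an error

# mustWork = TRUE turns the silent "" into an error at the point of lookup.
result <- tryCatch(
  system.file("extdata", "no-such-file.csv", package = "stats", mustWork = TRUE),
  error = function(e) "not found"
)
result
```

In package code, the same call with package = "mysterycoffee" and mustWork = TRUE would stop with an error as soon as the path is wrong.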

Exercise: add data to your package

flowchart LR
    id1(Does the user need access?) --Yes--> id6(Store it in data/)
    id3(Is the data in .rda format?)--Yes--> id1
    id1 --No, but tests do--> id5(Store it in tests/)
    id1 --No, but functions do--> id4(Store it in R/sysdata.rda*)
    id3 --No--> id8(But can it be?)
    id8 --Yes, with some work --> id9(Document the process in data-raw/**)
    id8 --No, it shouldn't--> id7(Store it in inst/extdata)
    

*) R/sysdata.rda is a file dedicated to (larger) data needed by your functions; usethis::use_data(object_name, internal = TRUE) creates it. Read more about it here.
**) data-raw/ is a folder dedicated to the origin and cleanup of your data. Read more about it here.
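
For the data-raw/ branch of the flowchart, a script along these lines could serve as an example. It is a sketch under assumptions: the raw file is generated in a temporary location so the code runs on its own, the cleanup steps are invented for illustration, and the final usethis call is commented out because it only works inside a package project. A script skeleton like this can be created with usethis::use_data_raw("sample_names").

```r
# Hypothetical contents of data-raw/sample_names.R

# Stand-in for a real raw file; in a package this would be a file you received.
raw_path <- tempfile(fileext = ".csv")
writeLines(c("name", " Luke", "Leia ", "Leia", "R2D2"), raw_path)

raw <- read.csv(raw_path, stringsAsFactors = FALSE)

# Illustrative cleanup: trim stray whitespace, then drop duplicates.
sample_names <- unique(trimws(raw$name))
sample_names

# Inside a package project, the cleaned object would then be stored with:
# usethis::use_data(sample_names, overwrite = TRUE)
```

Keeping the script in data-raw/ documents how the object in data/ was produced, so it can be regenerated when the raw data changes.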

Add data to your package:

  • Do you need raw data as part of your package?
    • Create a folder inst/extdata, and save the files there. Note that a user will be able to access this data.
    • When loading the data, do not hard-code the path as you usually would. Instead, build it with system.file(), like this:
      filepath <- system.file("extdata", "names.csv", package = "mysterycoffee")
      names <- read.csv(filepath)
  • Do you need data you can store in your package as an .rda file?
    • Create the object
    • Store it with usethis::use_data(object_name)
    • Verify the object is now stored in the data/ folder
    • Create a new R file called data.R: usethis::use_r("data") (data.R is an example; you may call this whatever you want)
    • In this file, document the data object, using this example:
      #' Title
      #'
      #' A short description.
      #'
      #' @format What format is the data in?
      #' @source Where did it come from? \url{https://google.com}
      "object_name"
      
    • Don't forget to call devtools::document() to generate the documentation file. The data object will not appear in NAMESPACE; that is expected.
