Development journal

Jason Mercer edited this page May 31, 2019 · 7 revisions

The intent of this journal is to provide notes related to:

  1. Development strategies for isotopeconverter
  2. R package creation

These notes include some key information at the top and then the rest are in reverse chronological order, so the most recent date is first.

Just a reminder of some really useful git and GitHub commands and strategies:

Useful packages for R package development:

Other useful package development resources:

Continuous integration:

For an example of developer tags see the usethis library.

2019-05-30

Added a devel branch for messing around with the development of the package.

2019-05-28

Implemented a lot of what I've learned related to R package development to generate the first "working" (it can be attached, but does not have any actual functionality) version of the package.

Got Travis and Appveyor running. Also got codecov up, but there are no tests yet, so that badge isn't really that useful, yet. Here's where I can find more information about each service for this repo:

I think I should probably set up a devel branch for the package so I'm not over-working the continuous integration tools.

2019-05-27

Today I continued my review of R Packages, related to testing (tests/), but also looked at some other package development resources related to usethis.

Testing (tests/) in R Packages

This chapter of R Packages is all about the unit test and making some automated tests to ensure code generates the expected result. Unit tests are called such because each test is meant to examine a "unit" of functionality. Tests are particularly useful for automatically ensuring that existing code is generating the intended output and that new functionalities do not impact old ones.

Test workflow

  1. devtools::use_testthat: Sets up automated checking.
    1. Creates a folder, tests/testthat for performing unit tests.
    2. Modifies the DESCRIPTION file.
    3. Creates tests/testthat.R for running tests.
  2. Modify code or test.
  3. devtools::test: Test package (Ctrl + Shift + T)
  4. Repeat until all tests pass.

Example test from the stringr package:

context("String length")
library(stringr)

test_that("str_length is number of characters", {
  expect_equal(str_length("a"), 1)
  expect_equal(str_length("ab"), 2)
  expect_equal(str_length("abc"), 3)
})

Note that the above is grouped into a hierarchy of tests and expectations, which are in turn located in files.

Questions to ask when writing tests:

  • Does an output have the correct value?
  • Does an output have the correct class?
  • Does it produce a warning when it's supposed to?
  • Does it produce an error when it's supposed to?

It's important to write both "passing" and "failing" tests. This can be done using the testthat::expect_* functions.
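As a rough illustration of the four questions above, here is a minimal sketch. The function safe_sqrt() is hypothetical and exists only for this example; the sketch assumes the testthat package is installed.

```r
library(testthat)

# Hypothetical function used only for illustration.
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  if (any(x < 0)) {
    warning("negative values set to NA")
    x[x < 0] <- NA
  }
  sqrt(x)
}

test_that("safe_sqrt returns the expected value and class", {
  expect_equal(safe_sqrt(4), 2)            # correct value
  expect_true(is.numeric(safe_sqrt(4)))    # correct class
})

test_that("safe_sqrt warns or errors on bad input", {
  expect_warning(safe_sqrt(-1), "negative values")  # warning when expected
  expect_error(safe_sqrt("a"), "must be numeric")   # error when expected
})
```

The second test is the "failing" kind: it passes only when the function complains in the way it is supposed to.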

Expectations

testthat::expect_* functions have two arguments:

  1. The actual result of a function.
  2. The expected result from the same function.

Useful testing functions

  • devtools::use_testthat: Sets up automated checking.
  • devtools::test: Test package (Ctrl + Shift + T).
  • testthat::test_that: Runs a test for related expectations.
    • Take advantage of the description argument. It can help one diagnose which test returned an error.
  • testthat::expect_*: A series of related functions to help with unit testing. Examples:
    • testthat::expect_equal: Tests if two things are similar, with numerical tolerance (uses all.equal).
    • testthat::expect_identical: Test for exact equivalence.
    • testthat::expect_match: Matches a character vector against a regular expression.
    • testthat::expect_output: Inspects printed output. Uses regular expressions.
    • testthat::expect_message: Output includes a message. Uses regular expressions.
    • testthat::expect_warning: Output includes a warning. Uses regular expressions.
    • testthat::expect_error: Output is an error. Uses regular expressions.
    • testthat::expect_is: Checks if a result inherits from a specified class.
    • testthat::expect_false: A nice catch-all for a condition.
    • testthat::expect_true: A nice catch-all for a condition.
    • testthat::expect_equal_to_reference: Caches a result when the expectation is not easy to predict.
  • testthat::skip: Used to skip a particular test. Called inside the test itself, or inside a helper function that the test calls.

*_output, *_message, *_warning, and *_error can be particularly powerful when including a more specific output string. This allows one to test for multiple conditions.
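A small sketch of that idea, assuming testthat is installed. The function check_ratio() and its messages are hypothetical; the point is only that matching on the message text separates two different failure conditions.

```r
library(testthat)

# Hypothetical function with two distinct failure modes.
check_ratio <- function(x) {
  if (!is.numeric(x)) stop("ratio must be numeric")
  if (any(x <= 0)) stop("ratio must be positive")
  invisible(x)
}

test_that("check_ratio reports which condition failed", {
  # Each expectation passes only if the error message matches its pattern.
  expect_error(check_ratio("a"), "must be numeric")
  expect_error(check_ratio(-1), "must be positive")
})
```

Without the second argument, both expectations would pass for any error at all, including one raised for the wrong reason.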

Writing tests

Each test should focus on one small piece of a function's behavior, so that it is easier to diagnose where and how things went wrong. In testthat::test_that, the first argument (the description) should finish the sentence "Test that...". For example, from the units package:

test_that("We can concatenate units if they have the same unit", {
  x <- 1:4 * as_units("m")
  y <- 5:8 * as_units("m")
  z <- c(x, y)
  
  expect_equal(length(z), length(x) + length(y))
  expect_equal(x, z[1:4])
  expect_equal(y, z[1:4 + 4])
})

Notice that the desc argument in test_that can be combined into "Test that... we can concatenate units if they have the same unit."

What to test

Tests can make it less likely that changes in code will break functionality, but tests can also make it harder to purposely modify functionality. So a balance must be struck. This may help explain why some packages I'd expect to be 100 % covered by unit tests are not -- flexibility in future development.

Strategies:

  1. Not sure what this means, but "focus on testing the external interface to your functions."
  2. Containerize functionality tests as much as possible. This will allow you to modify or delete tests more easily when altering functionalities in the future.
  3. Focus on complicated parts, as those are the most likely to break; spend less time on the simple things that you know will work.
  4. Always write a test when a bug is discovered. That is, when a bug is found, write a test first, then write the code that will pass the test.

Skipping a test

Example:

check_api <- function() {
  if (not_working()) {
    skip("API not available")
  }
}

test_that("foo api returns bar when given baz", {
  check_api()
  ...
})

Building your own testing tools

This section provides an example of "refactoring", which simplifies some of the redundancies associated with the expect_* functions in some circumstances.
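The sketch below is not the book's example, just a minimal illustration of the same kind of refactoring, assuming testthat is installed. The helper expect_round_equal() is a hypothetical name.

```r
library(testthat)

# Hypothetical helper: wraps a repeated expectation so each check
# in the test reads as a single short line.
expect_round_equal <- function(x, digits, expected) {
  expect_equal(round(x, digits), expected)
}

test_that("round handles different numbers of digits", {
  expect_round_equal(3.14159, 0, 3)
  expect_round_equal(3.14159, 2, 3.14)
  expect_round_equal(3.14159, 4, 3.1416)
})
```

The helper removes the repeated round() call from every expectation, which makes the pattern of inputs and expected outputs easier to scan.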

Namespaces (NAMESPACE) in R Packages

Imports determine how a function in one package finds a function in another package. This reduces conflicts in function names between packages by defining environments more precisely. Exports control which functions are visible outside the package, which avoids conflicts with external functions.

Search path

The search path is the ordered set of package environments interrogated to find a function, data, etc. The search path can be accessed via search().
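A quick way to look at the search path in a running session, using only base R:

```r
# List the environments R searches, in order, when resolving a name.
path <- search()
print(path)

# The global environment always comes first and the base package last.
path[1]             # ".GlobalEnv"
path[length(path)]  # "package:base"
```

Packages attached with library() show up between those two entries, most recently attached first.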

How to attach or load packages:

  1. library(x): Used in data analysis scripts. Do not use in a package. The value x represents the package being loaded.
  2. requireNamespace("x", quietly = TRUE): Used in packages. Loads the package without attaching it and returns FALSE if the package is not installed, which can be used to generate an error.

In a package context, the Imports and Depends fields in the DESCRIPTION file are where package dependencies should be declared. Depends is used when a package is built "on top" of another. Imports loads a package, though its functions may not be available unless using the :: operator. Depends attaches a package -- it both loads it and makes its functions available without needing to use ::. A lot of the management associated with these dependencies can be taken care of using the usethis package.
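A minimal sketch of the requireNamespace() pattern inside a package function. The function summarise_with() is hypothetical, and its pkg argument is there only so the failure branch can be demonstrated.

```r
# Hypothetical package function that checks for a dependency before using it.
summarise_with <- function(x, pkg = "stats") {
  # requireNamespace() loads the package without attaching it and
  # returns FALSE (rather than erroring) when it is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function to work. ",
         "Please install it.", call. = FALSE)
  }
  # Call the function through :: instead of relying on the search path.
  stats::median(x)
}

summarise_with(c(1, 3, 5))
```

Calling summarise_with(1, pkg = "aPackageThatIsNotInstalled") should stop with the custom message instead of a less helpful "could not find function" error further down.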

The NAMESPACE

The NAMESPACE file should probably be handled using the roxygen2 package. This, in part, is what the tags from roxygen2 help with.

Namespace directives:

  • Exports
    • export(): Functions, including S3 and S4 generics
    • exportPattern(): Export all functions that follow a certain string pattern
    • exportClasses(): S4 classes
    • exportMethods(): S4 methods
    • S3method(): S3 methods
  • Imports
    • import(): Import all functions from a package
    • importFrom(): Import select functions from a package
    • importClassesFrom(): S4 classes
    • importMethodsFrom(): S4 methods
    • useDynLib(): Import a function from C

Workflow

  1. Add roxygen2 tags to an .R file.
  2. devtools::document: Convert roxygen comments to a .Rd file.
  3. Examine NAMESPACE to make sure it makes sense.
  4. Repeat.

Exports

Use the @export tag from roxygen2 to export a function.

Objects whose names start with "." are not automatically exported to NAMESPACE. It's generally better to export too little than too much, especially during the development phases of a project, because internal functions can be improved without altering external performance.
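A sketch of how this looks in an .R file. Both function names are hypothetical; only the first carries the @export tag, so only it would be written to NAMESPACE when devtools::document is run.

```r
#' Scale a value to per mil
#'
#' Hypothetical exported function, used only to illustrate the tag.
#'
#' @param x A numeric vector.
#' @return A numeric vector, \code{x} multiplied by 1000.
#' @export
to_permil <- function(x) {
  scale_by(x, 1000)
}

# Internal helper: no @export tag, so it stays out of NAMESPACE and can
# be changed later without affecting users of the package.
scale_by <- function(x, factor) {
  x * factor
}
```

Keeping scale_by() internal is the "export too little" choice: its arguments can be renamed or the function removed without breaking anyone's code.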

There is some additional information about exporting S3, S4, and reference classes.

Imports

Different methods:

  • Use a combination of names and function: package::function(). Preferred if a function is only called a couple times.
  • @importFrom <package> <operator>. Preferred for calling specific operators from a package, like %>%.
  • @importFrom <package> <function>: Preferred for calling a specific function.
  • @import <package>: Imports all of the functions exported by the package.

The Imports field in the DESCRIPTION file and the import directives in the NAMESPACE file seem confusingly alike, but that is just due to a poor naming convention. The Imports field ensures that the packages yours depends on are installed along with your package. Import directives actually make the functions from another package available to your code.
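A sketch showing two of the import styles side by side. The function centre_scale() is hypothetical; in a real package, stats would also need to be listed under Imports in DESCRIPTION.

```r
#' Centre and scale a numeric vector
#'
#' Hypothetical function showing two import styles.
#'
#' @param x A numeric vector.
#' @return \code{x} minus its median, divided by its standard deviation.
#' @importFrom stats median
#' @export
centre_scale <- function(x) {
  # median() can be called unqualified because of the @importFrom tag
  # (once NAMESPACE has been regenerated); sd() uses the
  # package::function() form instead.
  (x - median(x)) / stats::sd(x)
}

centre_scale(c(1, 2, 3))
```

The package::function() form keeps the dependency visible at the call site, while @importFrom keeps frequently used calls short.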

Some additional information provided related to S3 and S4 objects.

External data (data/) in R Packages

Notes from the blog usethis workflow for package development by Emil Hvitfeldt

This blog serves as something of an update to two other really useful blogs on package development, but focuses on elements of the usethis package:

Before creation

  1. Load essential packages:
    1. available
    2. devtools
    3. roxygen2
    4. usethis
    5. testthat
    6. spelling
  2. Check that the name of the package is available and not associated with some weird definition: available.

Creating a minimal functional package

Note: I may not be able to use these commands, because I've already done some of these things in a different workflow.

  1. usethis::create_package: Create the package.
  2. usethis::use_git: Set up the package to be used with git.
  3. usethis::use_github: Set up the package to be used with GitHub.

One time modifications

  1. Generate a license.
    1. Example: usethis::use_gpl3_license.
    2. Useful resources for choosing a license: https://choosealicense.com/
  2. usethis::use_readme_rmd: Generate a README. Adds the file to .Rbuildignore.
  3. Add continuous integration. Do these one at a time as each one has extra steps associated with it, highlighted in output to console.
    1. usethis::use_travis
    2. usethis::use_appveyor
    3. usethis::use_coverage(type = c("codecov"))
  4. usethis::use_testthat: Adds the testthat package workflow for unit testing.
  5. usethis::use_spell_check: Use spelling package to ensure spell check is done.
    1. devtools::check() is used to trigger spell check.
  6. usethis::use_data_raw: If using raw data that needs to be created/formatted.
  7. usethis::use_news_md: For bigger projects with release information.

Multiple time modifications

Shortcuts below are for Windows.

  1. Write code
  2. Restart R session: Ctrl+Shift+F10
  3. Build and reload package: Ctrl+Shift+B
  4. Test package: Ctrl+Shift+T
  5. Check package: Ctrl+Shift+E
  6. Document package: Ctrl+Shift+D
  7. usethis::use_r: Make an R function that will be added to R/. Note: Not entirely clear if this is intended for individual functions or if I can add multiple functions in a single file. The latter would be more convenient.
  8. usethis::use_test: Add unit tests to the function(s) created using use_r.
  9. usethis::use_package: Import external packages that your package will depend on.
  10. Special use_package cases.
    1. usethis::use_rcpp: If using Rcpp.
    2. usethis::use_pipe: If you want to use the pipe operator, %>% without importing all of magrittr.
    3. usethis::use_tibble: If you want to work with tibbles.
  11. usethis::use_vignette

Before every git commit

  1. Restart R session: Ctrl+Shift+F10
  2. Document package: Ctrl+Shift+D
  3. Check package: Ctrl+Shift+E

Before every release

  1. Restart R session: Ctrl+Shift+F10
  2. Document package: Ctrl+Shift+D
  3. Check package: Ctrl+Shift+E
  4. usethis::use_version: Update the version

Additional notes

2019-05-26

Today I reviewed more from R Packages. Specifically, I reviewed elements of documentation (man/ folder) and making vignettes (vignettes/).

Object documentation (man/) in R Packages

This chapter largely focuses on using the roxygen2 package for developing documentation, as it automates a number of elements associated with making the documentation, and mingles the documents and functions together. For more on roxygen2:

Objects covered by this chapter:

  • Functions
  • Packages
    • Requires some special formatting compared to functions.
  • Classes, generics, methods
    • This includes S3, S4, and reference classes.

Anatomy of help file (parts in parentheses are optional):

  • Name
  • (Alias)
  • Title
    • First line of a roxygen2 skeleton
  • Description
    • Second line of a roxygen2 skeleton
  • Usage
  • Arguments (parameters)
  • Details
    • Third+ lines of a roxygen2 skeleton, before additional tags are included
  • Value
  • (Character strings)
  • (Expressions)
  • (Note)
  • (Author(s))
  • (S4 methods)
  • (References)
  • (See Also)
  • Return
  • Examples

roxygen2 comments of interest

Vocabulary:

  • Blocks: Partition a function into different parts.
  • Tags: Used to delineate blocks of information related to a function.
    • Use @@ for a literal @ sign.
  • Formatting commands: Used to format help text, make references, etc.

Example tags

  • @param <argument> <description of argument>: Indicates that the <argument> is a parameter of the function.
    • Description should include the parameter type (e.g., numeric, string, factor).
    • Can combine parameters into a single description. Ex. @param x,y Numeric vectors.
  • @inheritParams <source_function>: The new function has the same parameters as the source function, reducing the amount of typing and explanation required for similar functions.
  • @inheritParams package::function: Same as @inheritParams, but allows one to extract parameter information from another package and function.
  • @examples: Examples of how to use the function (like a miniature vignette).
    • Provides executable R code that will run when called from the utils::example() function. Ex. example("sum")
    • See \dontrun{} formats for not running some examples (because they were included to illustrate examples of failure).
  • @return <description>: The output of the function.
  • @seealso: Good for pointing users to other commands, packages, or resources of interest related to a function.
  • @family: Similar to @seealso, but used to reference a "family" of functions that are related.
  • @aliases: Alternative function names -- makes it easier to find a function.
  • @keywords: Keywords related to function.
    • Often used in the context of @keywords internal to highlight it is an "internal" function in the package, not really meant to be shared with regular users, but still of interest to the package developer and others that might be interested in extending the package (thus it still has documentation).
  • @docType: Particularly useful when documenting a package. Ex. @docType package.
  • @slot: Used to document the slot of an S4 class.
  • @rdname: Associated with S4 method documentation. Seems to be used for referencing, so information can be reiterated, without having to repeat one's self.
  • @describeIn: Also associated with S4 method documentation. Seems to be used for referencing, so information can be reiterated, without having to repeat one's self.
  • @include: Used to specify the order in which S4 classes are created. Sets the Collate field in the DESCRIPTION file.
  • @field: Used with reference classes. Replaces the functionality of the slot.
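To see several of these tags together, here is a minimal sketch of a documented function. The function add() and the family name are hypothetical; the layout (title, description, details, then tags) follows the skeleton described above.

```r
#' Add two numbers
#'
#' Adds two numeric vectors element by element.
#'
#' Further details go in the third and later paragraphs.
#'
#' @param x,y Numeric vectors.
#' @return A numeric vector, the sum of \code{x} and \code{y}.
#' @seealso \code{\link{sum}} for adding up the elements of one vector.
#' @family arithmetic helpers
#' @examples
#' add(1, 2)
#' \dontrun{
#' add("a", 1) # fails: non-numeric argument
#' }
#' @export
add <- function(x, y) {
  x + y
}
```

Running devtools::document on a package containing this file should produce a man/add.Rd help file and an export(add) line in NAMESPACE.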

Example formatting

More on text formatting can be found at: http://r-pkgs.had.co.nz/man.html#text-formatting

  • \code{}: Similar to adding `` in markdown.
  • \link{}: Use to link to a function in current or other package.
    • Ex. \code{\link{<functionname>}}
    • Ex. \code{\link[<packagename>]{<functionname>}}
  • \dontrun{}: Used with the @examples tag. Can be expressed over multiple lines. Ex. \dontrun{sum("a")}.
  • \eqn{}: Inline equation.
  • \deqn{}: Display block equation.
  • \tabular{}: For making simple tables.
    • \tab: Separates columns.
    • \cr: Separates rows.

A function for turning a dataframe into a table that can be used in a help file:

tabular <- function(df, ...) {
  stopifnot(is.data.frame(df))

  align <- function(x) if (is.numeric(x)) "r" else "l"
  col_align <- vapply(df, align, character(1))

  cols <- lapply(df, format, ...)
  contents <- do.call("paste",
    c(cols, list(sep = " \\tab ", collapse = "\\cr\n  ")))

  paste("\\tabular{", paste(col_align, collapse = ""), "}{\n  ",
    contents, "\n}\n", sep = "")
}

cat(tabular(mtcars[1:5, 1:5]))

Documenting classes, generics, and methods

S3

Generics

Documented as regular functions.

Classes

The constructor function is what gets documented, since there are no formal definitions.

Methods

These are the "." functions associated with generics like print. It's choose your own adventure for which of these get documented, though I'd err on the side of providing too much information, rather than too little.

S4

Generics

Defined like a function.

Classes

Need to use a combination of @slot and setClass(). Ex:

#' An S4 class to represent a bank account.
#'
#' @slot balance A length-one numeric vector
Account <- setClass("Account",
  slots = list(balance = "numeric")
)

Methods

Must be documented. Is associated with either the class document, generic document, or its own document, depending on the complexity.

Reference classes (RC)

Classes

Uses @field instead of slots. Also uses a different style in which the class methods are wrapped in a setRefClass function, where the class and methods are defined.
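A sketch of what that style looks like, modelled on the S4 Account example above. The method names and docstrings are illustrative; the string on the first line of each method is the method's documentation.

```r
library(methods)

#' A Reference Class to represent a bank account.
#'
#' @field balance A length-one numeric vector.
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(x) {
      "Deposit money into the account."
      balance <<- balance + x
      invisible(.self)
    },
    withdraw = function(x) {
      "Withdraw money from the account. Allows overdrafts."
      balance <<- balance - x
      invisible(.self)
    }
  )
)

acct <- Account$new(balance = 100)
acct$deposit(50)
acct$withdraw(30)
acct$balance  # 120
```

Because the methods live inside the setRefClass() call, the class and its methods end up documented in one place, unlike S4 where methods may get their own help files.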

Vignettes (vignettes/) in R Packages

I think I can do a lot of this with the workflow I've developed using knitr/rmarkdown on other projects.

Functions of interest:

  • browseVignettes: Opens up a browser (e.g., Chrome) with a list of vignettes for a named package, including the different viewing options for those vignettes (e.g., html, pdf, Rmd file, a code-only file).
    • Ex. browseVignettes("dplyr")
  • vignette: Access individual vignettes.
    • If bare (no argument included), the function returns a "View" of all the vignettes available.
    • If a vignette name is provided, then the function returns that vignette.
  • devtools::use_vignette
    • Use to create a vignettes/ folder (if it doesn't already exist), as well as a .Rmd file associated with the vignette.
    • Ex. devtools::use_vignette("vignette-name") will generate the file vignettes/vignette-name.Rmd.
    • It's not yet clear to me how sophisticated a vignette can be. That is, can I include supporting files (e.g., images, custom CSS, custom JavaScript)?
  • devtools::build_vignettes can be used to build just the vignettes, though devtools::build will generate more useful results.
    • Note that neither RStudio's "Build & reload" button nor devtools::install_github() will build vignettes, because they are time consuming. devtools::install_github(build_vignettes = TRUE) will force vignettes to be built.

Vignette metadata

This is the YAML header (metadata). Below is an example header. Notice the output and vignette fields. They contain information not usual to "typical" rmarkdown html outputs. Note the line %\VignetteIndexEntry{Vignette Title}. Here, Vignette Title needs to be changed to the actual title (apparently it is not inherited from the YAML title).

---
title: "Vignette Title"
author: "Vignette Author"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette # This template was designed to work well with R packages
# In the vignette field below, "Vignette Title" needs to be changed to the value of the actual title.
vignette: > # ">" tells YAML to read the indented lines below as a single string (a folded block scalar)
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

Most of the rest of this chapter is related to rmarkdown and knitr, which I feel pretty comfortable with.

Other thoughts

  • I should review the udunits documentation to better understand how that library works, since it underlies the quantities ecosystem of packages.
    • I'm starting to think units is not flexible enough for my needs, but may be convinced otherwise upon a slightly deeper dive into the documentation.
    • If I understand how udunits works a bit more, I might be able to make some wrappers, or even a class, that "hides" some of my work arounds when using units, making the user experience a bit nicer and intuitive.
  • Review the roxygen2 documentation.
  • Review testthat documentation.

2019-05-25

Currently reviewing R Packages and came across the .onLoad idea in the Code chapter, which can be used to perform some "setup" functions when a package is loaded. This might be a good way to deal with adding custom isotope units to the units package. .onLoad is conventionally stored in a file called zzz.R in the R/ folder. Seems like I'm going to need to read R Packages at least twice -- once while getting started and again when trying to figure out what I missed.
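A minimal sketch of what R/zzz.R could look like. The option name is hypothetical, and setting an option stands in for whatever setup is actually needed; registering custom units with the units package would presumably go in the same place, but that call is not shown here because I have not checked its API.

```r
# R/zzz.R (sketch). The option name below is hypothetical.
.onLoad <- function(libname, pkgname) {
  # Defaults this package would like to have set.
  defaults <- list(isotopeconverter.default_notation = "delta")

  # Only set options the user has not already defined.
  unset <- !(names(defaults) %in% names(options()))
  if (any(unset)) {
    options(defaults[unset])
  }

  invisible()
}
```

R calls .onLoad automatically when the package namespace is loaded. For a quick check outside a package, it can be called by hand, e.g. .onLoad(NULL, "isotopeconverter"), followed by getOption("isotopeconverter.default_notation").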

Package metadata (DESCRIPTION) in R Packages

The Package metadata chapter in R Packages provides a basic outline of what is contained in a DESCRIPTION file. For an example description file: https://github.com/r-quantities/units/blob/master/DESCRIPTION.

Useful devtools functions

  • create: Creates a package. This may not be necessary if using RStudio.
  • use_package: Adds package dependencies and suggests to the description file.
  • load_all: Loads the package's code into the current session, roughly simulating what happens when the package is installed and attached.
  • document: Converts Roxygen comments to a .Rd file for documentation (via roxygen2::roxygenise)
    • Seems like this would be better than completely re-building a package.
    • Downside is that links between documentation do not work unless the package is completely rebuilt.

Other issues of interest

R packages need, at a minimum, a description file. I think this may get generated when you create a package in RStudio.


Idea: For the table of isotope standards, include three different formats:

  1. R readable (as in when using the View function)
  2. Formatted to be used in an html output from kableExtra and knitr
  3. Formatted to be used in a latex output from kableExtra and knitr

2019-05-24

Reviewed a lot of documentation from the r-quantities and related projects: units, errors, quantities, and constants. Seems like I might be able to build some functionality on top of these packages.