Fetch site files from the malariaverse Packit server.
This adds the `fetch_site` and `fetch_files` functions to pull data from the
malariaverse Packit instance. It uses a cache in the user's home
directory to save files and avoid unnecessary repeated downloads.

The user needs to be authenticated in order to access the Packit server.
This is done interactively the first time a user tries to call these
functions. The user is prompted to open GitHub in their browser and type
in a short code, after which their orderly client will be able to
connect to Packit. The token that is obtained is cached to remove the
need to re-authenticate. Alternatively, users may set a GITHUB_TOKEN
environment variable with a suitable GitHub personal access token.
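
The non-interactive route can be illustrated with a minimal base R sketch. It mirrors the token handling in `location_configuration()` from this commit: an unset or empty `GITHUB_TOKEN` becomes `NULL`, which is presumably what lets the interactive GitHub prompt take over. The helper name `github_token_or_null` and the placeholder token value are assumptions made for illustration only.

```r
# Hypothetical helper (not part of the package): returns the token from the
# environment, or NULL when GITHUB_TOKEN is unset or empty.
github_token_or_null <- function() {
  token <- Sys.getenv("GITHUB_TOKEN")
  if (nzchar(token)) token else NULL
}

# Remember the caller's real value so the demonstration can restore it.
old <- Sys.getenv("GITHUB_TOKEN", unset = NA)

Sys.setenv(GITHUB_TOKEN = "ghp_example_placeholder")  # not a real token
with_token <- github_token_or_null()

Sys.unsetenv("GITHUB_TOKEN")
without_token <- github_token_or_null()

# Restore the original environment.
if (is.na(old)) Sys.unsetenv("GITHUB_TOKEN") else Sys.setenv(GITHUB_TOKEN = old)
```

In a real session the variable would more likely be set once, for example in `.Renviron`, before calling `fetch_site`.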
plietar committed Sep 25, 2024
1 parent dc350ce commit 2c019db
Showing 7 changed files with 514 additions and 4 deletions.
14 changes: 10 additions & 4 deletions DESCRIPTION
@@ -11,14 +11,20 @@ License: MIT + file LICENSE
 Encoding: UTF-8
 LazyData: true
 Remotes:
-    mrc-ide/malariasimulation
-Suggests:
-    testthat (>= 3.0.0)
+    mrc-ide/malariasimulation,
+    mrc-ide/orderly2
 Config/testthat/edition: 3
 RoxygenNote: 7.3.2
 Depends:
     R (>= 2.10)
 Imports:
+    orderly2,
     dplyr,
     malariasimulation,
-    tidyr
+    rappdirs,
+    rlang,
+    tidyr,
+    withr
+Suggests:
+    testthat (>= 3.0.0),
+    fs
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -1,4 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(fetch_files)
export(fetch_site)
export(single_site)
export(site_parameters)
export(subset_site)
163 changes: 163 additions & 0 deletions R/fetch.R
@@ -0,0 +1,163 @@
LOCATION_NAME <- "malariaverse-sitefiles"

location_configuration <- function() {
  token <- Sys.getenv("GITHUB_TOKEN")
  if (token == "") {
    token <- NULL
  }

  getOption("site.orderly_location", list(
    type = "packit",
    args = list(
      url = "https://packit.dide.ic.ac.uk/malariaverse-sitefiles",
      token = token)
  ))
}

#' Add or update an Orderly location.
#'
#' If a location with the given name already exists and its configuration does
#' not match the given parameters, it is removed first before being added with
#' the new parameters. If it exists and has the same parameters already nothing
#' happens.
#'
#' This functionality could probably be moved to the orderly2 package.
#' @noRd
location_add_or_update <- function(name, type, args, root) {
  locations <- orderly2::orderly_location_list(root = root, verbose = TRUE)
  locations <- locations[locations$name == name,]

  if (nrow(locations) == 0) {
    orderly2::orderly_location_add(name, type, args, root = root)
  } else if (locations[[1, "type"]] != type ||
             !identical(locations[[1, "args"]], args)) {
    orderly2::orderly_location_remove(name, root = root)
    orderly2::orderly_location_add(name, type, args, root = root)
  }
}


#' Configure the orderly root used to fetch sitefiles.
#'
#' This creates a folder in the user's home directory used to download and cache
#' site files. The location of the cache folder is determined by
#' [rappdirs::user_cache_dir()] and depends on the OS.
#'
#' A remote location from which the sitefiles will be fetched is configured on
#' the root. By default this is the malariaverse Packit instance hosted at
#' `https://packit.dide.ic.ac.uk/malariaverse-sitefiles`. This can be customized
#' by setting the `site.orderly_location` option.
#'
#' Users shouldn't need to call this function, as it is called implicitly by
#' [fetch_files] already.
#'
#' @return the path to the orderly root.
#' @noRd
configure_orderly <- function() {
  root <- file.path(rappdirs::user_cache_dir("malariaverse-sitefiles"), "store")

  orderly2::orderly_init(root, use_file_store = TRUE)

  cfg <- location_configuration()
  location_add_or_update(LOCATION_NAME, type = cfg$type, args = cfg$args,
                         root = root)

  root
}


#' Get files from the malariaverse sitefile server.
#'
#' @param name The name of the orderly report.
#' @param parameters A named list of parameters to use when searching for the
#' orderly packet. If a query expression `expr` is specified, these parameters
#' are substituted into the query using the this: prefix. If no expression is
#' specified, the latest packet matching these parameters exactly is selected.
#' @param dest A directory into which the files should be copied.
#' @param files An optionally-named character vector of files to copy from the
#' packet and into the destination directory. If the vector is named, these
#' names are used as the destination file path.
#' @param expr The query expression to filter packets. This may be an arbitrary
#' orderly query, including a literal packet ID. If absent or NULL, the
#' specified list of parameters is used and matched exactly.
#' @return the id of the orderly packet the files were copied from.
#' @export
fetch_files <- function(name, parameters, dest, files, expr = NULL) {
  root <- configure_orderly()

  if (is.null(expr)) {
    filter <- paste(sprintf("parameter:%1$s == this:%1$s", names(parameters)),
                    collapse = " && ")
    expr <- sprintf("latest(%s)", filter)
  }

  options <- orderly2::orderly_search_options(
    location = LOCATION_NAME,
    allow_remote = TRUE,
    pull_metadata = TRUE)

  plan <- orderly2::orderly_copy_files(
    name = name,
    expr = expr,
    parameters = parameters,
    dest = dest,
    files = files,
    options = options,
    root = root)

  plan$id
}

#' Fetch a site file for a country from the malariaverse sitefile server.
#'
#' The site file is identified by its country code, and optionally the
#' admin_level, urban/rural setting and version of the site files. The latest
#' packet from the server matching these parameters is used.
#'
#' Alternatively, a packet ID can be specified in order to pick an exact file
#' set.
#'
#' @param iso3c the ISO country code, a scalar character.
#' @param version the dataset version, a scalar character.
#' @param admin_level a scalar number.
#' @param urban_rural a scalar logical.
#' @param id a packet ID used to select an exact packet.
#' @return The contents of the site file.
#' @examples
#' \dontrun{
#' fetch_site("NGA")
#' fetch_site("NGA", admin_level = 1)
#' fetch_site(id = "20240801-062621-6f95851a")
#' }
#' @export
fetch_site <- function(iso3c = NULL, version = NULL,
                       admin_level = NULL, urban_rural = NULL,
                       id = NULL)
{
  dest <- withr::local_tempdir()
  if (!xor(is.null(iso3c), is.null(id))) {
    rlang::abort("Exactly one of `iso3c` and `id` must be supplied")
  }

  if (!is.null(iso3c)) {
    parameters <- list(
      iso3c = iso3c,
      version = version,
      admin_level = admin_level,
      urban_rural = urban_rural)
    parameters <- parameters[!sapply(parameters, is.null)]
    expr <- NULL
  } else {
    parameters <- list()
    expr <- id
  }

  fetch_files(name = "calibration_diagnostics",
              files = "calibrated_scaled_site.rds",
              expr = expr,
              parameters = parameters,
              dest = dest)

  readRDS(file.path(dest, "calibrated_scaled_site.rds"))
}
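
To make the query construction in `fetch_site` and `fetch_files` concrete, here is a minimal base R sketch that needs neither orderly2 nor a Packit connection. It drops the arguments left as `NULL` and builds the `latest(...)` expression in the same way as the code in this file. The helper name `build_query` is hypothetical and exists only for this illustration.

```r
# Hypothetical helper (not part of the package): combines the NULL-dropping
# step from fetch_site() with the query construction from fetch_files().
build_query <- function(...) {
  parameters <- list(...)

  # Keep only the parameters the caller actually supplied.
  parameters <- parameters[!vapply(parameters, is.null, logical(1))]

  # Each parameter must match exactly; `this:` refers to the supplied value.
  filter <- paste(sprintf("parameter:%1$s == this:%1$s", names(parameters)),
                  collapse = " && ")

  list(expr = sprintf("latest(%s)", filter), parameters = parameters)
}

query <- build_query(iso3c = "NGA", version = NULL,
                     admin_level = 1, urban_rural = NULL)
print(query$expr)
```

Only `iso3c` and `admin_level` survive the filtering step here, so the expression constrains those two parameters and leaves `version` and `urban_rural` unconstrained.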
60 changes: 60 additions & 0 deletions R/test-helpers.R
@@ -0,0 +1,60 @@
local_orderly_root <- function(..., .local_envir = parent.frame()) {
  root <- withr::local_tempdir(.local_envir = .local_envir)
  suppressMessages(orderly2::orderly_init(root, ...))
  root
}

#' Configure the site package to use a temporary cache directory and upstream.
#'
#' This setup is suitable for unit testing the package and is automatically torn
#' down when the caller's frame exits.
#'
#' @return the path to the orderly root used by the package as the upstream.
#' @noRd
local_test_setup <- function(.local_envir = parent.frame()) {
  cache <- withr::local_tempdir(.local_envir = .local_envir)
  upstream <- local_orderly_root(.local_envir = .local_envir)

  withr::local_envvar(R_USER_CACHE_DIR = cache, .local_envir = .local_envir)
  withr::local_options(
    .local_envir = .local_envir,
    "site.orderly_location" = list(type = "path",
                                   args = list(path = upstream)))

  upstream
}

#' Create an orderly packet with the given parameters and contents.
#'
#' Orderly doesn't have an easy way to craft packets from scratch, so we have
#' to resort to creating a report directory, writing the files into it,
#' generating a <name>.R file with a call to `orderly_parameters` in it and
#' running the report.
#'
#' @param name the name used for the packet
#' @param files a named list of files and their contents to include in the
#' report. If the filename ends in `.rds`, the value is written with
#' [saveRDS], otherwise [writeLines] is used.
#' @param parameters a named list of parameters to attach to the packet.
#' @param root the orderly root in which the packet will be created.
#' @return the packet id.
#' @noRd
create_orderly_packet <- function(name, files, parameters = list(), root) {
  src <- fs::dir_create(root, "src", name)
  withr::defer(fs::dir_delete(src))

  args <- paste0(sprintf("%s = NULL", names(parameters)), collapse = ",")
  code <- sprintf("orderly2::orderly_parameters(%s)", args)
  writeLines(code, fs::path(src, sprintf("%s.R", name)))

  for (i in seq_along(files)) {
    f <- fs::path(src, names(files)[[i]])
    if (fs::path_ext(f) == "rds") {
      saveRDS(files[[i]], f)
    } else {
      writeLines(files[[i]], f)
    }
  }
  suppressMessages(orderly2::orderly_run(name, parameters, echo = FALSE,
                                         root = root))
}
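
Two steps of `create_orderly_packet` can be tried in isolation with base R alone. The sketch below shows the report source line generated for a given parameter list, then the extension-based choice between `saveRDS` and `writeLines`. It swaps the `fs` helpers for `tools::file_ext` and `file.path`, never calls orderly2, and the name `write_packet_file` is a hypothetical one introduced for this illustration.

```r
parameters <- list(iso3c = "NGA", admin_level = 1)

# Step 1: the single line of R written to <name>.R, declaring each parameter.
args <- paste0(sprintf("%s = NULL", names(parameters)), collapse = ",")
code <- sprintf("orderly2::orderly_parameters(%s)", args)
print(code)

# Step 2: write each file according to its extension.
# Hypothetical helper, using base R in place of fs::path_ext().
write_packet_file <- function(path, value) {
  if (tools::file_ext(path) == "rds") {
    saveRDS(value, path)
  } else {
    writeLines(value, path)
  }
}

src <- tempfile("report-src-")
dir.create(src)

files <- list("site.rds" = list(country = "NGA"), "notes.txt" = "hello")
for (i in seq_along(files)) {
  write_packet_file(file.path(src, names(files)[[i]]), files[[i]])
}

# Read the files back to confirm both branches round-trip.
restored <- readRDS(file.path(src, "site.rds"))
notes <- readLines(file.path(src, "notes.txt"))

unlink(src, recursive = TRUE)
```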
32 changes: 32 additions & 0 deletions man/fetch_files.Rd


44 changes: 44 additions & 0 deletions man/fetch_site.Rd
