audit mentions of "memory usage" (closes #5)

simonpcouch · Jul 3, 2024 · b7da941
1 parent 3319cd7
commit b7da941

Showing 8 changed files with 81 additions and 70 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,13 +1,15 @@
 Package: syrup
-Title: Measure Memory Usage for Parallel R Code
+Title: Measure Memory and CPU Usage for Parallel R Code
 Version: 0.0.0.9000
 Authors@R: c(
  person("Simon", "Couch", , "[email protected]", role = c("aut", "cre"),
  comment = c(ORCID = "0000-0001-5676-5107")),
  person(given = "Posit Software, PBC", role = c("cph", "fnd")) 
  )
-Description: Coarsely measures memory usage of R code run in parallel by 
- regularly taking snapshots of calls to the system command ps.
+Description: Measures memory and CPU usage of R code by regularly taking 
+ snapshots of calls to the system command ps. The package provides an entry 
+ point (albeit coarse) to profile usage of system resources by R code run 
+ in parallel.
 License: MIT + file LICENSE
 Suggests: 
  testthat (>= 3.0.0)

diff --git a/R/syrup.R b/R/syrup.R
@@ -1,14 +1,14 @@
-#' Memory Usage Information for Parallel R Code
+#' Memory and CPU Usage Information for Parallel R Code
 #'
 #' @description
 #' This function is a wrapper around the system command `ps` that can
-#' be used to benchmark (peak) memory usage of parallel R code.
+#' be used to benchmark (peak) memory and CPU usage of parallel R code.
 #' By taking snapshots of the memory usage of R processes at a regular `interval`,
-#' the function dynamically builds up a  profile of their usage of system
+#' the function dynamically builds up a profile of their usage of system
 #' resources.
 #'
 #' @param expr An expression.
-#' @param interval The interval at which to take snapshots of memory usage.
+#' @param interval The interval at which to take snapshots of resource usage.
 #' In practice, there's an overhead on top of each of these intervals.
 #' @param peak Whether to return rows for only the "peak" memory usage.
 #' Interpreted as the `id` with the maximum `rss` sum. Defaults to `FALSE`,
@@ -17,13 +17,15 @@
 #' @param env The environment to evaluate `expr` in.
 #'
 #' @returns A tibble with columns `id` and `time` and a number of columns from
-#' [ps::ps()] output describing memory usage. Notably, the process ID `pid`,
-#' parent process ID `ppid`, and resident set size `rss` (a measure of memory
-#' usage).
+#' [ps::ps()] output describing memory and CPU usage. Notably, the process ID
+#' `pid`, parent process ID `ppid`, percent CPU usage, and resident set size
+#' `rss` (a measure of memory usage).
 #'
 #' @details
-#' There's nothing specific about this function that necessitates the provided
-#' expression is run in parallel. Said another way, `syrup()` will work just fine
+#' While much of the verbiage in the package assumes that the supplied
+#' expression will be distributed across CPU cores, there's nothing specific
+#' about this package that necessitates the expression provided to `syrup()` is
+#' run in parallel. Said another way, `syrup()` will work just fine
 #' with "normal," sequentially-run R code (as in the examples). That said,
 #' there are many better, more fine-grained tools for the job in the case of
 #' sequential R code, such as [Rprofmem()], the
@@ -49,7 +51,7 @@
 #'
 #' res_syrup
 #'
-#' # to snapshot memory information more (or less) often, set `interval`
+#' # to snapshot memory and CPU information more (or less) often, set `interval`
 #' syrup(Sys.sleep(1), interval = .01)
 #'
 #' # use `peak = TRUE` to return only the snapshot with

diff --git a/README.Rmd b/README.Rmd
@@ -21,7 +21,9 @@ knitr::opts_chunk$set(
 [![R-CMD-check](https://github.com/simonpcouch/syrup/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/simonpcouch/syrup/actions/workflows/R-CMD-check.yaml)
 <!-- badges: end -->
 
-The goal of syrup is to coarsely measure memory usage of R code run in parallel by regularly taking snapshots of calls to the system command `ps`. The package name is an homage to syrupy (**SY**stem **R**esource **U**sage **P**rofile ...um, **Y**eah), a Python tool at [jeetsukumaran/Syrupy](https://github.com/jeetsukumaran/Syrupy). **This package is highly experimental and results ought to be interpreted with caution.**
+The goal of syrup is to measure memory and CPU usage of R code by regularly taking snapshots of calls to the system command `ps`. The package provides an entry point (albeit coarse) to profile usage of system resources by R code run in parallel.
+
+The package name is an homage to syrupy (**SY**stem **R**esource **U**sage **P**rofile ...um, **Y**eah), a Python tool at [jeetsukumaran/Syrupy](https://github.com/jeetsukumaran/Syrupy).
 
 ## Installation
 
@@ -96,7 +98,7 @@ res_mem
 
 These results are a bit more interesting than the sequential results from `Sys.sleep(1)`. Look closely at the `ppid`s for each `id`; after a snapshot or two, you'll see five identical `ppid`s for each `id`, and those `ppid`s match up with the remaining `pid` in the one remaining R process. This shows us that we've indeed distributed computations using forking in that that one remaining R process, the "parent," has spawned off five child processes from itself. 
 
-We can plot the result to get a better sense of how memory usage of these processes changes over time.
+We can plot the result to get a better sense of how memory usage of these processes changes over time:
 
 ```{r plot-mem, warning = FALSE}
 worker_ppid <- ps::ps_pid()
@@ -120,10 +122,10 @@ res_mem %>%
  scale_x_continuous(breaks = 1:max(res_mem$id))
 ```
 
-The percent CPU usage will always be `NA` the first time a process ID is seen, as the usage calculation is based on change since the previous recorded measure. As early as we measure, we see the workers at 100% usage, while the parent process is largely idle once it has sent data off to workers.
+The percent CPU usage will always be `NA` the first time a process ID is seen, as the usage calculation is based on change since the previous recorded value. As soon as we're able to start measuring, we see the workers at 100% usage, while the parent process is largely idle once it has sent data off to workers.
 
 ## Scope
 
-There's nothing specific about this package that necessitates the expression provided to `syrup()` is run in parallel. Said another way, syrup will work just fine with "normal," sequentially-run R code. That said, there are many better, more fine-grained tools for the job in the case of sequential R code, such as `Rprofmem()`, the [profmem](https://CRAN.R-project.org/package=profmem) package, the [bench](https://bench.r-lib.org/) package, and packages in the [R-prof](https://github.com/r-prof) GitHub organization.
+While much of the verbiage in the package assumes that the supplied expression will be distributed across CPU cores, there's nothing specific about this package that necessitates the expression provided to `syrup()` is run in parallel. Said another way, syrup will work just fine with "normal," sequentially-run R code. That said, there are many better, more fine-grained tools for the job in the case of sequential R code, such as `Rprofmem()`, the [profmem](https://CRAN.R-project.org/package=profmem) package, the [bench](https://bench.r-lib.org/) package, and packages in the [R-prof](https://github.com/r-prof) GitHub organization.
 
-Results from syrup only provide enough detail for the coarsest analyses of memory usage, but they do provide an entry to "profiling" memory usage for R code that runs in parallel.
+Results from syrup only provide enough detail for the coarsest analyses of memory and CPU usage, but they do provide an entry point to "profiling" system resource usage for R code that runs in parallel.
diff --git a/README.md b/README.md
@@ -12,13 +12,14 @@ status](https://www.r-pkg.org/badges/version/syrup)](https://CRAN.R-project.org/
 [![R-CMD-check](https://github.com/simonpcouch/syrup/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/simonpcouch/syrup/actions/workflows/R-CMD-check.yaml)
 <!-- badges: end -->
 
-The goal of syrup is to coarsely measure memory usage of R code run in
-parallel by regularly taking snapshots of calls to the system command
-`ps`. The package name is an homage to syrupy (**SY**stem **R**esource
+The goal of syrup is to measure memory and CPU usage of R code by
+regularly taking snapshots of calls to the system command `ps`. The
+package provides an entry point (albeit coarse) to profile usage of
+system resources by R code run in parallel.
+
+The package name is an homage to syrupy (**SY**stem **R**esource
 **U**sage **P**rofile …um, **Y**eah), a Python tool at
-[jeetsukumaran/Syrupy](https://github.com/jeetsukumaran/Syrupy). **This
-package is highly experimental and results ought to be interpreted with
-caution.**
+[jeetsukumaran/Syrupy](https://github.com/jeetsukumaran/Syrupy).
 
 ## Installation
 
@@ -44,16 +45,16 @@ syrup(Sys.sleep(1))
 #> # A tibble: 48 × 8
 #> id time pid ppid name pct_cpu rss vms
 #> <dbl> <dttm> <int> <int> <chr> <dbl> <bch:byt> <bch:>
-#> 1 1 2024-07-03 09:27:42 61387 60522 R NA 114MB 392GB
-#> 2 1 2024-07-03 09:27:42 60522 60300 rsession-arm64 NA 848MB 394GB
-#> 3 1 2024-07-03 09:27:42 58919 1 R NA 771MB 393GB
-#> 4 1 2024-07-03 09:27:42 97009 1 rsession-arm64 NA 240KB 394GB
-#> 5 1 2024-07-03 09:27:42 97008 1 rsession-arm64 NA 240KB 394GB
-#> 6 1 2024-07-03 09:27:42 97007 1 rsession-arm64 NA 240KB 394GB
-#> 7 1 2024-07-03 09:27:42 97006 1 rsession-arm64 NA 240KB 394GB
-#> 8 1 2024-07-03 09:27:42 97005 1 rsession-arm64 NA 240KB 394GB
-#> 9 1 2024-07-03 09:27:42 91012 1 R NA 160KB 393GB
-#> 10 1 2024-07-03 09:27:42 90999 1 R NA 160KB 393GB
+#> 1 1 2024-07-03 09:50:45 62041 60522 R NA 113MB 392GB
+#> 2 1 2024-07-03 09:50:45 60522 60300 rsession-arm64 NA 991MB 394GB
+#> 3 1 2024-07-03 09:50:45 58919 1 R NA 873MB 393GB
+#> 4 1 2024-07-03 09:50:45 97009 1 rsession-arm64 NA 240KB 394GB
+#> 5 1 2024-07-03 09:50:45 97008 1 rsession-arm64 NA 240KB 394GB
+#> 6 1 2024-07-03 09:50:45 97007 1 rsession-arm64 NA 240KB 394GB
+#> 7 1 2024-07-03 09:50:45 97006 1 rsession-arm64 NA 240KB 394GB
+#> 8 1 2024-07-03 09:50:45 97005 1 rsession-arm64 NA 240KB 394GB
+#> 9 1 2024-07-03 09:50:45 91012 1 R NA 160KB 393GB
+#> 10 1 2024-07-03 09:50:45 90999 1 R NA 160KB 393GB
 #> # ℹ 38 more rows
 ```
 
@@ -142,20 +143,20 @@ res_mem <- syrup({
 })
 
 res_mem
-#> # A tibble: 158 × 8
+#> # A tibble: 138 × 8
 #> id time pid ppid name pct_cpu rss vms
 #> <dbl> <dttm> <int> <int> <chr> <dbl> <bch:byt> <bch:>
-#> 1 1 2024-07-03 09:27:46 61387 60522 R NA  1GB 393GB
-#> 2 1 2024-07-03 09:27:46 60522 60300 rsession-arm64 NA  848MB 394GB
-#> 3 1 2024-07-03 09:27:46 58919 1 R NA  771MB 393GB
-#> 4 1 2024-07-03 09:27:46 97009 1 rsession-arm64 NA 240KB 394GB
-#> 5 1 2024-07-03 09:27:46 97008 1 rsession-arm64 NA 240KB 394GB
-#> 6 1 2024-07-03 09:27:46 97007 1 rsession-arm64 NA 240KB 394GB
-#> 7 1 2024-07-03 09:27:46 97006 1 rsession-arm64 NA 240KB 394GB
-#> 8 1 2024-07-03 09:27:46 97005 1 rsession-arm64 NA 240KB 394GB
-#> 9 1 2024-07-03 09:27:46 91012 1 R NA 160KB 393GB
-#> 10 1 2024-07-03 09:27:46 90999 1 R NA 160KB 393GB
-#> # ℹ 148 more rows
+#> 1 1 2024-07-03 09:50:49 62041 60522 R NA 1.03GB 393GB
+#> 2 1 2024-07-03 09:50:49 60522 60300 rsession-arm64 NA 990.52MB 394GB
+#> 3 1 2024-07-03 09:50:49 58919 1 R NA 893.17MB 393GB
+#> 4 1 2024-07-03 09:50:49 97009 1 rsession-arm64 NA 240KB 394GB
+#> 5 1 2024-07-03 09:50:49 97008 1 rsession-arm64 NA 240KB 394GB
+#> 6 1 2024-07-03 09:50:49 97007 1 rsession-arm64 NA 240KB 394GB
+#> 7 1 2024-07-03 09:50:49 97006 1 rsession-arm64 NA 240KB 394GB
+#> 8 1 2024-07-03 09:50:49 97005 1 rsession-arm64 NA 240KB 394GB
+#> 9 1 2024-07-03 09:50:49 91012 1 R NA 160KB 393GB
+#> 10 1 2024-07-03 09:50:49 90999 1 R NA 160KB 393GB
+#> # ℹ 128 more rows
 ```
 
 These results are a bit more interesting than the sequential results
@@ -167,7 +168,7 @@ forking in that that one remaining R process, the “parent,” has spawned
 off five child processes from itself.
 
 We can plot the result to get a better sense of how memory usage of
-these processes changes over time.
+these processes changes over time:
 
 ``` r
 worker_ppid <- ps::ps_pid()
@@ -207,21 +208,23 @@ res_mem %>%
 
 The percent CPU usage will always be `NA` the first time a process ID is
 seen, as the usage calculation is based on change since the previous
-recorded measure. As early as we measure, we see the workers at 100%
-usage, while the parent process is largely idle once it has sent data
-off to workers.
+recorded value. As soon as we’re able to start measuring, we see the
+workers at 100% usage, while the parent process is largely idle once it
+has sent data off to workers.
 
 ## Scope
 
-There’s nothing specific about this package that necessitates the
-expression provided to `syrup()` is run in parallel. Said another way,
-syrup will work just fine with “normal,” sequentially-run R code. That
-said, there are many better, more fine-grained tools for the job in the
-case of sequential R code, such as `Rprofmem()`, the
+While much of the verbiage in the package assumes that the supplied
+expression will be distributed across CPU cores, there’s nothing
+specific about this package that necessitates the expression provided to
+`syrup()` is run in parallel. Said another way, syrup will work just
+fine with “normal,” sequentially-run R code. That said, there are many
+better, more fine-grained tools for the job in the case of sequential R
+code, such as `Rprofmem()`, the
 [profmem](https://CRAN.R-project.org/package=profmem) package, the
 [bench](https://bench.r-lib.org/) package, and packages in the
 [R-prof](https://github.com/r-prof) GitHub organization.
 
 Results from syrup only provide enough detail for the coarsest analyses
-of memory usage, but they do provide an entry to “profiling” memory
-usage for R code that runs in parallel.
+of memory and CPU usage, but they do provide an entry point to
+“profiling” system resource usage for R code that runs in parallel.
diff --git a/man/figures/README-plot-cpu-1.png b/man/figures/README-plot-cpu-1.png
diff --git a/man/figures/README-plot-mem-1.png b/man/figures/README-plot-mem-1.png
diff --git a/man/syrup-package.Rd b/man/syrup-package.Rd
diff --git a/man/syrup.Rd b/man/syrup.Rd
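
The `peak` documentation in `R/syrup.R` describes the "peak" snapshot as the `id` with the maximum `rss` sum. As a rough illustration of that rule only (not the package's actual implementation), the sketch below applies it to a small, made-up data frame in base R. The `snapshots` data and the names `rss_by_id`, `peak_id`, and `peak_rows` are hypothetical; real `syrup()` output is a tibble with additional columns such as `time`, `ppid`, and `pct_cpu`.

```r
# Hypothetical snapshot data: three snapshots (`id`) of two processes (`pid`),
# with resident set size (`rss`) in arbitrary units. Values are invented.
snapshots <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3),
  pid = c(101, 102, 101, 102, 101, 102),
  rss = c(100, 50, 300, 80, 200, 60)
)

# Sum `rss` across all processes within each snapshot.
rss_by_id <- tapply(snapshots$rss, snapshots$id, sum)

# The "peak" snapshot is the `id` whose summed `rss` is largest.
peak_id <- as.numeric(names(which.max(rss_by_id)))

# Keep only the rows belonging to that snapshot.
peak_rows <- snapshots[snapshots$id == peak_id, ]
peak_rows
```

Summing within each snapshot before taking the maximum means the peak reflects total memory across all tracked processes at one moment, rather than the largest single process.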