audit mentions of "memory usage" (closes #5)
simonpcouch committed Jul 3, 2024
1 parent 3319cd7 commit b7da941
Showing 8 changed files with 81 additions and 70 deletions.
8 changes: 5 additions & 3 deletions DESCRIPTION
@@ -1,13 +1,15 @@
Package: syrup
Title: Measure Memory Usage for Parallel R Code
Title: Measure Memory and CPU Usage for Parallel R Code
Version: 0.0.0.9000
Authors@R: c(
person("Simon", "Couch", , "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-5676-5107")),
person(given = "Posit Software, PBC", role = c("cph", "fnd"))
)
Description: Coarsely measures memory usage of R code run in parallel by
regularly taking snapshots of calls to the system command ps.
Description: Measures memory and CPU usage of R code by regularly taking
snapshots of calls to the system command ps. The package provides an entry
point (albeit coarse) to profile usage of system resources by R code run
in parallel.
License: MIT + file LICENSE
Suggests:
testthat (>= 3.0.0)
22 changes: 12 additions & 10 deletions R/syrup.R
@@ -1,14 +1,14 @@
#' Memory Usage Information for Parallel R Code
#' Memory and CPU Usage Information for Parallel R Code
#'
#' @description
#' This function is a wrapper around the system command `ps` that can
#' be used to benchmark (peak) memory usage of parallel R code.
#' be used to benchmark (peak) memory and CPU usage of parallel R code.
#' By taking snapshots of the memory usage of R processes at a regular `interval`,
#' the function dynamically builds up a profile of their usage of system
#' the function dynamically builds up a profile of their usage of system
#' resources.
#'
#' @param expr An expression.
#' @param interval The interval at which to take snapshots of memory usage.
#' @param interval The interval at which to take snapshots of resource usage.
#' In practice, there's an overhead on top of each of these intervals.
#' @param peak Whether to return rows for only the "peak" memory usage.
#' Interpreted as the `id` with the maximum `rss` sum. Defaults to `FALSE`,
@@ -17,13 +17,15 @@
#' @param env The environment to evaluate `expr` in.
#'
#' @returns A tibble with columns `id` and `time` and a number of columns from
#' [ps::ps()] output describing memory usage. Notably, the process ID `pid`,
#' parent process ID `ppid`, and resident set size `rss` (a measure of memory
#' usage).
#' [ps::ps()] output describing memory and CPU usage. Notably, the process ID
#' `pid`, parent process ID `ppid`, percent CPU usage, and resident set size
#' `rss` (a measure of memory usage).
#'
#' @details
#' There's nothing specific about this function that necessitates the provided
#' expression is run in parallel. Said another way, `syrup()` will work just fine
#' While much of the verbiage in the package assumes that the supplied
#' expression will be distributed across CPU cores, there's nothing specific
#' about this package that necessitates the expression provided to `syrup()` is
#' run in parallel. Said another way, `syrup()` will work just fine
#' with "normal," sequentially-run R code (as in the examples). That said,
#' there are many better, more fine-grained tools for the job in the case of
#' sequential R code, such as [Rprofmem()], the
@@ -49,7 +51,7 @@
#'
#' res_syrup
#'
#' # to snapshot memory information more (or less) often, set `interval`
#' # to snapshot memory and CPU information more (or less) often, set `interval`
#' syrup(Sys.sleep(1), interval = .01)
#'
#' # use `peak = TRUE` to return only the snapshot with
12 changes: 7 additions & 5 deletions README.Rmd
@@ -21,7 +21,9 @@ knitr::opts_chunk$set(
[![R-CMD-check](https://github.com/simonpcouch/syrup/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/simonpcouch/syrup/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

The goal of syrup is to coarsely measure memory usage of R code run in parallel by regularly taking snapshots of calls to the system command `ps`. The package name is an homage to syrupy (**SY**stem **R**esource **U**sage **P**rofile ...um, **Y**eah), a Python tool at [jeetsukumaran/Syrupy](https://github.com/jeetsukumaran/Syrupy). **This package is highly experimental and results ought to be interpreted with caution.**
The goal of syrup is to measure memory and CPU usage of R code by regularly taking snapshots of calls to the system command `ps`. The package provides an entry point (albeit coarse) to profile usage of system resources by R code run in parallel.

The package name is an homage to syrupy (**SY**stem **R**esource **U**sage **P**rofile ...um, **Y**eah), a Python tool at [jeetsukumaran/Syrupy](https://github.com/jeetsukumaran/Syrupy).

## Installation

@@ -96,7 +98,7 @@ res_mem

These results are a bit more interesting than the sequential results from `Sys.sleep(1)`. Look closely at the `ppid`s for each `id`; after a snapshot or two, you'll see five identical `ppid`s for each `id`, and those `ppid`s match up with the remaining `pid` in the one remaining R process. This shows us that we've indeed distributed computations using forking in that that one remaining R process, the "parent," has spawned off five child processes from itself.
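
The `ppid` matching described above can be sketched in base R. This is only an illustration: the data frame, its process IDs, and the `parent_pid` value are hypothetical stand-ins for real `syrup()` output, and only the `id`, `pid`, and `ppid` columns are assumed.

```r
# Hypothetical snapshots (not real syrup output): the parent R process has
# pid 500, two forked workers appear from snapshot 2 onward, and pid 900 is
# an unrelated R process.
res <- data.frame(
  id   = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  pid  = c(500, 900, 500, 501, 502, 900, 500, 501, 502, 900),
  ppid = c(400, 1, 400, 500, 500, 1, 400, 500, 500, 1)
)

# Assumed to be the pid of the process that forked the workers.
parent_pid <- 500

# Workers are the rows whose parent process ID is the parent's pid.
workers <- res[res$ppid == parent_pid, ]

# Count distinct worker processes seen in each snapshot.
workers_per_snapshot <- tapply(
  workers$pid, workers$id, function(x) length(unique(x))
)
```

Snapshots taken before the workers are forked contribute no rows to `workers`, which matches the observation that the shared `ppid`s only show up after a snapshot or two.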

We can plot the result to get a better sense of how memory usage of these processes changes over time.
We can plot the result to get a better sense of how memory usage of these processes changes over time:

```{r plot-mem, warning = FALSE}
worker_ppid <- ps::ps_pid()
@@ -120,10 +122,10 @@ res_mem %>%
scale_x_continuous(breaks = 1:max(res_mem$id))
```

The percent CPU usage will always be `NA` the first time a process ID is seen, as the usage calculation is based on change since the previous recorded measure. As early as we measure, we see the workers at 100% usage, while the parent process is largely idle once it has sent data off to workers.
The percent CPU usage will always be `NA` the first time a process ID is seen, as the usage calculation is based on change since the previous recorded value. As soon as we're able to start measuring, we see the workers at 100% usage, while the parent process is largely idle once it has sent data off to workers.
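
To see why the first value for a process must be `NA`, here is a minimal sketch of a change-based usage calculation. It is an assumption about the general technique, not syrup's actual implementation: the `cpu_time` column, the numeric `time` column, and the helper `pct_cpu_from_deltas()` are all hypothetical.

```r
# Hypothetical snapshots: cumulative CPU seconds used by each process,
# recorded at three wall-clock times (in seconds). Not real syrup output.
snapshots <- data.frame(
  id       = c(1, 1, 2, 2, 3, 3),
  pid      = c(101, 102, 101, 102, 101, 102),
  time     = c(0, 0, 0.5, 0.5, 1, 1),
  cpu_time = c(2.0, 0.1, 2.5, 0.1, 3.0, 0.15)
)

# Percent CPU as the change in CPU time divided by the change in wall-clock
# time since the previous snapshot; NA when there is no previous snapshot.
pct_cpu_from_deltas <- function(time, cpu_time) {
  c(NA_real_, 100 * diff(cpu_time) / diff(time))
}

# Apply the calculation separately to each process, in snapshot order.
snapshots <- snapshots[order(snapshots$pid, snapshots$id), ]
snapshots$pct_cpu <- NA_real_
for (rows in split(seq_len(nrow(snapshots)), snapshots$pid)) {
  snapshots$pct_cpu[rows] <- pct_cpu_from_deltas(
    snapshots$time[rows], snapshots$cpu_time[rows]
  )
}
```

In this made-up data, pid 101 behaves like a busy worker (all of the elapsed time is spent on CPU), while pid 102 is mostly idle. Each process has exactly one `NA`, on the first snapshot in which it appears.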

## Scope

There's nothing specific about this package that necessitates the expression provided to `syrup()` is run in parallel. Said another way, syrup will work just fine with "normal," sequentially-run R code. That said, there are many better, more fine-grained tools for the job in the case of sequential R code, such as `Rprofmem()`, the [profmem](https://CRAN.R-project.org/package=profmem) package, the [bench](https://bench.r-lib.org/) package, and packages in the [R-prof](https://github.com/r-prof) GitHub organization.
While much of the verbiage in the package assumes that the supplied expression will be distributed across CPU cores, there's nothing specific about this package that necessitates the expression provided to `syrup()` is run in parallel. Said another way, syrup will work just fine with "normal," sequentially-run R code. That said, there are many better, more fine-grained tools for the job in the case of sequential R code, such as `Rprofmem()`, the [profmem](https://CRAN.R-project.org/package=profmem) package, the [bench](https://bench.r-lib.org/) package, and packages in the [R-prof](https://github.com/r-prof) GitHub organization.

Results from syrup only provide enough detail for the coarsest analyses of memory usage, but they do provide an entry to "profiling" memory usage for R code that runs in parallel.
Results from syrup only provide enough detail for the coarsest analyses of memory and CPU usage, but they do provide an entry point to "profiling" system resource usage for R code that runs in parallel.
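
As an example of such a coarse analysis, the `peak` argument is documented in `R/syrup.R` as the `id` with the maximum `rss` sum. A rough base R sketch of that idea follows. The data frame and its values are hypothetical, and this is not necessarily how the package computes the peak internally.

```r
# Hypothetical snapshots with resident set size in MB (not real syrup output).
res <- data.frame(
  id  = c(1, 1, 2, 2, 2, 3, 3),
  pid = c(101, 102, 101, 102, 103, 101, 102),
  rss = c(100, 50, 120, 300, 280, 110, 60)
)

# Total memory across all recorded processes in each snapshot.
rss_by_id <- tapply(res$rss, res$id, sum)

# The "peak" snapshot is the id whose summed rss is largest.
peak_id <- as.numeric(names(rss_by_id)[which.max(rss_by_id)])

# Keep only the rows belonging to that snapshot.
peak_rows <- res[res$id == peak_id, ]
```

Summing over every process in a snapshot is deliberately blunt: it includes unrelated R processes unless they are filtered out first, for instance by `ppid`.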
81 changes: 42 additions & 39 deletions README.md
@@ -12,13 +12,14 @@ status](https://www.r-pkg.org/badges/version/syrup)](https://CRAN.R-project.org/
[![R-CMD-check](https://github.com/simonpcouch/syrup/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/simonpcouch/syrup/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

The goal of syrup is to coarsely measure memory usage of R code run in
parallel by regularly taking snapshots of calls to the system command
`ps`. The package name is an homage to syrupy (**SY**stem **R**esource
The goal of syrup is to measure memory and CPU usage of R code by
regularly taking snapshots of calls to the system command `ps`. The
package provides an entry point (albeit coarse) to profile usage of
system resources by R code run in parallel.

The package name is an homage to syrupy (**SY**stem **R**esource
**U**sage **P**rofile …um, **Y**eah), a Python tool at
[jeetsukumaran/Syrupy](https://github.com/jeetsukumaran/Syrupy). **This
package is highly experimental and results ought to be interpreted with
caution.**
[jeetsukumaran/Syrupy](https://github.com/jeetsukumaran/Syrupy).

## Installation

@@ -44,16 +45,16 @@ syrup(Sys.sleep(1))
#> # A tibble: 48 × 8
#> id time pid ppid name pct_cpu rss vms
#> <dbl> <dttm> <int> <int> <chr> <dbl> <bch:byt> <bch:>
#> 1 1 2024-07-03 09:27:42 61387 60522 R NA 114MB 392GB
#> 2 1 2024-07-03 09:27:42 60522 60300 rsession-arm64 NA 848MB 394GB
#> 3 1 2024-07-03 09:27:42 58919 1 R NA 771MB 393GB
#> 4 1 2024-07-03 09:27:42 97009 1 rsession-arm64 NA 240KB 394GB
#> 5 1 2024-07-03 09:27:42 97008 1 rsession-arm64 NA 240KB 394GB
#> 6 1 2024-07-03 09:27:42 97007 1 rsession-arm64 NA 240KB 394GB
#> 7 1 2024-07-03 09:27:42 97006 1 rsession-arm64 NA 240KB 394GB
#> 8 1 2024-07-03 09:27:42 97005 1 rsession-arm64 NA 240KB 394GB
#> 9 1 2024-07-03 09:27:42 91012 1 R NA 160KB 393GB
#> 10 1 2024-07-03 09:27:42 90999 1 R NA 160KB 393GB
#> 1 1 2024-07-03 09:50:45 62041 60522 R NA 113MB 392GB
#> 2 1 2024-07-03 09:50:45 60522 60300 rsession-arm64 NA 991MB 394GB
#> 3 1 2024-07-03 09:50:45 58919 1 R NA 873MB 393GB
#> 4 1 2024-07-03 09:50:45 97009 1 rsession-arm64 NA 240KB 394GB
#> 5 1 2024-07-03 09:50:45 97008 1 rsession-arm64 NA 240KB 394GB
#> 6 1 2024-07-03 09:50:45 97007 1 rsession-arm64 NA 240KB 394GB
#> 7 1 2024-07-03 09:50:45 97006 1 rsession-arm64 NA 240KB 394GB
#> 8 1 2024-07-03 09:50:45 97005 1 rsession-arm64 NA 240KB 394GB
#> 9 1 2024-07-03 09:50:45 91012 1 R NA 160KB 393GB
#> 10 1 2024-07-03 09:50:45 90999 1 R NA 160KB 393GB
#> # ℹ 38 more rows
```

@@ -142,20 +143,20 @@ res_mem <- syrup({
})

res_mem
#> # A tibble: 158 × 8
#> # A tibble: 138 × 8
#> id time pid ppid name pct_cpu rss vms
#> <dbl> <dttm> <int> <int> <chr> <dbl> <bch:byt> <bch:>
#> 1 1 2024-07-03 09:27:46 61387 60522 R NA 1GB 393GB
#> 2 1 2024-07-03 09:27:46 60522 60300 rsession-arm64 NA 848MB 394GB
#> 3 1 2024-07-03 09:27:46 58919 1 R NA 771MB 393GB
#> 4 1 2024-07-03 09:27:46 97009 1 rsession-arm64 NA 240KB 394GB
#> 5 1 2024-07-03 09:27:46 97008 1 rsession-arm64 NA 240KB 394GB
#> 6 1 2024-07-03 09:27:46 97007 1 rsession-arm64 NA 240KB 394GB
#> 7 1 2024-07-03 09:27:46 97006 1 rsession-arm64 NA 240KB 394GB
#> 8 1 2024-07-03 09:27:46 97005 1 rsession-arm64 NA 240KB 394GB
#> 9 1 2024-07-03 09:27:46 91012 1 R NA 160KB 393GB
#> 10 1 2024-07-03 09:27:46 90999 1 R NA 160KB 393GB
#> # ℹ 148 more rows
#> 1 1 2024-07-03 09:50:49 62041 60522 R NA 1.03GB 393GB
#> 2 1 2024-07-03 09:50:49 60522 60300 rsession-arm64 NA 990.52MB 394GB
#> 3 1 2024-07-03 09:50:49 58919 1 R NA 893.17MB 393GB
#> 4 1 2024-07-03 09:50:49 97009 1 rsession-arm64 NA 240KB 394GB
#> 5 1 2024-07-03 09:50:49 97008 1 rsession-arm64 NA 240KB 394GB
#> 6 1 2024-07-03 09:50:49 97007 1 rsession-arm64 NA 240KB 394GB
#> 7 1 2024-07-03 09:50:49 97006 1 rsession-arm64 NA 240KB 394GB
#> 8 1 2024-07-03 09:50:49 97005 1 rsession-arm64 NA 240KB 394GB
#> 9 1 2024-07-03 09:50:49 91012 1 R NA 160KB 393GB
#> 10 1 2024-07-03 09:50:49 90999 1 R NA 160KB 393GB
#> # ℹ 128 more rows
```

These results are a bit more interesting than the sequential results
@@ -167,7 +168,7 @@ forking in that that one remaining R process, the “parent,” has spawned
off five child processes from itself.

We can plot the result to get a better sense of how memory usage of
these processes changes over time.
these processes changes over time:

``` r
worker_ppid <- ps::ps_pid()
@@ -207,21 +208,23 @@ res_mem %>%

The percent CPU usage will always be `NA` the first time a process ID is
seen, as the usage calculation is based on change since the previous
recorded measure. As early as we measure, we see the workers at 100%
usage, while the parent process is largely idle once it has sent data
off to workers.
recorded value. As soon as we’re able to start measuring, we see the
workers at 100% usage, while the parent process is largely idle once it
has sent data off to workers.

## Scope

There’s nothing specific about this package that necessitates the
expression provided to `syrup()` is run in parallel. Said another way,
syrup will work just fine with “normal,” sequentially-run R code. That
said, there are many better, more fine-grained tools for the job in the
case of sequential R code, such as `Rprofmem()`, the
While much of the verbiage in the package assumes that the supplied
expression will be distributed across CPU cores, there’s nothing
specific about this package that necessitates the expression provided to
`syrup()` is run in parallel. Said another way, syrup will work just
fine with “normal,” sequentially-run R code. That said, there are many
better, more fine-grained tools for the job in the case of sequential R
code, such as `Rprofmem()`, the
[profmem](https://CRAN.R-project.org/package=profmem) package, the
[bench](https://bench.r-lib.org/) package, and packages in the
[R-prof](https://github.com/r-prof) GitHub organization.

Results from syrup only provide enough detail for the coarsest analyses
of memory usage, but they do provide an entry to “profiling” memory
usage for R code that runs in parallel.
of memory and CPU usage, but they do provide an entry point to
“profiling” system resource usage for R code that runs in parallel.
Binary file modified man/figures/README-plot-cpu-1.png
Binary file modified man/figures/README-plot-mem-1.png
4 changes: 2 additions & 2 deletions man/syrup-package.Rd

Some generated files are not rendered by default.

24 changes: 13 additions & 11 deletions man/syrup.Rd

Some generated files are not rendered by default.
