audit pct_cpu values on macOS #4

Closed
simonpcouch opened this issue Jun 24, 2024 · 6 comments

Comments

@simonpcouch
Owner

Related to #1. The distributions of pct_cpu values, intended to measure CPU usage, look wrong on macOS. Here's an example that distributes a set of linear regression fits across 5 cores using forking:

library(syrup)
#> Loading required package: bench
library(future)
library(tidymodels)
 
# explicitly enable forking in RStudio, otherwise future will complain
rlang::local_options(parallelly.fork.enable = TRUE)

# use 5 workers via forking
plan(multicore, workers = 5)

# wrap model resampling in `syrup()` to profile CPU usage
res_mem <- syrup({
  res <-
    fit_resamples(
      linear_reg(),
      outcome ~ .,
      vfold_cv(sim_regression(1000000))
    )
})

The distribution of pct_cpu values seems spot on (except that they need to be multiplied by 100, whoops) when I run it on Linux. Most of the time, most R sessions are using very little CPU, though each child of the parent process briefly hits 100% while training models:

hist(res_mem$pct_cpu)

Created on 2024-06-24 with reprex v2.1.0

Here's what the distribution looks like when I run it on macOS, though: the values max out around 2.4%.

Trying to figure out if that 2.4% approximates something that's useful for debugging... this MacBook has 10 cores.

@simonpcouch
Owner Author

Max was able to replicate behavior of pct_cpu maxing out around 2.4% on a 12-core machine while the "% CPU" in Activity Monitor was hitting 100%.

cc @gaborcsardi and r-lib/ps#73. syrup does a relatively straightforward transformation on the user and system CPU values output by ps::ps():

syrup/R/utils.R

Lines 24 to 42 in fe57dd2

# x is a data frame of row-binded ps_r_processes() outputs
mutate_pct_cpu <- function(x) {
  x <- dplyr::mutate(
    x,
    pct_cpu = calculate_pct_cpu(time, user, system),
    .after = name,
    .by = pid
  )
  x <- dplyr::select(x, -c(user, system))
}

# time, user, and system are vectors of repeated measures from a given pid
calculate_pct_cpu <- function(time, user, system) {
  intervals <- as.numeric(diff(time))
  user_diffs <- diff(user)
  system_diffs <- diff(system)
  c(NA_real_, (user_diffs + system_diffs) / intervals)
}

...so the operation, per value, is something like:

interval <- .5
times_t <- ps::ps_cpu_times()
Sys.sleep(interval)
times_t_plus_1 <- ps::ps_cpu_times()

user <- times_t_plus_1[["user"]] - times_t[["user"]]
system <- times_t_plus_1[["system"]] - times_t[["system"]]

(user + system) / interval
#> [1] 0.001063258

This gives reasonable values on Linux, but not on macOS. If you have any intuition on why this might be, Gábor, it'd definitely be appreciated.
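To make the arithmetic concrete, here's a minimal sketch that runs the same per-pid calculation on made-up cumulative CPU times. The function name `calculate_pct_cpu_scaled()` and the sample values are hypothetical, not part of syrup; the only changes from `calculate_pct_cpu()` are the multiplication by 100 mentioned above and an explicit `units = "secs"` so the interval can't silently switch to minutes.

```r
# Hypothetical variant of calculate_pct_cpu(): same differencing logic,
# scaled to a percentage. Not part of syrup.
calculate_pct_cpu_scaled <- function(time, user, system) {
  # elapsed wall-clock seconds between consecutive snapshots
  intervals <- as.numeric(diff(time), units = "secs")
  # CPU seconds consumed between consecutive snapshots
  cpu_diffs <- diff(user) + diff(system)
  # the first snapshot has no predecessor, so its value is NA
  c(NA_real_, 100 * cpu_diffs / intervals)
}

# made-up snapshots for one pid, taken every 0.5 seconds
time <- as.POSIXct("2024-06-24 12:00:00", tz = "UTC") + c(0, 0.5, 1, 1.5)
user <- c(1.00, 1.25, 1.75, 1.75)     # cumulative user CPU seconds
system <- c(0.125, 0.25, 0.25, 0.25)  # cumulative system CPU seconds

pct <- calculate_pct_cpu_scaled(time, user, system)
pct
# NA, 75, 100, 0
```

If the cumulative `user` and `system` values come back on the wrong scale, every non-missing entry shrinks by the same factor, which would be consistent with a cap far below 100.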

@gaborcsardi

Seems like this is broken on new Macs: Apple has changed the scale of the values they report, and we need to adjust. Fixed in r-lib/ps@b6e0b62

{
  times_t <- ps::ps_cpu_times()
  tic <- Sys.time()
  for (i in 1:10000) runif(100000)
  toc <- Sys.time()
  times_t_plus_1 <- ps::ps_cpu_times()
  interval <- as.double(toc - tic, units = "secs")

  user <- times_t_plus_1[["user"]] - times_t[["user"]]
  system <- times_t_plus_1[["system"]] - times_t[["system"]]

  (user + system) / interval
}
#> [1] 0.9935361
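As a cross-check that doesn't go through ps at all, base R's `proc.time()` reports the current process's own user, system, and elapsed times. This is only a sketch, and the helper name `cpu_fraction()` is made up for illustration; on a single busy thread it should land in the same neighborhood as the ps-based number above.

```r
# Hypothetical helper: fraction of one core used while evaluating `expr`,
# measured with base R's proc.time() rather than ps.
cpu_fraction <- function(expr) {
  before <- proc.time()
  force(expr)  # evaluate the lazily passed expression here
  after <- proc.time()
  delta <- after - before
  # CPU seconds (user + system) of this process over wall-clock seconds
  (delta[["user.self"]] + delta[["sys.self"]]) / delta[["elapsed"]]
}

# a CPU-bound loop similar to the one above
frac <- cpu_fraction(for (i in 1:200) runif(100000))
frac
```

A large gap between this value and the ps-based one on the same machine would point at how the CPU times are being reported rather than at the arithmetic.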

@gaborcsardi

I can do a ps release this week.

@simonpcouch
Owner Author

Awesome, thanks for looking into this!

@gaborcsardi

@simonpcouch New ps is on CRAN, FYI.

@simonpcouch
Owner Author

Thanks for the heads-up!
