Include 'job_info_df' as part of the package, and describe its creation in a conventional way as recommended at https://r-pkgs.org/data.html#sec-data-data-raw
Nick-Eagles committed Oct 10, 2023
1 parent 48b364f commit e7ddc68
Showing 5 changed files with 15 additions and 17 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -3,3 +3,4 @@
 ^README\.Rmd$
 ^\.github$
 ^codecov\.yml$
+^data-raw$
3 changes: 3 additions & 0 deletions DESCRIPTION
@@ -31,3 +31,6 @@ Imports:
     stringr,
     purrr,
     utils
+Depends:
+    R (>= 2.10)
+LazyData: true
@@ -1,29 +1,26 @@
library(here)
library(dplyr)
library(stringr)
source(here('R', 'job_info.R'))

# Randomly grab 100 jobs running now on the 'shared' partition
job_df = job_info(user = NULL) |>
job_info_df = job_info(user = NULL) |>
sample_n(size = 100) |>
arrange(job_id)

# A vector whose values are anonymous usernames and whose names are the
# original usernames
user_map = paste0('user', 1:length(unique(job_df$user)))
names(user_map) = unique(job_df$user)
user_map = paste0('user', 1:length(unique(job_info_df$user)))
names(user_map) = unique(job_info_df$user)

# Similarly for job names, though we'll keep the generic name for interactive
# jobs ('bash')
name_map = paste0('my_job_', 1:length(unique(job_df$name)))
names(name_map) = unique(job_df$name)
name_map = paste0('my_job_', 1:length(unique(job_info_df$name)))
names(name_map) = unique(job_info_df$name)
name_map['bash'] = 'bash'

# Anonymize username and job name
job_df = job_df |>
job_info_df = job_info_df |>
mutate(
user = user_map[user],
name = name_map[name]
)

saveRDS(job_df, here('inst', 'extdata', 'job_info_df.rds'))
usethis::use_data(job_info_df, overwrite = TRUE)
Binary file added data/job_info_df.rda
Binary file not shown.
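
The anonymization in the script above relies on named-vector lookup: each map's names are the original values and its elements are the anonymous labels, so indexing the map by a column swaps one for the other. A minimal base-R sketch of that idea, using a made-up `jobs` data frame (hypothetical values, not output from `job_info()`) and `seq_along()` in place of `1:length()`:

```r
# Hypothetical stand-in for the output of job_info(); all values are made up
jobs <- data.frame(
    job_id = c(101, 102, 103, 104),
    user = c("alice", "bob", "alice", "carol"),
    name = c("align", "bash", "qc", "align"),
    stringsAsFactors = FALSE
)

# Named vector: names are the original usernames, values the anonymous ones
user_map <- paste0("user", seq_along(unique(jobs$user)))
names(user_map) <- unique(jobs$user)

# Same for job names, keeping the generic interactive-job name 'bash'
name_map <- paste0("my_job_", seq_along(unique(jobs$name)))
names(name_map) <- unique(jobs$name)
name_map["bash"] <- "bash"

# Indexing a map by the original column returns the anonymous labels
jobs$user <- unname(user_map[jobs$user])
jobs$name <- unname(name_map[jobs$name])
print(jobs)
```

Because the labels are numbered in order of first appearance, the mapping depends on row order; that is presumably harmless here, since the goal is only to hide the real names.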
11 changes: 4 additions & 7 deletions vignettes/slurmjobs.Rmd
@@ -178,22 +178,19 @@ array_submit("my_array_job.sh", submit = FALSE)

 The `job_info()` function provides wrappers around the `squeue` and `sstat` utilities SLURM provides for monitoring specific jobs and how busy partitions are. The general idea is to provide the information output from `squeue` into a `tibble`, while retrieving memory-utilization information that ordinarily must be retrieved manually on a job-by-job basis with `sstat -j [specific job ID]`.

-On a SLURM system, you'd run `job_df = job_info(user = NULL, partition = "shared")` here, to get every user's jobs running on the "shared" partition. We'll load an example output directly here.
+On a SLURM system, you'd run `job_info_df = job_info(user = NULL, partition = "shared")` here, to get every user's jobs running on the "shared" partition. We'll load an example output directly here.

 ```{r "job_info_quick_look"}
 # On a real SLURM system
-job_df <- readRDS(
-    system.file("extdata", "job_info_df.rds", package = "slurmjobs")
-)
-print(job_df)
+print(job_info_df)
 ```

 The benefit to having this data in R, now, is to be able to trivially ask summarizing questions. First, "how much memory and how many CPUs am I currently using?" Knowing this answer can help ensure fair and civil use of shared computing resources, for example on a computing cluster.

 ```{r "job_info_total_resources"}
-job_df |>
+job_info_df |>
     # Or your username here
-    filter(user == "user17") |>
+    filter(user == "user21") |>
     # Get the number of CPUs requested and the memory requested in GB
     summarize(
         total_mem_req = sum(requested_mem_gb),
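
The hunk above is cut off before the summary is complete. As a rough, self-contained sketch of the same kind of per-user total, the following uses a made-up data frame rather than the packaged `job_info_df`; the `cpus` column name is an assumption, since the diff only shows `user` and `requested_mem_gb`:

```r
library(dplyr)

# Made-up stand-in for job_info_df; 'cpus' is an assumed column name
example_jobs <- data.frame(
    user = c("user21", "user21", "user3"),
    cpus = c(2, 4, 1),
    requested_mem_gb = c(10, 20, 5)
)

totals <- example_jobs |>
    # Keep one user's jobs
    filter(user == "user21") |>
    # Sum the requested memory (GB) and CPUs across those jobs
    summarize(
        total_mem_req = sum(requested_mem_gb),
        total_cpus = sum(cpus)
    )
print(totals)
```

With the packaged data, the same pipeline would presumably start from `job_info_df` and use whatever column the package actually provides for CPU counts.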