From ccb7c8fd2f46eb9211746ffa3785b9f3899a4865 Mon Sep 17 00:00:00 2001 From: Nick-Eagles <45461721+Nick-Eagles@users.noreply.github.com> Date: Wed, 4 Oct 2023 20:42:28 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20LieberIn?= =?UTF-8?q?stitute/slurmjobs@de3a8e0e267d3447b726ea438a68aa90f73b04c3=20?= =?UTF-8?q?=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- articles/slurmjobs.html | 96 ++++++++++++++------ index.html | 1 + pkgdown.yml | 2 +- reference/array_submit.html | 174 ++++++++++++++++++++++++++++++++++++ reference/index.html | 4 + sitemap.xml | 3 + 6 files changed, 251 insertions(+), 29 deletions(-) create mode 100644 reference/array_submit.html diff --git a/articles/slurmjobs.html b/articles/slurmjobs.html index af839fc..caab733 100644 --- a/articles/slurmjobs.html +++ b/articles/slurmjobs.html @@ -85,7 +85,7 @@

Leonardo University
lcolladotor@gmail.com -

3 October 2023

+

4 October 2023

Source: vignettes/slurmjobs.Rmd @@ -120,10 +120,11 @@

Install slurmjobsRequired knowledge

slurmjobs -is based on many other packages and in particular in those that have -implemented the infrastructure needed for dealing with RNA-seq data -(EDIT!). That is, packages like SummarizedExperiment -(EDIT!).

+is designed for interacting with the SLURM job scheduler, and assumes +basic familiarity with terms like “job”, “task”, and “array”, as well as +the sbatch command. +Background knowledge about memory (such as virtual memory and resident +set size (RSS)) is helpful but not critical in using this package.

If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.

@@ -172,19 +173,18 @@

Citing slurmjobs
-

Quick start to using slurmjobs +

Overview

+

slurmjobs provides helper functions for interacting with +SLURM-managed +high-performance-computing environments from R. It includes functions +for creating submittable jobs (including array jobs), monitoring +partitions, and extracting info about running or complete jobs. In +addition to loading slurmjobs, we’ll be using +dplyr to manipulate example data about jobs.
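+The examples below assume both packages are attached. As a minimal setup
+sketch (the exact setup chunk of the rendered vignette may differ):
+
+library("slurmjobs")
+library("dplyr")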

-

Edit this as you see fit =)

-

Here is an example of you can cite your package inside the -vignette:

-

Creating Shell Scripts to sbatch @@ -206,7 +206,7 @@

Creating Shell Scripts to sbatch job_single( name = "my_shell_script", memory = "10G", cores = 2, create_shell = FALSE ) -#> 2023-10-03 19:38:13.572525 creating the logs directory at: logs +#> 2023-10-04 20:42:11.027488 creating the logs directory at: logs #> #!/bin/bash #> #SBATCH -p shared #> #SBATCH --mem-per-cpu=10G @@ -249,7 +249,7 @@

Creating Shell Scripts to sbatch name = "my_array_job", memory = "5G", cores = 1, create_shell = FALSE, task_num = 10 ) -#> 2023-10-03 19:38:13.667186 creating the logs directory at: logs +#> 2023-10-04 20:42:11.134306 creating the logs directory at: logs #> #!/bin/bash #> #SBATCH -p shared #> #SBATCH --mem-per-cpu=5G @@ -368,6 +368,46 @@

Creating Shell Scripts to sbatch #> done

+

Submitting and Resubmitting Jobs +

+

Shell scripts created with job_single() or
+job_loop() may be submitted as batch jobs with
+sbatch (e.g. sbatch myscript.sh). Note that no
+additional arguments to sbatch are required, since all
+configuration is specified within the shell script.

+
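+For example, a script generated with create_shell = TRUE could be submitted
+from an R session with a one-line system call. This is a minimal sketch, not
+part of the package API: it assumes a file named my_shell_script.sh exists and
+that sbatch is on your PATH (typing sbatch my_shell_script.sh in a terminal is
+equivalent).
+
+## Not run here, since we aren't on a SLURM cluster
+system("sbatch my_shell_script.sh")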

The array_submit() helper function is also intended to
+make job submission easier. In particular, it addresses a common case
+where, after a large array job has run, a handful of tasks fail (such as
+due to temporary file-system issues). array_submit() helps
+re-submit only those failed tasks.

+

Below we’ll create an example array job with +job_single(), then do a dry run of +array_submit() to demonstrate its basic usage.

+
+job_single(
+    name = "my_array_job", memory = "5G", cores = 1, create_shell = TRUE,
+    task_num = 10
+)
+#> 2023-10-04 20:42:11.943028 creating the logs directory at:  logs
+#> 2023-10-04 20:42:11.944559 creating the shell file my_array_job.sh
+#> To submit the job use: sbatch my_array_job.sh
+
+#   Suppose that tasks 3, 6, 7, and 8 failed
+array_submit("my_array_job.sh", task_ids = c(3, 6:8), submit = FALSE)
+

While task_ids can be provided explicitly as above, the
+real convenience comes from the ability to run
+array_submit() without specifying task_ids. As
+long as the original array job was created with
+job_single() or job_loop() and submitted as-is
+(on the full set of tasks), array_submit() can
+automatically find the failed tasks by reading the shell script
+(my_array_job.sh), grabbing the original array job ID from
+the log, and internally calling job_report().

+
+#   Not run here, since we aren't on a SLURM cluster
+array_submit("my_array_job.sh", submit = FALSE)
+
+
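+To see how the failed tasks were determined when task_ids is omitted, the
+same dry run can be repeated with verbose = TRUE (again not run here):
+
+#   Not run here, since we aren't on a SLURM cluster
+array_submit("my_array_job.sh", submit = FALSE, verbose = TRUE)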

Monitoring Running Jobs

The job_info() function provides wrappers around the @@ -381,7 +421,7 @@

Monitoring Running Jobsjob_df = job_info(user = NULL, partition = "shared") here, to get every user’s jobs running on the “shared” partition. We’ll load an example output directly here.

-
+
 #   On a real SLURM system
 job_df <- readRDS(
     system.file("extdata", "job_info_df.rds", package = "slurmjobs")
@@ -407,7 +447,7 @@ 

Monitoring Running Jobs -
+
 job_df |>
     #   Or your username here
     filter(user == "user17") |>
@@ -443,7 +483,7 @@ 

Analyzing Finished Jobsjob_report as available in the slurmjobs package.

-
+
 

Now let’s choose a better memory request:

-
+
 stat_df <- job_df |>
     #   This example includes tasks that fail. We're only interested in memory
     #   for successfully completed tasks
@@ -519,7 +559,7 @@ 

Reproducibility

This package was developed using biocthis.

Code for creating the vignette

-
+
 ## Create the vignette
 library("rmarkdown")
 system.time(render("slurmjobs.Rmd", "BiocStyle::html_document"))
@@ -528,9 +568,9 @@ 

Reproducibilitylibrary("knitr") knit("slurmjobs.Rmd", tangle = TRUE)

Date the vignette was generated.

-
#> [1] "2023-10-03 19:38:15 UTC"
+
#> [1] "2023-10-04 20:42:12 UTC"

Wallclock time spent generating the vignette.

-
#> Time difference of 3.242 secs
+
#> Time difference of 3.156 secs

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 #>  setting  value
@@ -542,7 +582,7 @@ 

Reproducibility#> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz UTC -#> date 2023-10-03 +#> date 2023-10-04 #> pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────── @@ -556,11 +596,11 @@

Reproducibility#> cachem 1.0.8 2023-05-01 [2] RSPM (R 4.3.0) #> cli 3.6.1 2023-03-23 [2] RSPM (R 4.3.0) #> crayon 1.5.2 2022-09-29 [2] RSPM (R 4.3.0) -#> curl 5.0.2 2023-08-14 [2] RSPM (R 4.3.0) +#> curl 5.1.0 2023-10-02 [2] RSPM (R 4.3.0) #> desc 1.4.2 2022-09-08 [2] RSPM (R 4.3.0) #> digest 0.6.33 2023-07-07 [2] RSPM (R 4.3.0) #> dplyr * 1.1.3 2023-09-03 [1] RSPM (R 4.3.0) -#> evaluate 0.21 2023-05-05 [2] RSPM (R 4.3.0) +#> evaluate 0.22 2023-09-29 [2] RSPM (R 4.3.0) #> fansi 1.0.4 2023-01-22 [2] RSPM (R 4.3.0) #> fastmap 1.1.1 2023-02-24 [2] RSPM (R 4.3.0) #> fs 1.6.3 2023-07-20 [2] RSPM (R 4.3.0) @@ -579,7 +619,7 @@

Reproducibility#> pillar 1.9.0 2023-03-22 [2] RSPM (R 4.3.0) #> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.3.0) #> pkgdown 2.0.7 2022-12-14 [2] RSPM (R 4.3.0) -#> plyr 1.8.8 2022-11-11 [1] RSPM (R 4.3.0) +#> plyr 1.8.9 2023-10-02 [1] RSPM (R 4.3.0) #> purrr 1.0.2 2023-08-10 [2] RSPM (R 4.3.0) #> R6 2.5.1 2021-08-19 [2] RSPM (R 4.3.0) #> ragg 1.2.5 2023-01-12 [2] RSPM (R 4.3.0) @@ -590,7 +630,7 @@

Reproducibility#> rprojroot 2.0.3 2022-04-02 [2] RSPM (R 4.3.0) #> sass 0.4.7 2023-07-15 [2] RSPM (R 4.3.0) #> sessioninfo * 1.2.2 2021-12-06 [2] RSPM (R 4.3.0) -#> slurmjobs * 0.99.0 2023-10-03 [1] local +#> slurmjobs * 0.99.0 2023-10-04 [1] local #> stringi 1.7.12 2023-01-11 [2] RSPM (R 4.3.0) #> stringr 1.5.0 2022-12-02 [2] RSPM (R 4.3.0) #> systemfonts 1.0.4 2022-02-11 [2] RSPM (R 4.3.0) diff --git a/index.html b/index.html index 92b3150..047843e 100644 --- a/index.html +++ b/index.html @@ -78,6 +78,7 @@

slurmjobs provides helper functions for interacting with SLURM-managed high-performance-computing environments from R. It includes functions for creating submittable jobs (including array jobs), monitoring partitions, and extracting info about running or complete jobs. For details, check out the documentation site.

+

It was developed at JHPCE with SLURM 22.05.9 in mind, but is intended to generalize to other clusters and newer SLURM versions.

Installation instructions

diff --git a/pkgdown.yml b/pkgdown.yml index fb079a0..f27f54e 100644 --- a/pkgdown.yml +++ b/pkgdown.yml @@ -3,5 +3,5 @@ pkgdown: 2.0.7 pkgdown_sha: ~ articles: slurmjobs: slurmjobs.html -last_built: 2023-10-03T19:38Z +last_built: 2023-10-04T20:42Z diff --git a/reference/array_submit.html b/reference/array_submit.html new file mode 100644 index 0000000..f1d5729 --- /dev/null +++ b/reference/array_submit.html @@ -0,0 +1,174 @@ + +Submit an array job with a specified set of task IDs — array_submit • slurmjobs + + +
+
+ + + +
+
+ + +
+

Given a bash script that specifies the --array sbatch option (that is, an
+array job), this function overwrites (temporarily if restore = TRUE) the
+script in place and resubmits (when submit = TRUE) the array with the
+specified task_ids. If this array was created with job_single, task_ids
+may be omitted, in which case failed tasks are automatically inferred. This
+function is intended to help re-run failed tasks of a large array job that
+was previously submitted.

+
+ +
+
array_submit(
+  job_bash,
+  task_ids = NULL,
+  submit = FALSE,
+  restore = TRUE,
+  verbose = FALSE
+)
+
+ +
+

Arguments

+
job_bash
+

A character(1) vector with the name of a bash script +in the current working directory.

+ + +
task_ids
+

An optional numeric vector specifying which (relative) task
+IDs to resubmit (e.g. c(1, 4, 6)). If NULL, the task IDs will be inferred
+by scraping the log file for the job ID of the array job as originally
+submitted, and using job_report() to pull the failed task IDs.

+ + +
submit
+

A logical(1) vector determining whether to actually submit
+the tasks via sbatch.

+ + +
restore
+

A logical(1) vector determining whether to restore the
+script to its original state.

+ + +
verbose
+

A logical(1) vector specifying whether to print details +about how failed tasks were determined (applicable when task_ids is NULL).

+ +
+
+

Value

+ + +

The path to job_bash.

+
+
+

Author

+

Leonardo Collado-Torres

+

Nicholas J. Eagles

+
+ +
+

Examples

+

+## Choose a script name
+job_name <- paste0("array_submit_example_", Sys.Date())
+
+## Create an array job on the temporary directory
+with_wd(tempdir(), {
+    ## Create an array job script to use for this example
+    job_single(
+        name = job_name,
+        create_shell = TRUE,
+        task_num = 100
+    )
+
+    ## Now we can submit the job for a set of task IDs (or omit 'task_ids'
+    ## to automatically grab those same failed task IDs)
+    array_submit(
+        job_bash = paste0(job_name, ".sh"),
+        task_ids = c(1, 6, 8:20, 67),
+        submit = FALSE
+    )
+})
+#> 2023-10-04 20:42:06.64702 creating the logs directory at:  logs
+#> 2023-10-04 20:42:06.648751 creating the shell file array_submit_example_2023-10-04.sh
+#> To submit the job use: sbatch array_submit_example_2023-10-04.sh
+#> [1] "array_submit_example_2023-10-04.sh"
+
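+## Note (added illustration, not generated output): with 'submit = FALSE' the
+## call above is only a dry run; on a real SLURM cluster you would set
+## 'submit = TRUE' to actually resubmit the selected task IDs.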
+
+
+
+ +
+ + +
+ +
+

Site built with pkgdown 2.0.7.

+
+ +
+ + + + + + + + diff --git a/reference/index.html b/reference/index.html index c4ef36d..a871b22 100644 --- a/reference/index.html +++ b/reference/index.html @@ -54,6 +54,10 @@

All functions

+

array_submit()

+ +

Submit an array job with a specified set of task IDs

+

job_info()

Return a tibble containing information about currently running jobs.

diff --git a/sitemap.xml b/sitemap.xml index 5eef944..d7f7d0b 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -27,6 +27,9 @@ /news/index.html + + /reference/array_submit.html + /reference/index.html