diff --git a/articles/slurmjobs.html b/articles/slurmjobs.html index af839fc..caab733 100644 --- a/articles/slurmjobs.html +++ b/articles/slurmjobs.html @@ -85,7 +85,7 @@
vignettes/slurmjobs.Rmd
slurmjobs.Rmd
slurmjobs
Required knowledge
slurmjobs -is based on many other packages and in particular in those that have -implemented the infrastructure needed for dealing with RNA-seq data -(EDIT!). That is, packages like SummarizedExperiment -(EDIT!).
+is designed for interacting with the SLURM job scheduler, and assumes +basic familiarity with terms like “job”, “task”, and “array”, as well as +the sbatch command. +Background knowledge about memory (such as virtual memory and resident +set size (RSS)) is helpful but not critical to using this package. If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.
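For readers newer to these terms, here is a hypothetical minimal sbatch array script (the partition and memory values are placeholders, not slurmjobs defaults); outside of SLURM the task ID variable is unset, so it is defaulted to 1 purely for illustration:

```shell
#!/bin/bash
#SBATCH -p shared
#SBATCH --mem-per-cpu=1G
#SBATCH --array=1-10

# SLURM runs this script once per task (here, tasks 1 through 10),
# exposing each task's ID via SLURM_ARRAY_TASK_ID (defaulted outside SLURM)
msg="Running task ${SLURM_ARRAY_TASK_ID:-1}"
echo "$msg"
```

Submitted with `sbatch`, each of the ten tasks would print its own ID.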
@@ -172,19 +173,18 @@slurmjobs
slurmjobs
+slurmjobs
provides helper functions for interacting with
+SLURM-managed
+high-performance-computing environments from R. It includes functions
+for creating submittable jobs (including array jobs), monitoring
+partitions, and extracting info about running or completed jobs. In
+addition to loading slurmjobs
, we’ll be using
+dplyr
to manipulate example data about jobs.
-Here is an example of you can cite your package inside the -vignette:
-sbatch
@@ -206,7 +206,7 @@ sbatch
job_single(
name = "my_shell_script", memory = "10G", cores = 2, create_shell = FALSE
)
-#> 2023-10-03 19:38:13.572525 creating the logs directory at: logs
+#> 2023-10-04 20:42:11.027488 creating the logs directory at: logs
#> #!/bin/bash
#> #SBATCH -p shared
#> #SBATCH --mem-per-cpu=10G
@@ -249,7 +249,7 @@ Creating Shell Scripts to sbatch
name = "my_array_job", memory = "5G", cores = 1, create_shell = FALSE,
task_num = 10
)
-#> 2023-10-03 19:38:13.667186 creating the logs directory at: logs
+#> 2023-10-04 20:42:11.134306 creating the logs directory at: logs
#> #!/bin/bash
#> #SBATCH -p shared
#> #SBATCH --mem-per-cpu=5G
@@ -368,6 +368,46 @@ Creating Shell Scripts to sbatch
#> done
Shell scripts created with job_single()
or
+job_loop()
may be submitted as batch jobs with
+sbatch
(e.g. sbatch myscript.sh
). Note that no
+additional arguments to sbatch
are required since all
+configuration is specified within the shell script.
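To see concretely why sbatch needs no extra flags, here is a shell sketch using a hand-written stand-in script (the contents are illustrative; real `job_single()` output contains more directives):

```shell
# Stand-in for a job_single()-style script; contents are illustrative only
cat > my_shell_script.sh <<'EOF'
#!/bin/bash
#SBATCH -p shared
#SBATCH --mem-per-cpu=10G
#SBATCH --cpus-per-task=2
echo "analysis goes here"
EOF

# Submission would simply be `sbatch my_shell_script.sh` (requires a cluster);
# every option sbatch needs is already embedded as an #SBATCH directive:
n_directives=$(grep -c '^#SBATCH' my_shell_script.sh)
echo "$n_directives"
```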
The array_submit()
helper function is also intended to
+make job submission easier. In particular, it addresses a common case
+where, after a large array job runs, a handful of tasks fail (such as
+due to temporary file-system issues). array_submit()
helps
+re-submit failed tasks.
Below we’ll create an example array job with
+job_single()
, then do a dry run of
+array_submit()
to demonstrate its basic usage.
+job_single(
+ name = "my_array_job", memory = "5G", cores = 1, create_shell = TRUE,
+ task_num = 10
+)
+#> 2023-10-04 20:42:11.943028 creating the logs directory at: logs
+#> 2023-10-04 20:42:11.944559 creating the shell file my_array_job.sh
+#> To submit the job use: sbatch my_array_job.sh
+
+# Suppose that tasks 3, 6, 7, and 8 failed
+array_submit("my_array_job.sh", task_ids = c(3, 6:8), submit = FALSE)
While task_ids
can be provided explicitly as above, the
+real convenience comes from the ability to run
+array_submit()
without specifying task_ids
. As
+long as the original array job was created with
+job_single()
or job_loop()
and submitted as-is
+(on the full set of tasks), array_submit()
can
+automatically find the failed tasks by reading the shell script
+(my_array_job.sh
), grabbing the original array job ID from
+the log, and internally calling job_report()
.
+# Not run here, since we aren't on a SLURM cluster
+array_submit("my_array_job.sh", submit = FALSE)
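The log-scraping idea can be pictured with a small shell sketch; the log path and line format below are hypothetical, not slurmjobs' exact conventions:

```shell
# Fabricate a log such as an array task might leave behind (format is made up)
mkdir -p logs
echo "Job id: 297331" > logs/my_array_job.1.txt

# Pull the first number out of the log, as an automated job-ID lookup might
job_id=$(grep -oE '[0-9]+' logs/my_array_job.1.txt | head -n 1)
echo "$job_id"
```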
The job_info()
function provides wrappers around the
@@ -381,7 +421,7 @@
+# On a real SLURM system job_df <- readRDS( system.file("extdata", "job_info_df.rds", package = "slurmjobs") @@ -407,7 +447,7 @@
Monitoring Running Jobs -
+job_df |> # Or your username here filter(user == "user17") |> @@ -443,7 +483,7 @@
as available in theAnalyzing Finished Jobsjob_report
slurmjobs
package. -+job_df <- readRDS( system.file("extdata", "job_report_df.rds", package = "slurmjobs") ) @@ -463,7 +503,7 @@
Analyzing Finished Jobs#> 10 297331 user1 broken_… shared 2 5 1.16 1.16 #> # ℹ 3 more variables: array_task_id <int>, exit_code <dbl>, status <fct>
Now let’s choose a better memory request:
-+stat_df <- job_df |> # This example includes tasks that fail. We're only interested in memory # for successfully completed tasks @@ -519,7 +559,7 @@
Reproducibility
This package was developed using biocthis.
Code for creating the vignette
-+## Create the vignette library("rmarkdown") system.time(render("slurmjobs.Rmd", "BiocStyle::html_document")) @@ -528,9 +568,9 @@
Reproducibilitylibrary("knitr") knit("slurmjobs.Rmd", tangle = TRUE)
Date the vignette was generated.
-+#> [1] "2023-10-03 19:38:15 UTC"
#> [1] "2023-10-04 20:42:12 UTC"
Wallclock time spent generating the vignette.
-+#> Time difference of 3.242 secs
#> Time difference of 3.156 secs
R
session information.#> ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────── #> setting value @@ -542,7 +582,7 @@
Reproducibility#> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz UTC -#> date 2023-10-03 +#> date 2023-10-04 #> pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────── @@ -556,11 +596,11 @@
Reproducibility#> cachem 1.0.8 2023-05-01 [2] RSPM (R 4.3.0) #> cli 3.6.1 2023-03-23 [2] RSPM (R 4.3.0) #> crayon 1.5.2 2022-09-29 [2] RSPM (R 4.3.0) -#> curl 5.0.2 2023-08-14 [2] RSPM (R 4.3.0) +#> curl 5.1.0 2023-10-02 [2] RSPM (R 4.3.0) #> desc 1.4.2 2022-09-08 [2] RSPM (R 4.3.0) #> digest 0.6.33 2023-07-07 [2] RSPM (R 4.3.0) #> dplyr * 1.1.3 2023-09-03 [1] RSPM (R 4.3.0) -#> evaluate 0.21 2023-05-05 [2] RSPM (R 4.3.0) +#> evaluate 0.22 2023-09-29 [2] RSPM (R 4.3.0) #> fansi 1.0.4 2023-01-22 [2] RSPM (R 4.3.0) #> fastmap 1.1.1 2023-02-24 [2] RSPM (R 4.3.0) #> fs 1.6.3 2023-07-20 [2] RSPM (R 4.3.0) @@ -579,7 +619,7 @@
Reproducibility#> pillar 1.9.0 2023-03-22 [2] RSPM (R 4.3.0) #> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.3.0) #> pkgdown 2.0.7 2022-12-14 [2] RSPM (R 4.3.0) -#> plyr 1.8.8 2022-11-11 [1] RSPM (R 4.3.0) +#> plyr 1.8.9 2023-10-02 [1] RSPM (R 4.3.0) #> purrr 1.0.2 2023-08-10 [2] RSPM (R 4.3.0) #> R6 2.5.1 2021-08-19 [2] RSPM (R 4.3.0) #> ragg 1.2.5 2023-01-12 [2] RSPM (R 4.3.0) @@ -590,7 +630,7 @@
Reproducibility#> rprojroot 2.0.3 2022-04-02 [2] RSPM (R 4.3.0) #> sass 0.4.7 2023-07-15 [2] RSPM (R 4.3.0) #> sessioninfo * 1.2.2 2021-12-06 [2] RSPM (R 4.3.0) -#> slurmjobs * 0.99.0 2023-10-03 [1] local +#> slurmjobs * 0.99.0 2023-10-04 [1] local #> stringi 1.7.12 2023-01-11 [2] RSPM (R 4.3.0) #> stringr 1.5.0 2022-12-02 [2] RSPM (R 4.3.0) #> systemfonts 1.0.4 2022-02-11 [2] RSPM (R 4.3.0) diff --git a/index.html b/index.html index 92b3150..047843e 100644 --- a/index.html +++ b/index.html @@ -78,6 +78,7 @@
+
slurmjobs
provides helper functions for interacting with SLURM-managed high-performance-computing environments from R. It includes functions for creating submittable jobs (including array jobs), monitoring partitions, and extracting info about running or completed jobs. For details, check out the documentation site. It was developed at JHPCE with SLURM 22.05.9 in mind, but is intended to generalize to other clusters and newer SLURM versions.
Installation instructions
diff --git a/pkgdown.yml b/pkgdown.yml index fb079a0..f27f54e 100644 --- a/pkgdown.yml +++ b/pkgdown.yml @@ -3,5 +3,5 @@ pkgdown: 2.0.7 pkgdown_sha: ~ articles: slurmjobs: slurmjobs.html -last_built: 2023-10-03T19:38Z +last_built: 2023-10-04T20:42Z diff --git a/reference/array_submit.html b/reference/array_submit.html new file mode 100644 index 0000000..f1d5729 --- /dev/null +++ b/reference/array_submit.html @@ -0,0 +1,174 @@ + +Submit an array job with a specified set of task IDs — array_submit • slurmjobs + + +++ + + + + + + + diff --git a/reference/index.html b/reference/index.html index c4ef36d..a871b22 100644 --- a/reference/index.html +++ b/reference/index.html @@ -54,6 +54,10 @@+ + + + ++ + +++ +++ +Submit an array job with a specified set of task IDs
+ Source: R/array_submit.R
++array_submit.Rd
++ +Given a bash script that specifies the --array
+sbatch
option (that is, an +array job), this function overwrites (temporarily ifrestore
= TRUE) the +script in place and resubmits (whensubmit
= TRUE) the array with the +specifiedtask_ids
. If this array was created withjob_single
,task_ids
may be omitted and failed tasks are automatically inferred. This function is +intended to help re-run failed tasks of a large array job that was previously +submitted.
++Arguments
+
- job_bash
+- + + +
A
character(1)
vector with the name of a bash script +in the current working directory.- task_ids
+- + + +
An optional numeric vector specifying which (relative) task +IDs to resubmit (e.g. c(1, 4, 6)). If NULL, the task IDs will be inferred +by scraping the log file for the job ID for the array job as originally +submitted, and using
job_report()
to pull failed task IDs- submit
+- + + +
A
logical(1)
vector determining whether to actually submit +the tasks or not usingqsub
.- restore
+- + + +
A
logical(1)
vector determining whether to restore the +script to the original state.- verbose
+- + +
A
logical(1)
vector specifying whether to print details +about how failed tasks were determined (applicable whentask_ids
is NULL).++ + +Value
+ + +The path to
+job_bash
.++Examples
+++## Choose a script name +job_name <- paste0("array_submit_example_", Sys.Date()) + +## Create an array job on the temporary directory +with_wd(tempdir(), { + ## Create an array job script to use for this example + job_single( + name = job_name, + create_shell = TRUE, + task_num = 100 + ) + + ## Now we can submit the job for a set of task IDs (or omit 'task_ids' + ## to automatically grab those same failed task IDs) + array_submit( + job_bash = paste0(job_name, ".sh"), + task_ids = c(1, 6, 8:20, 67), + submit = FALSE + ) +}) +#> 2023-10-04 20:42:06.64702 creating the logs directory at: logs +#> 2023-10-04 20:42:06.648751 creating the shell file array_submit_example_2023-10-04.sh +#> To submit the job use: sbatch array_submit_example_2023-10-04.sh +#> [1] "array_submit_example_2023-10-04.sh" + +
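As a point of comparison, SLURM's own `--array` option accepts comma-separated IDs and ranges, so the resubmission above corresponds to a plain sbatch call like the following (the script name is a placeholder, and the command is only assembled here, not run):

```shell
# c(1, 6, 8:20, 67) from the R example maps to SLURM's ID-list syntax
cmd="sbatch --array=1,6,8-20,67 array_submit_example.sh"
echo "$cmd"
```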
All functions
+ + ++ Submit an array job with a specified set of task IDs
diff --git a/sitemap.xml b/sitemap.xml index 5eef944..d7f7d0b 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -27,6 +27,9 @@ Return a tibble containing information about currently running jobs.
+ /news/index.html + /reference/array_submit.html +/reference/index.html